Déjà System Documentation

Architecture & Governance Standards

Technical specifications for Ingestion Protocols, Forensic Linkage Logic, and Audit Data Models. This document is designed for engineering leadership, DevOps, and security review: high precision, auditable linkage, and zero inference in the critical path.

Claim 4: No AI inference • Dual-lane ingestion • Entropy gate • SHA‑256 compound fingerprint • Silence > hallucination
Version 1.0 • Public

Déjà System Specification & Architecture Reference
Version: 1.0 (Public) • Status: Active • Classification: Technical Documentation

Structure
The Core Physics → Normalization → Integration Standards → Advanced Configuration → Security & Privacy → Validation Orchestrator → System Lifecycle → Troubleshooting
1. Core concepts

The mathematical principles behind the Deterministic Engine.

1.1 Determinism vs. probability
Claim 4 • No AI inference
The Thesis: Rejection of Inference

Modern incident resolution tools largely rely on Probabilistic Models (LLMs, Vector Embeddings, Cosine Similarity) to link errors to code. These models return a confidence score rather than a proof: they guess the relationship between a stack trace and a pull request based on semantic similarity.

Déjà rejects this approach for production infrastructure. In the domain of root cause analysis, a "high confidence guess" (e.g., 90% probability) is functionally equivalent to noise. If an operator cannot trust the link implicitly, they must verify it manually. If manual verification is required, the automation has failed.

The Déjà Deterministic Engine is architected on a binary principle:
  • Match (1): The relationship is mathematically proven via strict equality, an identical fingerprint hash, or direct Git ancestry.
  • No Match (0): The relationship cannot be proven.
We do not offer "likely" matches. We offer "proven" matches or silence.
Claim 4: The "No AI" Constraint

This architectural philosophy is codified in US Patent Application 19/430,349, specifically Claim 4:

"Wherein matching the canonical path... comprises a deterministic match that does not require real-time inference by a machine learning model."
US Patent Application 19/430,349 • Claim 4
  • Zero Hallucination: The system cannot invent a link that does not exist.
  • Zero Data Leakage: Source code is never passed to a third-party generative model for processing.
  • Auditability: Every match can be traced back to a specific line of code, commit hash, or error signature.
The Promise: Silence > Hallucination

The primary failure mode of "AI Ops" tools is Alert Fatigue. By surfacing "likely" root causes, these tools force engineers to spend cognitive energy debunking false positives. Déjà optimizes for Signal-to-Noise Ratio over Recall.

Competitor Approach (Probabilistic):
  • Input: Error: NullPointer in auth.ts
  • System Action: "This looks 85% similar to a bug in login.ts."
  • Result: The engineer investigates login.ts, realizes it's unrelated, and loses trust in the tool.
The Déjà Approach (Deterministic):
  • Input: Error: NullPointer in auth.ts
  • System Action: No exact match found in the Knowledge Graph.
  • Result: Silence. The engineer debugs manually. Trust is preserved for the next incident where a proven match is available.
Git Identity vs. Vector Similarity

A core challenge in institutional memory is File Identity: keeping track of a file as it is moved, renamed, or refactored over time.

  • Probabilistic Approach: semantic vectors remain "close" after moves; risk: false positives when distinct files share similar logic.
  • Deterministic Approach: Déjà uses Git Ancestry Parsing (Lane 1 Ingestion).
  • Move Detection: if git log --follow --name-status reports an exact rename (status R100), we link the new path to the old path as an Alias in the Knowledge Graph.
  • Copy-Paste-Delete boundary: if Git history is broken, Déjà treats it as a new unrelated file. Without an explicit Git link, assuming identity requires guessing. We do not guess.
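The move-detection rule can be sketched as follows. This is a minimal illustration, assuming the input is the text output of `git log --follow --name-status`, where rename lines look like `R<score><TAB><old><TAB><new>`. The function name `extractAliases` and the choice to link only exact renames (R100) are assumptions based on the wording above, not the engine's actual code.

```typescript
// Hypothetical helper: maps each new path to the old path it was renamed from.
// Assumes tab-separated --name-status lines such as "R100\tsrc/old.ts\tsrc/new.ts".
function extractAliases(nameStatusOutput: string): Map<string, string> {
  const aliases = new Map<string, string>();
  for (const line of nameStatusOutput.split("\n")) {
    const [status, oldPath, newPath] = line.trim().split("\t");
    // Only exact renames (R100) are linked; lower similarity scores are ignored here.
    if (status === "R100" && oldPath && newPath) {
      aliases.set(newPath, oldPath);
    }
  }
  return aliases;
}

const sampleLog = [
  "R100\tsrc/legacy/login.ts\tsrc/auth/login.ts",
  "M\tsrc/auth/login.ts",
  "R087\tsrc/a.ts\tsrc/b.ts",
].join("\n");

const aliasMap = extractAliases(sampleLog);
console.log(aliasMap.get("src/auth/login.ts")); // "src/legacy/login.ts"
console.log(aliasMap.has("src/b.ts")); // false: R087 is not an exact rename
```

A file that was copied, pasted, and deleted produces no rename line at all, so it never enters the alias map, which matches the "we do not guess" boundary.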
1.2 Dual-lane ingestion architecture
Time travel + real-time
The Concept: Immediate vs. Infinite

Effective institutional memory requires two opposing capabilities: the ability to recall events from years ago (Deep Context) and the ability to react to a crash happening right now (Low Latency).

To achieve this without sacrificing performance, Déjà decouples ingestion into two distinct, parallel architectures: Lane 1 (Historical) and Lane 2 (Real-Time). Both lanes feed into a shared Normalization Engine, ensuring that a bug fixed two years ago generates the exact same fingerprint as a bug occurring today.

Figure: Dual-Lane Ingestion (Lane 1 + Lane 2 → Shared Normalization)
  • LANE 1: Historical Backfill ("Time Travel")
  • LANE 2: Real-Time Webhooks (Sentry/Datadog)
  • NORMALIZATION: Entropy Gate, Anchor Frame Logic, Path Sanitizer
  • CANONICAL INDEX: Canonical Path Index, Knowledge Graph, Deterministic Matcher
Lane 1: Historical Backfill ("Time Travel")
  • Depth: configurable scan of the default branch history (Standard: 730 days / 2 years).
  • Commit Traversal: walks the Git tree in reverse chronological order.
  • Artifact Extraction: identifies Resolution Artifacts (PRs referencing issues: "Fixes PROD-123", "Resolves #402").
  • Diff Analysis: parses file diffs to map which files changed to fix which issues.
  • Performance: high-throughput batch lane; completeness over latency.
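As an illustration of Artifact Extraction, the sketch below pulls issue references out of a PR title or body. The keyword list (fix/resolve/close variants) and the two ID shapes (`PROD-123` style keys and `#402` style numbers) are assumptions drawn from the examples above; `extractIssueRefs` is a hypothetical name.

```typescript
// Hypothetical extractor for "Fixes PROD-123" / "Resolves #402" style references.
function extractIssueRefs(text: string): string[] {
  // A fresh regex per call avoids shared lastIndex state on the global flag.
  const pattern = /\b(?:fix(?:es|ed)?|resolve[sd]?|close[sd]?)\s+([A-Z][A-Z0-9]*-\d+|#\d+)/gi;
  const refs: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(text)) !== null) {
    refs.push(match[1]);
  }
  return refs;
}

console.log(extractIssueRefs("Fixes PROD-123; also Resolves #402")); // ["PROD-123", "#402"]
console.log(extractIssueRefs("Refactor login flow")); // []
```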
Lane 2: Real-Time Webhooks (The "Live" Stream)
  • Ingestion Source: HTTP webhooks from monitoring tools.
  • Latency SLO: <120ms from receipt to indexing.
  • Hostile input stance: applies the Entropy Gate to reject un-hydrated/minified traces before indexing.
  • Sanitization: strips environment-specific prefixes (e.g., /var/www/, webpack://) to match canonical paths.
Synchronization: The Canonical Path

The critical innovation is that normalization happens downstream of the lanes. If Lane 1 reads src/utils/auth.ts from Git history, and Lane 2 receives webpack:///./src/utils/auth.ts from Sentry, they must resolve to the same identifier.

Result: a crash in production (Lane 2) will mathematically collide with a fix from 18 months ago (Lane 1), triggering a deterministic "Recall" event. If a file was renamed, Lane 1's Git History graph links legacy paths to modern paths.
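A minimal sketch of the path sanitization step, assuming a small prefix list. Only `/var/www/` and `webpack://` come from the text above; the `./` rule and the name `toCanonicalPath` are illustrative assumptions, and real deployments would likely need more prefixes (for example, an application directory under `/var/www/`).

```typescript
// Hypothetical prefix list applied until the path stops changing.
const ENV_PREFIXES: RegExp[] = [
  /^webpack:\/\/\/?/, // webpack:// and webpack:///
  /^\/var\/www\//,
  /^\.\//,
];

function toCanonicalPath(rawPath: string): string {
  let path = rawPath;
  let changed = true;
  // Repeat so stacked prefixes such as "webpack:///./" are fully removed.
  while (changed) {
    changed = false;
    for (const prefix of ENV_PREFIXES) {
      const stripped = path.replace(prefix, "");
      if (stripped !== path) {
        path = stripped;
        changed = true;
      }
    }
  }
  return path;
}

// Lane 1 (Git) and Lane 2 (Sentry) resolve to the same identifier.
console.log(toCanonicalPath("src/utils/auth.ts")); // "src/utils/auth.ts"
console.log(toCanonicalPath("webpack:///./src/utils/auth.ts")); // "src/utils/auth.ts"
```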
1.3 Predictive immunity (network effects)
Claim 15 • Cross-service broadcasting
The Mechanism: Cross-Service Broadcasting

Traditional incident response is isolated: a bug is fixed in Service A, but the knowledge remains trapped in that repository. Déjà implements Predictive Immunity (Claim 15), which inverts the workflow from "Crash → Fix" to "Fix → Broadcast."

Claim 15: Broadcasting a preventative alert to a distinct service in response to detecting an early-stage pattern matching a verified solution.
Propagation Logic: Dependency Graph Analysis

The immunity engine operates on a Dependency Graph constructed during Lane 1. It parses manifests (package.json, go.mod, Cargo.toml, pom.xml) to map producer/consumer relationships.

  • Trigger: Validation verifies a fix in a producer repo (e.g., auth-client-lib fixes a TokenRotation leak).
  • Traversal: query consumers that import auth-client-lib.
  • Version match: check if consumers run a vulnerable version (pre-fix).
  • Broadcast: send predictive alert referencing the verified fix.
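The traversal and version-match steps above can be sketched as a filter over consumers. The `Consumer` shape, the service names, and the version numbers are hypothetical; the version comparison handles only dotted numeric versions and is a simplification, not a full semver implementation.

```typescript
interface Consumer {
  service: string;
  // Resolved dependency versions, e.g. parsed from package.json or go.mod.
  dependencies: Record<string, string>;
}

// Compares dotted numeric versions ("2.4.0"); pre-release tags are not handled.
function compareVersions(a: string, b: string): number {
  const pa = a.split(".").map(Number);
  const pb = b.split(".").map(Number);
  for (let i = 0; i < Math.max(pa.length, pb.length); i++) {
    const diff = (pa[i] ?? 0) - (pb[i] ?? 0);
    if (diff !== 0) return diff < 0 ? -1 : 1;
  }
  return 0;
}

// Services that import `pkg` at a version older than the verified fix.
function findExposedConsumers(consumers: Consumer[], pkg: string, fixedIn: string): string[] {
  return consumers
    .filter((c) => pkg in c.dependencies && compareVersions(c.dependencies[pkg], fixedIn) < 0)
    .map((c) => c.service);
}

const consumers: Consumer[] = [
  { service: "billing-api", dependencies: { "auth-client-lib": "2.3.0" } },
  { service: "checkout-api", dependencies: { "auth-client-lib": "2.4.1" } },
  { service: "search-api", dependencies: {} },
];

console.log(findExposedConsumers(consumers, "auth-client-lib", "2.4.0")); // ["billing-api"]
```

Each returned service would receive the predictive alert that references the verified fix.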
Vulnerability States
Status | Definition | Meaning
Exposed | Service runs code containing a known verified defect pattern or imports a vulnerable dependency version. | No remediation detected.
Patched | Service applied the specific code change linked to the verified solution (Git history confirms fix commit or version bump). | Safe state via explicit patching.
Immunized | Service moved from Exposed to Patched without ever generating an incident event. | Outage prevented; drives "immunization rate" KPI.
2. The normalization engine
Stable fingerprints

How we turn raw, noisy stack traces into stable fingerprints.

2.1 The "entropy gate" (input filtering)
Zero tolerance
The Rule: We do not process un-hydrated code

The engine will not process, index, or store stack traces that reference minified or obfuscated code. Un-hydrated traces are treated as "Data Pollution." If a payload arrives without Source Map expansion (Sentry/Datadog), it is rejected at the edge. We do not guess the original source path.

Rejection Criteria
  • Minified artifacts: files matching short randomized patterns (e.g., [a-z]{1,2}\.js) like a.js.
  • Hashed filenames: [a-f0-9]{8,}(\.chunk)?\.js (hashed and chunked bundles such as main.8f9a21b4.chunk.js).
  • Vendor artifacts: paths strictly within node_modules/, webpack/runtime/, [native code] (unless whitelisted).
Status Codes: When >5% of incoming events for an integration are rejected, the dashboard marks that integration as Degraded: Source Maps Missing. The webhook is active; payloads are unusable.
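A hedged sketch of the rejection criteria and the 5% degradation rule. Matching against the file's basename, the optional `.chunk` segment (added so the chunked-bundle example name matches), and the function names are assumptions; the whitelist mentioned above is omitted for brevity.

```typescript
// Patterns follow the rejection criteria; testing the basename is an assumption.
const MINIFIED = /^[a-z]{1,2}\.js$/;
const HASHED = /[a-f0-9]{8,}(\.chunk)?\.js$/;
const VENDOR_PREFIXES = ["node_modules/", "webpack/runtime/"];

function isRejectedFrame(filename: string): boolean {
  if (filename === "[native code]") return true;
  // Assumes the path was already stripped of environment prefixes.
  if (VENDOR_PREFIXES.some((p) => filename.startsWith(p))) return true;
  const base = filename.split("/").pop() ?? filename;
  return MINIFIED.test(base) || HASHED.test(base);
}

// An integration is marked Degraded when more than 5% of its events were rejected.
function isDegraded(rejected: number, total: number): boolean {
  return total > 0 && rejected / total > 0.05;
}

console.log(isRejectedFrame("a.js")); // true
console.log(isRejectedFrame("static/js/main.8f9a21b4.chunk.js")); // true
console.log(isRejectedFrame("src/utils/auth.ts")); // false
console.log(isDegraded(6, 100)); // true
```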
Remediation
  • Verify Source Maps are generated during CI/CD builds.
  • Upload Source Maps to the provider before deployment activates.
  • Confirm release version (SHA) matches the uploaded artifacts.
2.2 Anchor frame logic (stack traversal)
First actionable frame
The Algorithm: Select the First Actionable Frame

Raw stack traces contain many irrelevant frames. Fingerprinting the top frame blindly leads to unstable matches. The engine traverses frames F0 → Fn and applies filters:

  • Skip vendor/middleware: ignore node_modules/, vendor, stdlib internals.
  • Skip minified/garbage: ignore frames rejected by the Entropy Gate.
  • Anchor selection: stop at the first hydrated User Code frame.
Peeking Heuristics (The "Black Hole" Defense)

Shared utilities (e.g., src/utils/*) can cause under-segmentation. If the anchor matches a generic utility pattern (*/utils/*, */lib/*, */helpers/*, */shared/*), the engine "peeks" at the caller frame and uses compound anchoring: Caller + Anchor.

Day 0 Safety: Active by default. The engine prefers over-segmentation (splitting similar bugs) over under-segmentation (grouping unrelated bugs).
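Anchor selection and peeking can be sketched together. This assumes `frames[0]` is the innermost (crashing) frame with callers following it, and that the caller is the next frame in the list; the `Frame` and `AnchorResult` shapes and the function name are hypothetical.

```typescript
interface Frame {
  filename: string;
  lineno: number;
  in_app: boolean;
}

interface AnchorResult {
  anchor: Frame;
  caller?: Frame; // set only when peeking is triggered
}

// Generic utility directories named in the text: utils, lib, helpers, shared.
const UTILITY_DIR = /(^|\/)(utils|lib|helpers|shared)\//;

function selectAnchor(trace: Frame[]): AnchorResult | null {
  // First hydrated user-code frame, skipping vendor frames.
  const index = trace.findIndex((f) => f.in_app && !f.filename.includes("node_modules/"));
  if (index === -1) return null; // no actionable frame: stay silent
  const anchor = trace[index];
  // Peek at the caller when the anchor sits in a generic utility directory.
  if (UTILITY_DIR.test(anchor.filename) && index + 1 < trace.length) {
    return { anchor, caller: trace[index + 1] };
  }
  return { anchor };
}

const trace: Frame[] = [
  { filename: "node_modules/pg/lib/client.js", lineno: 310, in_app: false },
  { filename: "src/utils/db.ts", lineno: 10, in_app: true },
  { filename: "src/payment/Checkout.ts", lineno: 42, in_app: true },
];

const selected = selectAnchor(trace);
console.log(selected?.anchor.filename); // "src/utils/db.ts"
console.log(selected?.caller?.filename); // "src/payment/Checkout.ts"
```

With compound anchoring, the same utility crash reached from two different callers yields two separate anchors, which is the over-segmentation the engine prefers by default.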
2.3 Compound fingerprinting
Hash composition
The Hash

Identity is defined by hash equality: two events belong to the same incident only if their fingerprints are identical. The canonical fingerprint is generated using SHA‑256:

Fingerprint
Fingerprint = SHA256(AnchorFrame + CallerContext + ExceptionInvariant)
  • Anchor Frame (Location): normalized file path + line (e.g., src/payment/Checkout.ts:42).
  • Caller Context: included only if peeking is active (utility anchor).
  • Exception Invariant: sanitized error template (UUIDs/timestamps removed).
Invariant Extraction (Message Sanitization)

Raw errors contain high-entropy segments (UUIDs, timestamps, memory addresses). The engine replaces dynamic values with placeholders via regex sanitizers prior to hashing to ensure stable grouping.
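The sketch below combines invariant extraction with the SHA‑256 fingerprint, using Node's built-in `crypto` module. The three sanitizer patterns are illustrative stand-ins for the engine's built-in rules, and joining the fields with a newline separator is an assumption made so that field boundaries stay unambiguous.

```typescript
import { createHash } from "node:crypto";

// Illustrative sanitizers: UUIDs, ISO-8601 timestamps, and hex memory addresses.
const SANITIZERS: Array<[RegExp, string]> = [
  [/[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}/gi, "<uuid>"],
  [/\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?Z?/g, "<timestamp>"],
  [/0x[0-9a-f]+/gi, "<addr>"],
];

function toInvariant(message: string): string {
  return SANITIZERS.reduce((msg, [pattern, placeholder]) => msg.replace(pattern, placeholder), message);
}

// caller is null unless peeking is active for a utility anchor.
function fingerprint(anchor: string, caller: string | null, type: string, message: string): string {
  const parts = [anchor, caller ?? "", `${type}: ${toInvariant(message)}`];
  return createHash("sha256").update(parts.join("\n")).digest("hex");
}

const first = fingerprint("src/payment/Checkout.ts:42", null, "TypeError",
  "Order 3f2b8c1e-9d4a-4e6b-8f1a-2c3d4e5f6a7b failed at 2024-01-15T10:30:00Z");
const second = fingerprint("src/payment/Checkout.ts:42", null, "TypeError",
  "Order 7a6f5e4d-3c2b-4a1f-9e8d-1b2c3d4e5f6a failed at 2025-06-01T08:00:00Z");

console.log(first === second); // true: only the dynamic segments differed
```

Changing the line number, the caller, or the exception type changes the digest, so grouping happens only on exact agreement of all three components.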

3. Integration specifications
Healthy pipeline prerequisites
3.1 Sentry / Datadog configuration
Source maps first
Requirement: The Source Map Race Condition

To pass the Entropy Gate, stack traces must be fully hydrated before they reach Déjà. A common CI/CD misconfiguration reverses the order: errors are reported before maps are processed, leading to minified garbage being rejected.

The Golden Rule: Artifact uploads MUST complete before deployment activation.
Correct order: Build → Upload Source Maps (wait success) → Deploy.
Sanitization: PII & Data Scrubbing

Déjà is an infrastructure metadata tool, not a user analytics tool. While Déjà drops payloads containing obvious PII patterns, best practice is to scrub upstream.

Sentry beforeSend (example)
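// Assumes the Node SDK; in a browser app, import from "@sentry/browser" instead.
import * as Sentry from "@sentry/node";

Sentry.init({
  // Assumption: the DSN is supplied through an environment variable.
  dsn: process.env.SENTRY_DSN,
  beforeSend(event) {
    // Redact the request body, which may carry user-submitted data.
    if (event.request && event.request.data) {
      event.request.data = "[Redacted]";
    }
    // Drop direct identifiers from the user context.
    if (event.user) {
      delete event.user.email;
      delete event.user.ip_address;
    }
    return event;
  },
});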
Webhook Setup & Custom Telemetry (Typed Schema)

Endpoint: POST https://ingest.deja.ai/v1/events
Auth: Authorization: Bearer <INGESTION_KEY>
The in_app boolean is critical for Anchor Frame logic.

Custom telemetry payload schema
{
  "event_id": "uuid-v4",
  "timestamp": "ISO-8601",
  "platform": "node | python | go",
  "release": "git-sha-hash",
  "exception": {
    "type": "ErrorType (e.g., TypeError)",
    "value": "Sanitized Message Template",
    "stacktrace": {
      "frames": [
        {
          "filename": "src/utils/auth.ts",
          "function": "validateSession",
          "lineno": 42,
          "in_app": true
        }
      ]
    }
  }
}
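For senders written in TypeScript, the schema above can be mirrored as types with a small pre-flight check. This is a sketch: the interface names, the specific checks, and the sample values (event ID, release SHA) are hypothetical, and the ingestion endpoint may enforce additional rules.

```typescript
interface StackFrame {
  filename: string;
  function: string;
  lineno: number;
  in_app: boolean;
}

interface TelemetryEvent {
  event_id: string;
  timestamp: string; // ISO-8601
  platform: "node" | "python" | "go";
  release: string; // Git SHA
  exception: {
    type: string;
    value: string;
    stacktrace: { frames: StackFrame[] };
  };
}

// Returns a list of problems; an empty list means the payload looks usable.
function validateEvent(payload: TelemetryEvent): string[] {
  const problems: string[] = [];
  if (Number.isNaN(Date.parse(payload.timestamp))) problems.push("timestamp is not a parseable date");
  const stack = payload.exception.stacktrace.frames;
  if (stack.length === 0) problems.push("stacktrace has no frames");
  // Without an in_app frame, Anchor Frame logic has nothing to select.
  if (!stack.some((f) => f.in_app)) problems.push("no frame is marked in_app");
  return problems;
}

const sampleEvent: TelemetryEvent = {
  event_id: "3f2b8c1e-9d4a-4e6b-8f1a-2c3d4e5f6a7b",
  timestamp: "2024-01-15T10:30:00Z",
  platform: "node",
  release: "a1b2c3d4",
  exception: {
    type: "TypeError",
    value: "Cannot read properties of undefined",
    stacktrace: {
      frames: [{ filename: "src/utils/auth.ts", function: "validateSession", lineno: 42, in_app: true }],
    },
  },
};

console.log(validateEvent(sampleEvent)); // []
```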
3.2 Version control permissions
Least privilege
Tier 1: Read‑Only (Default)
  • Scope: metadata/read, contents/read, pull_requests/read (provider dependent).
  • Access required: commit hashes, PR titles/bodies, file diffs, git log history, file paths.
  • We do not access: .env files, repo settings/admin, unrelated assets; no full repo cloning.
Tier 2: Fix Agent (Optional)
  • Scope: contents/write + pull_requests/write for "Auto‑Create Patch" workflows.
  • Behavior: never pushes to protected branches; opens PRs on feature branches (e.g., deja/patch-*) for human review.
Security: No full clones. API-first fetching: diffs and metadata are processed in memory to generate fingerprints; no disk persistence of full source code. Integration tokens are encrypted at rest (AES‑256‑GCM) and short‑lived where supported.
4. Advanced configuration (.deja.yaml)
Tuning controls
4.1 Sentinel frames
Non-actionable paths

Purpose: manually mark internal libraries (e.g., src/middleware/logger.ts) as non-actionable to force the engine to look further up the stack. Sentinels prevent "Black Hole" incidents.

.deja.yaml • sentinels
normalization:
  sentinels:
    - "src/middleware/**"
    - "src/utils/http-client.ts"
Note on Day 0: You do not need Sentinels immediately. Peeking heuristics run by default. Configure Sentinels only if you see persistent under-segmentation.
4.2 Grouping rules
Consolidate fragmentation

Purpose: force two distinct fingerprints to merge (e.g., deprecating an old error type that is functionally identical to a new one). Use Case: "Day 100" optimization to clean up the timeline.

.deja.yaml • grouping
grouping:
  - target: "src/auth/NewLogin.ts"
    aliases:
      - "src/legacy/OldLoginController.js"

  - target_exception: "StandardError"
    alias_exceptions:
      - "LegacyError"
      - "CustomDbError"
Day 0–99: let the engine split aggressively (safer).
Day 100: merge duplicates caused by refactors and broken path ancestry using grouping rules.
4.3 Custom sanitizers
Dirty invariants

Purpose: regex rules to strip non-standard build artifacts or dynamic data from error messages that default sanitizers miss. This is the escape hatch for messy invariants.

.deja.yaml • sanitizers
sanitizers:
  - pattern: "TX-[A-Z0-9]{4}-[A-Z]+"
    replacement: "<tx_id>"

  - pattern: "shard-[a-z]+-[a-z0-9]+"
    replacement: "<shard>"
  • Be specific: avoid broad patterns that strip meaning.
  • Test regex: validate against real logs before deploying.
  • Order matters: custom sanitizers apply after built-ins but before hashing.
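One way to follow the "test regex" advice is to run candidate rules against sample messages before committing them. The sketch below assumes the patterns are compatible with JavaScript regular expressions and are applied globally, in file order; `applySanitizers` and the sample message are hypothetical.

```typescript
interface SanitizerRule {
  pattern: string;
  replacement: string;
}

// Mirrors the two rules from the .deja.yaml sanitizers example.
const customRules: SanitizerRule[] = [
  { pattern: "TX-[A-Z0-9]{4}-[A-Z]+", replacement: "<tx_id>" },
  { pattern: "shard-[a-z]+-[a-z0-9]+", replacement: "<shard>" },
];

// Applies each rule in order to the whole message.
function applySanitizers(message: string, rules: SanitizerRule[]): string {
  return rules.reduce(
    (msg, rule) => msg.replace(new RegExp(rule.pattern, "g"), rule.replacement),
    message,
  );
}

console.log(applySanitizers("Timeout on shard-east-7f2 for TX-9A4C-REFUND", customRules));
// "Timeout on <shard> for <tx_id>"
```

Two messages that differ only in shard or transaction ID now reduce to the same invariant, so they hash to the same fingerprint.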
5. Security & compliance
CISO due diligence
5.1 Data isolation
Tenant separation
  • Row-Level Security: PostgreSQL RLS policies scope all queries to workspace_id.
  • Cache/queue namespacing: keys prefixed by tenant scope to avoid cross-tenant processing.
  • Data residency: US-East-1 default; optional single-tenant EU deployments (eu-central-1) for residency needs.
  • Retention: knowledge graph metadata retained for workspace lifetime; raw webhooks retained 7 days for replay/debug then deleted; diff contexts bounded (e.g., 30 days) then purged.
5.2 The "no training" guarantee
Contract + architecture

Contractual commitment: Customer Data is never used to train, fine-tune, or improve foundation models for Déjà or any third party. Technical implementation: the deterministic engine uses cryptographic hashing (SHA‑256), not probabilistic inference.

  • No neural weights: no model to learn from your code.
  • Ephemeral processing: diffs processed in memory to generate hashes, then discarded.
  • Metadata formats: we store file paths, line numbers, and invariants, not contiguous corpora suitable for LLM training.
5.3 Source code privacy
Metadata only
  • We store: file paths, commit hashes, function signatures, line numbers, bounded diff fragments.
  • We do not store: full file contents, .git directories, full history, unrelated assets.
Encryption: AES‑256 (At Rest) and TLS 1.2+ (In Transit). HSTS enforced on public endpoints.
6. The validation orchestrator
Proves resolutions
6.1 The rate gate (Claim 1)
Statistical validation
The Logic: Validation is not binary; it is statistical

Déjà does not trust human intent (merging). It trusts system behavior (telemetry). The Rate Gate compares a fingerprint's error rate across a pre-merge window and a post-merge window.

Rate Gate formula
ΔE = (Rate(W_pre) - Rate(W_post)) / Rate(W_pre) * 100
Rate(W) = fingerprinted_errors / total_traffic
  • Threshold: ΔE must reach 95% by default (configurable).
  • Soak Period: default 24 hours; the error rate must stay down for the full period before the fix is marked Verified.
  • Traffic normalization: errors per request, not raw counts.
The "2 AM Traffic Drop" defense: validation pauses when post-window traffic is below statistical significance (e.g., <1000 req/hr), then resumes when volume returns. Silence is not mistaken for success.
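A minimal sketch of the Rate Gate calculation with the low-traffic pause. The `TrafficWindow` shape, the `>=` comparison against the threshold, and the decision to pause when the pre-merge rate is zero (where ΔE is undefined) are assumptions; the sample counts are invented for illustration.

```typescript
interface TrafficWindow {
  errors: number;   // fingerprinted errors in the window
  requests: number; // total traffic in the window
  hours: number;    // window length
}

type GateResult = "pass" | "fail" | "paused";

function errorRate(w: TrafficWindow): number {
  return w.errors / w.requests;
}

function rateGate(
  pre: TrafficWindow,
  post: TrafficWindow,
  thresholdPct = 95,
  minRequestsPerHour = 1000,
): GateResult {
  // Low post-merge traffic: silence must not be mistaken for success.
  if (post.requests / post.hours < minRequestsPerHour) return "paused";
  const preRate = errorRate(pre);
  // No baseline errors (or no baseline traffic): the reduction cannot be computed.
  if (!(preRate > 0)) return "paused";
  const deltaE = ((preRate - errorRate(post)) / preRate) * 100;
  return deltaE >= thresholdPct ? "pass" : "fail";
}

const preWindow: TrafficWindow = { errors: 500, requests: 100_000, hours: 24 };

console.log(rateGate(preWindow, { errors: 10, requests: 120_000, hours: 24 })); // "pass" (ΔE ≈ 98.3%)
console.log(rateGate(preWindow, { errors: 300, requests: 120_000, hours: 24 })); // "fail" (ΔE = 50%)
console.log(rateGate(preWindow, { errors: 0, requests: 12_000, hours: 24 })); // "paused" (500 req/hr)
```

Because both rates are errors per request, a post-merge traffic increase does not inflate the result, and a traffic collapse pauses the gate instead of passing it.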
6.2 Revert detection (Claim 10)
-100 penalty

The Revert Penalty monitors candidate solutions for 72 hours post-merge. Reverts are the strongest signal of a failed solution.

  • Detection: explicit reverts ("Revert ..."), force pushes removing indexed commits, hard resets.
  • Penalty: apply Confidence Penalty (-100) to that solution signature.
  • Outcome: incident re-opens; solution marked "Failed Attempt" to prevent repetition.
UX warning example: "⚠️ A fix for this issue was attempted on [Date] but was reverted. See failed PR #405 to avoid repeating the mistake."
6.3 Confidence scoring
Visibility tiers

Composite Score (0–100) quantifies reliability: Score_total = S_location + S_invariant + S_validation.

  • Location: +40 anchor match; +10 caller context match.
  • Invariant: +30 exception type + sanitized message match.
  • Validation: +20 prior rate gate pass; -100 if ever reverted.
Tier | Threshold | Behavior
Silence | < 70% | No suggestion shown (preserves trust).
Possible Match | 70–90% | Displayed but requires human confirmation.
Verified Match | > 90% | Auto-displayed as known root cause.
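The scoring weights and tier thresholds can be sketched as two small functions. The `MatchSignals` shape is hypothetical, and clamping the result at 0 after a revert penalty is an assumption made to keep the score inside the stated 0–100 range.

```typescript
interface MatchSignals {
  anchorMatch: boolean;
  callerMatch: boolean;
  invariantMatch: boolean;
  rateGatePassed: boolean;
  everReverted: boolean;
}

type Tier = "Silence" | "Possible Match" | "Verified Match";

function confidenceScore(s: MatchSignals): number {
  let score = 0;
  if (s.anchorMatch) score += 40;    // location: anchor
  if (s.callerMatch) score += 10;    // location: caller context
  if (s.invariantMatch) score += 30; // exception type + sanitized message
  if (s.rateGatePassed) score += 20; // validation
  if (s.everReverted) score -= 100;  // revert penalty
  return Math.max(0, score);         // clamp: an assumption, see lead-in
}

function toTier(score: number): Tier {
  if (score > 90) return "Verified Match";
  if (score >= 70) return "Possible Match";
  return "Silence";
}

const full: MatchSignals = {
  anchorMatch: true, callerMatch: true, invariantMatch: true, rateGatePassed: true, everReverted: false,
};

console.log(toTier(confidenceScore(full))); // "Verified Match" (100)
console.log(toTier(confidenceScore({ ...full, callerMatch: false }))); // "Possible Match" (90)
console.log(toTier(confidenceScore({ ...full, everReverted: true }))); // "Silence" (0)
```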
7. The incident lifecycle
Finite state machine
Figure: Incident State Machine (system-driven transitions)
States: INGESTING • OPEN • CANDIDATE • VALIDATING • VERIFIED • REGRESSED • IMMUNIZED
State Definitions
  • Ingesting: signal received, awaiting normalization (<120ms), then dropped or normalized.
  • Open: active incident, no known fix detected.
  • Candidate Detected: PR merged touching anchor frame file(s).
  • Validating (Soak): rate gate active; pauses during low traffic.
  • Verified: rate gate passed; becomes institutional memory.
  • Regressed: verified fingerprint reappears ("Zombie Bug").
  • Immunized: a service was patched in response to a predictive alert before any crash occurred.
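A finite state machine like this is often encoded as a transition table. The edges below are inferred from the state definitions and from the revert rule in section 6.2 (a reverted candidate re-opens the incident); the specification does not list the transitions explicitly, so the table, including the REGRESSED → CANDIDATE edge and the absence of edges for IMMUNIZED, is an assumption.

```typescript
type State =
  | "INGESTING" | "OPEN" | "CANDIDATE" | "VALIDATING"
  | "VERIFIED" | "REGRESSED" | "IMMUNIZED";

// Hypothetical edge set inferred from the state definitions.
const TRANSITIONS: Record<State, State[]> = {
  INGESTING: ["OPEN"],              // normalized signal with no known fix
  OPEN: ["CANDIDATE"],              // PR merged touching anchor frame file(s)
  CANDIDATE: ["VALIDATING"],        // rate gate starts
  VALIDATING: ["VERIFIED", "OPEN"], // gate passed, or candidate reverted
  VERIFIED: ["REGRESSED"],          // verified fingerprint reappears
  REGRESSED: ["CANDIDATE"],         // a new fix attempt is merged
  IMMUNIZED: [],                    // reached via predictive alerts, not incidents
};

function transition(from: State, to: State): State {
  if (!TRANSITIONS[from].includes(to)) {
    throw new Error(`Illegal transition ${from} -> ${to}`);
  }
  return to;
}

let current: State = "OPEN";
current = transition(current, "CANDIDATE");
current = transition(current, "VALIDATING");
current = transition(current, "VERIFIED");
console.log(current); // "VERIFIED"
```

Rejecting unlisted edges keeps transitions system-driven: a merge alone cannot move an incident from OPEN straight to VERIFIED.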
8. Troubleshooting
"Why didn't it match?"

Because "it just works" is a lie in distributed systems.

8.1 Debugging "misses" (false negatives)
No match found
Check 1: Entropy rejection

Cause: stack trace contained minified artifacts; Entropy Gate rejected. Fix: upload source maps before deploy; confirm release tags.

Check 2: Sentinel interference

Cause: Sentinel rules too broad, skipping true anchor frame. Fix: refine patterns (avoid swallowing business logic).

Check 3: The "Black Hole"

Cause: generic utility anchor + dynamic message bypassed sanitizers. Fix: add sentinel for utility or add custom sanitizer to stabilize invariants.

8.2 Debugging "bad matches" (false positives)
Wrong fix shown
The "Monolith" problem (low location entropy)

Cause: anchoring on a massive generic file (e.g., app.ts) collapses context. Fix: add file to Sentinels; force deep-linking; long-term refactor.

The "Generic Error" problem (low invariant entropy)

Cause: exception messages are too generic (e.g., "Failed"). Fix: throw typed errors with descriptive messages to increase invariant entropy.
