Adversarial Code Review Example with Claude

Charles Herring 18 June 2026

Below you will find my adversarial review prompt that I give to Claude on every pull request review. This is a living document that references dozens of cascading documentation. Collectively, over 1,700 norms contained in the "WitFoo Way" are reviewed. Also, the integrity of the work done by what is often a lazy or dishonest Claude session. In spite of the rigorous testing I outlined in Coding with Claude: What I've Learned Building a Cybersecurity Platform with AI , Claude finds new innovative ways to come off the rails. I use this prompt in an independent session to find shortcuts and lies from the PR session. I have run this prompt dozens of times and it has never come back without hits. As Claude finds new ways to bypassing checking or this prompt, I expand testing, norms and this prompt to cover those edge cases. As you can tell by it's length, there have been MANY iterations. While this prompt will not directly import into your projects, I thought it would be helpful to leave my details in place to assist in drafting your own. Turns out you have to fight Claude with Claude.

        # Adversarial Review
    

        Reusable post-completion review prompt. **A PR was just declared "done." Your job is
    
        to disprove that.** Operate as a hostile, skeptical reviewer who assumes the PR is
    
        secretly broken, half-wired, over-claimed, or deviating from WitFoo norms — and who
    
        sets out to *prove* it, file by file, claim by claim. Save findings to
    
        `_prompts/audit.md` for correction.

        > **Invocation:** "`PR{N}` is complete. Run an adversarial review." Substitute the PR
    
        > number/branch below for `{PR}` and `{BRANCH}` (default `{BRANCH}` = current branch,
    
        > base = `main`).

        ***Mandate***

        - Assume nothing is done until you've seen the line that does it **and** the test that
    
          proves it. The PR writeup, the `CLAUDE.md` one-liner, and the changelog entry are
    
          *claims*, not evidence.
    
        - This is **PR-scoped and diff-driven**, not a whole-codebase sweep. For the holistic
    
          rule-by-rule audit use [`_4 Code Audit and Review.md`](_4%20Code%20Audit%20and%20Review.md);
    
          for the standards checklist use [`_0 Write PR.md`](_0%20Write%20PR.md). This prompt
    
          attacks *what this PR changed* and *what it claims*.
    
        - Time is not a constraint; thoroughness is. Ultrathink each finding. But **cry wolf and
    
          you're useless** — every finding must be reproducible and survive the false-positive
    
          calibration in the appendix.
    
        - Take ownership of *all* defects you surface, including pre-existing ones in the blast
    
          radius. Document them; don't fix in this pass unless asked.

        ***Deliverable:*** `_prompts/audit.md` — a severity-ranked findings file (format in
    
        §Output) plus a "could-not-verify" honesty section, plus the captured result of
    
        `full-testing-with-conductor.sh`, plus a coverage-gap analysis.

---

        ## How to operate (the adversarial loop)

        1. **Establish the blast radius and the claims** (Phase 0). You cannot review what you
    
           haven't scoped.
    
        2. **Attack claim-vs-reality first** (Phase 1) — this is the single highest-yield axis in
    
           this repo. Most "shipped" defects here are *no-op handlers, wrong-twin edits, wiring
    
           gaps, and order-of-construction bugs* that compile, pass a unit test, and do nothing
    
           live.
    
        3. **Hunt deferred/stub/dead work** (Phase 2).
    
        4. **Sweep the WitFoo-Way dimensions touched by the diff** (Phases 3–8). Skip dimensions
    
           the diff doesn't touch; go deep on the ones it does.
    
        5. **Analyze coverage** (Phase 9) — the user always wants this; `coverage.sh` lies, so
    
           measure directly.
    
        6. **Run the pipeline** (Phase 10) — `full-testing-with-conductor.sh`, capture every
    
           failure.
    
        7. **Write `audit.md`** — including what you could *not* verify and why.

        Throughout, prefer **mutation-thinking**: for any "fixed/added" claim, ask *"if I
    
        reverted this line, would a test go red?"* If no test would catch the revert, the fix is
    
        unprotected (and may already be dead code).

---

        ## Phase 0 — Scope & claim extraction (do this first)
    

        ```bash
    
        git fetch origin main
    
        git --no-pager diff --stat main...HEAD          # full blast radius
    
        git --no-pager diff --name-only main...HEAD     # file list — every file must map to a claim
    
        ```

        - **Locate the contract.** Read the PR prompt in `_prompts/PR{N}*.md` and the tracker
    
          `CURRENT_WORK.md` (root, or `_archives/pr{N}*/CURRENT_WORK.md`). Extract: the Mission
    
          Contract exit criteria, the **Non-Goals/scope exclusions**, and the **Change Manifest**
    
          table (`File | Reason | Diff Type | Test Coverage | WitFoo Way Rules`,
    
          per [`_26 Change Manifest.md`](_26%20Change%20Manifest.md)).
    
        - **Reconcile Manifest ↔ diff (both directions):**
    
          - Every changed file **must** appear as a Manifest row. Files not in the Manifest =
    
            undocumented change / scope creep — challenge each.
    
          - Every Manifest row **must** correspond to a real diff. Rows with no diff =
    
            aspirational / claimed-but-absent.
    
        - **Extract behavioral claims.** From the PR writeup, the `CLAUDE.md` one-liner, and the
    
          `docs/PR_CHANGELOG.md` entry, list every verb that asserts behavior — *wired, fixed,
    
          now calls, persists, materializes, self-heals, gated, verified live*. Each verb names a
    
          specific line you must find. Keep this list; Phase 1 checks every item.
    
        - **Honesty check** (from [`_2 Honest Check.md`](_2%20Honest%20Check.md)): "Are there any
    
          uncompleted or deferred tasks in this PR?" Then *prove* the answer with greps, don't
    
          trust the writeup.

---

        ## Phase 1 — Claim-vs-reality (highest-yield axis)
    

        This repo has duplicate/parallel implementations and silent wiring seams. A change can
    
        compile, pass its unit test, and do nothing in the running binary. For **every** claim
    
        from Phase 0, find the implementing line **and** prove it's on the live path.

        ### 1.1 No-op handlers (returns 200/202, never calls the engine)
    

        ```bash
    
        # A handler that returns success without invoking its injected service/engine
    
        grep -rn 'func (h \*.*Handler).*echo.Context\|func (h \*.*Handler).*gin.Context' $(git diff --name-only main...HEAD | grep '\.go$')
    
        ```
    
        For each new/changed handler: confirm the injected service/engine field is actually
    
        *referenced in the method body* and the result is persisted. **Red flag:** an injected
    
        field (`engine`, `repo`, `store`) never read in the method. *(Real: PR359 `ManualExecute`
    
        returned 202 and never called the engine; the injected engine field was unused.)*

        ### 1.2 Wiring gaps (constructor exists ≠ handler mounted)
    

        ```bash
    
        grep -rn 'RegisterRoutes\|:= New.*Handler\|NewProxy\|NewService' api/cmd/main.go incident-engine/cmd/main.go
    
        ```
    
        A handler constructed nowhere, or registered on a sub-router that's never `.Use`d/mounted,
    
        404s live. Confirm every handler the PR claims is **constructed AND mounted**.
    
        *(Real: PR373 #215 `AttachmentProxyHandler` constructed nowhere → every attachments call
    
        404'd since PR359e.)*

        ### 1.3 Order-of-construction (dependency built after its consumer runs)

        A repo/builder constructed *after* the scheduler that consumes it is a silent no-op.
    
        Confirm dependencies of any `.Start()` / `.PreGenerate` / `.Run()` are wired **before**
    
        that call in `cmd/main.go`. *(Real: PR360e — three audit repos constructed after
    
        `reportScheduler.Start`, so the startup audit report silently never generated.)*

        ### 1.4 Wrong-twin / dead-package edits
    

        The repo has parallel implementations: `api/internal/repository` **vs** `pkg/repository`;
    
        IncidentAnalyzer's `CassandraGraphNodeRepository` **vs** graph-processor's `NodeManager`;
    
        `pkg/repository` `FrameworkRepository` **vs** the `api/internal/handler` subset. A fix on
    
        the wrong twin compiles, passes a unit test on that twin, and does nothing live.

        ```bash
    
        # Is the edited package actually imported by the running binary?
    
        go list -deps ./cmd 2>/dev/null | grep -E 'api/internal/repository|internal/repository|<edited-pkg>'
    
        ```
    
        If an edited package is **not** in `go list -deps ./cmd`, it's vestigial. *(Real: PR375c
    
        — a `node_id` filter added to `api/internal/repository`, never imported by `./cmd`, while
    
        the live `pkg/repository` query builder never got the clause. Real: PR356 — PR353's
    
        partial-PK sweep missed IncidentAnalyzer's second graph-node repo → every analyzer
    
        `GetByID/Update` errored under graph-v2 while the changelog claimed graph-v2 done.)*

        ### 1.5 Nil-dependency no-ops
    

        ```bash
    
        grep -rn 'NewSyncService\|ProductStore\|nil,' incident-engine/cmd/main.go api/cmd/main.go | grep -i 'nil'
    
        ```
    
        A service handed a `nil` collaborator silently no-ops. *(Real: PR358d — api passed `nil`
    
        ProductStore to `meta.NewSyncService` since PR20, so `SyncProducts` was a no-op on every
    
        appliance for ~338 PRs while appearing wired.)*

        ### 1.6 Deployed-binary confirmation (when claims are "verified live")
    

        If a claim says "verified live on soc" but the change is untested locally with no
    
        artifact, treat verification as **unproven**. Locally, confirm the new path is compiled
    
        into the running binary, not just an unimported package:

        ```bash
    
        ( cd <module> && go build -o /tmp/bin ./cmd && strings /tmp/bin | grep -i '<distinctive-new-string>' )
    
        go list -deps ./cmd | grep '<new-pkg>'
    
        ```

        ### 1.7 Status-enum / validator lockstep
    

        A new enum value or status added in one validator but not its twin is a latent acceptance
    
        bug. If the PR touches incident status, permissions, or any dual-validated enum, confirm
    
        **both** validators (and the reporter sets, metrics, UI label/Tag/facet/dropdown, and all
    
        7 locales) widened together. *(Real: PR357/PR354a `StatusOpen(6)`.)*

---

        ## Phase 2 — Deferred / stub / dead-code detection
    

        ```bash
    
        grep -rniE 'TODO|FIXME|HACK|XXX|stub|not implemented|coming soon|placeholder|for now|in (a )?future PR|deferred|advisory|no-op|nyi|mock data|USE_MOCK_DATA|disclaimer' $(git diff --name-only main...HEAD)
    
        ```
    
        - A stub is legitimate **only** if the Mission Contract Non-Goals declared it. Otherwise
    
          it's undisclosed deferred work. *(Real: PR22c shipped `ReceiverService` methods as
    
          `// TODO: Implement … in future PR`; PR22i shipped mock retry-distribution with a UI
    
          disclaimer.)*
    
        - **Dead UI affordances:** a `<Button on:click={fn}>` whose `fn` is a stub/returns; a tab
    
          that renders an empty state for a "complete" feature; a form posting to a missing route.
    
          *(Real: PR359 removed a dead Execute button.)*
    
        - **Silent-skip guards added in this PR:** any new `if … { return nil }` / `continue` on
    
          an error or already-handled branch. Confirm it (a) advances/dequeues state so the same
    
          item isn't reprocessed forever, and (b) logs at WARN — a guard that neither logs nor
    
          changes state is a swallow. *(Real: PR375g — `aiState==completed` returned without
    
          dequeuing the relic → one completed task froze all 17 others; 0 processed.)*
    
        - **"Enabled but never re-read" caches:** a version/snapshot counter incremented on
    
          mutation but never validated on read. *(Real: PR358h — `ParserRegistry.snapshotVersion`
    
          stamped but never read → runtime-enabled parsers reported running while the stale
    
          snapshot was iterated until restart.)* Grep for a counter with `++`/`.Add(1)` sites but
    
          no read/compare site.

---

        ## Phase 3 — Security & trust boundaries (if the diff touches auth/handlers/compose)
    

        *(Full rules: [`_0 Write PR.md`](_0%20Write%20PR.md) §Security and
    
        [`_4`](_4%20Code%20Audit%20and%20Review.md) Phase 1. Reproducers:
    
        `./scripts/testing/security-audit.sh` — **PASS = bug fixed, FAIL = bug present**.)*

        - **X-Org-ID strip-and-reinject.** Every new `api/` proxy handler must derive `orgID` from
    
          the JWT context (`proxyContext(c)` / `setIncidentEngineHeaders`), never forward the
    
          client header. `grep -rn 'Header.Get("X-Org-ID")' api/internal/handler/` → **zero** hits
    
          in proxy paths. *(F-b-001 cross-tenant.)*
    
        - **Middleware chain & per-route permission.** incident-engine groups chain
    
          `JWTMiddleware → OrgIDMiddleware → RequirePermission`, in that order. A new write route
    
          on a JWT-but-no-`RequirePermission` group is a privesc hole:
    
          ```bash
    
          grep -cE '\.(GET|POST|PUT|DELETE|PATCH)\(' incident-engine/cmd/main.go
    
          grep -c 'RequirePermission\|RequireAnyPermission\|RequireAllPermissions' incident-engine/cmd/main.go
    
          ```
    
          Investigate any *new* unguarded write group. (incident-engine `HasPermission` must
    
          *expand* admin/manage/write, not literal-match — a literal gate 403s every legit user;
    
          `IsPlatformAdmin` must stay strict-literal. *Real: PR363 reverted, PR366 fixed.*)
    
        - **JWT validators.** Any new `jwt.Parse`/`jwt.NewParser` must set
    
          `jwt.WithAudience(...)` + `jwt.WithValidMethods([]string{"HS256"})`. `aud` is
    
          hard-enforced; `iss` is tolerated-missing (Phase 5a window) — don't be reassured by an
    
          `iss` check alone.
    
        - **Secrets fail-closed.** Services reading `JWT_SECRET`/`AUTH_CONFIG_ENCRYPTION_KEY`/
    
          `AI_ENCRYPTION_KEY` must `log.Fatal` on empty **and** on the public default literal
    
          (`witfoo-precinct-secret-…`). Compose must use `${VAR:?required}`, never `${VAR:-default}`,
    
          and the **same** secret across api/incident-engine/dispatcher/reverse-proxy.
    
        - **SSO scoped lookup.** SAML/LDAP `getOrCreateUser` must use `FindByOrgIDAndEmail`, never
    
          global `FindByEmail`. *(F-b-003 cross-org takeover.)*
    
        - **Secret repos fail-closed.** Any new credential-bearing repo (notifications,
    
          integrations, connector creds) writing a `*_encrypted`/`secret`/`token`/`api_key` column
    
          must return `ErrEncryptionRequired` when no master key; **Create AND Update** both
    
          guarded. Response handlers must blank `APIKey` AND `APIKeyEncrypted` on **every** path.
    
        - **Fail-closed gates.** Tenant-status / license / feature gates return 503 (not `next(c)`)
    
          on lookup error with no cached value. License gates must be **live per-request**, not a
    
          cached boot snapshot. *(Real: frozen boot snapshot 403'd SOAR.)*
    
        - **Error/info leakage.** `grep -rn 'err.Error()' api/internal/handler/ incident-engine/internal/handler/`
    
          inside `c.JSON`/`gin.H{"error":…}`/`response.*` bodies — internal (Cassandra/NATS) errors
    
          must not reach the client. (Logging `err.Error()` server-side is fine.)
    
        - **Download/export header order.** All `c.Header(...)` (Content-Type, Content-Disposition)
    
          must precede the body write, or Go sniffs the body and can return `text/html` → stored
    
          XSS on download.
    
        - **WebSocket.** New `Upgrader.CheckOrigin` must not `return true` unconditionally; DM/
    
          private rooms must use `BroadcastToOrgUsers(memberIDs)`, never `BroadcastToOrgTopic`.
    
        - **Compose port exposure.** Internal services must bind `${LISTEN_IP:-127.0.0.1}:` —
    
          `grep -nE '^\s*-\s*"[0-9]+:[0-9]+"' docker/docker-compose*.yml` (bare host:container on an
    
          internal service bypasses reverse-proxy TLS/JWT/rate-limit). `go test ./tests/security/h/ -run F_h_003`.

---

        ## Phase 4 — Architecture / API-proxy / config / errors
    

        - **The 3-hop contract.** Every new incident-engine endpoint needs (a) a matching api
    
          proxy route wired in `api/cmd/main.go`, and (b) a UI `client.ts`/`$lib/api` caller. A
    
          backend endpoint with no proxy route 404s in the browser — *never* caught by Go build or
    
          unit tests. This is the most common half-done-feature signature in the repo.
    
        - **Outbound calls** to incident-engine use `setIncidentEngineHeaders` + `proxyContext`,
    
          never hand-rolled headers.
    
        - **Response envelope.** Two envelopes coexist (api Echo `{success,message,data,meta}` vs
    
          incident-engine raw Gin); `API_STANDARDS.md` documents a *third* idealized shape matching
    
          neither. **Read the actual handler, not the doc** — confirm the UI TS type matches what
    
          the proxy really returns (don't double-wrap).
    
        - **Background services.** `ticker + stopChan` with `done chan struct{}` (not WaitGroup),
    
          `Start(ctx)/Stop()/RunNow()`, with both `case <-ctx.Done()` and `case <-s.stopChan`. A
    
          new ticker with no stop arm leaks for process life. **Exception:** the incident
    
          aggregator (`INCIDENT_AGGREGATOR_INTERVAL_SEC`) must *sleep-after-cycle*, not use a
    
          coalescing `time.Ticker` (overlapping cycles saturate Cassandra — PR351).
    
        - **Cancellation paths** must wrap `ctx.Err()`, not `return nil` (a nil signals
    
          init-success → scheduler promotes → nil-deref; PR340 class).
    
        - **New env vars** route through `pkg/envconfig`, get a `docs/CONFIGURATION.md` row, and
    
          are declared in **every** consuming service's compose env block. *(Real: PR325 added
    
          `AI_ENCRYPTION_KEY` to api only; incident-engine read it and silently failed.)* Note
    
          `envconfig.GetEnvInt/Bool/Duration` **silently falls back to default** on a parse error —
    
          an operator typo applies the default with zero log signal.
    
        - **Errors:** `%w` not `%v` for wrapping (breaks `errors.Is/As` → status-mapping branches
    
          become dead code), generic client message + detailed server log.

---

        ## Phase 5 — Database / Cassandra / calculation correctness
    

        - **Partition scoping.** `grep -rn 'ALLOW FILTERING' pkg/repository/ incident-engine/internal/repository/ | grep -iv 'partition_key\|org_id\|shared_token'`
    
          — every hit must carry a partition key or be a known global exception. A missing
    
          partition key = cross-org full scan + tenant leak + SLO breach.
    
        - **`double` not `decimal`** (`grep -n decimal cassandra/init-schema.cql` → only in
    
          comments). **No `!=` in CQL** (parse error or silently ignored). **DDL-only init-schema**
    
          (`grep -niE 'INSERT|UPDATE' cassandra/init-schema.cql` → empty; DML lives in
    
          `cassandra/seeders/`, and an edited seeder source needs `rebuild-consolidated.sh`).
    
        - **Schema change ⇒ paired self-heal migration.** A new column/table/index in
    
          `init-schema.cql` needs an idempotent `EnsureX` (`ADD … IF NOT EXISTS`) in
    
          `pkg/repository/migrations.go` that is **CALLED** from **both** `incident-engine/cmd/main.go`
    
          **and** `graph-processor/cmd/main.go`. Without it, existing appliances NULL-read forever
    
          (init-schema only runs on fresh installs). **No** files under `cassandra/migrations/`
    
          (archived PR184). An `EnsureX` that stamps a version before verifying the real column/
    
          `bucket` state is the PR353 race.
    
        - **Graph-v2 PK.** Any graph read with only `WHERE partition_key = ?` (missing
    
          `node_type`+`bucket` / `bucket`) **errors at runtime** under v2 — resolve type+bucket via
    
          `node_locator`/`graphid.NodeBucket` first. Won't fail `go build`; needs a v2 integration
    
          or live check.
    
        - **Concurrent set columns** (`product_ids`/`framework_ids set<int>`) use atomic
    
          `SET col = col + ?` (AppendNodeLicensing), never read-modify-write.
    
        - **REPLACE BulkUpsert** must `DELETE` (Exec) *before* the INSERT batch — a single batch
    
          shares one timestamp, tombstone shadows inserts → empty table → compliance silently 0%.
    
        - **Calculation correctness** (if the diff touches reporter/compliance/FTE/ROI math):
    
          FTE = `totalHours / (2080 * reportDays/365)` — no `8760`/`525600` (Precinct legacy,
    
          ~4.2× wrong); TP = Disrupted(5)+Resolved(3); FP = Closed(4); active/detected =
    
          Open(6)+Investigating(2) (NOT New(1)); thresholds Green>75 / Yellow 0–75 / Red 0;
    
          Manual = −1. Use status-ID **constants** (3↔5 are swapped vs Precinct). Any new
    
          ratio/percent without a `== 0` denominator guard propagates NaN/Inf into report JSON.
    
          Changed formula ⇒ update `CALCULATION_METHODOLOGY.md` + the ChartHelpTip + all 7 locales.
    
        - **Perf cliffs that pass dev tests:** new `ALLOW FILTERING` without SAI index/LIMIT;
    
          O(controls×nodes) compliance loops (memoize `countNodesByProduct`); per-flag
    
          `countNodesWhere` scans (PR351 single-scan). Dev data (~33k entities) won't expose what
    
          soc (~114k nodes) will; dev SLO thresholds are 3× looser than prod.

---

        ## Phase 6 — Frontend / Svelte 5 / disconnected / branding / i18n / demo

        *(Patterns: [`_22`](_22%20Svelte%205%20Reactivity%20Advanced.md),
    
        [`_20`](_20%20Pagination%20Facets%20Histogram.md), [`_27`](_27%20Monaco%20Editor%20Integration.md),
    
        [`_28`](_28%20FormData%20and%20Fetch%20Patterns.md).)*

        - **Base path.** `grep -rn "goto(\`/\|goto('/\|href=\"/" ui/src | grep -v '\${base}'` —
    
          internal nav must prepend `${base}` from `$app/paths` (app mounts at `/ui`). Works in
    
          `npm run dev`, **404s behind the reverse proxy** — test the proxied path. `/api/*` and
    
          `/conductor/*` fetches are deliberately root-mounted (don't flag those).
    
        - **Reactivity.** `$effect` that writes `$state` it also reads → infinite cascade (wrap
    
          self-writes in `untrack()`, or use `$derived`/`$derived.by()`). Arrays/objects holding
    
          `m.key()` i18n calls must be `$derived.by`, not `const` (else frozen at boot locale).
    
        - **Stale-data.** New refresh paths must not clear `dataWindow`/facets/histogram to `[]`
    
          before an await — snapshot `staleDisplayData` and render via `$derived` while
    
          `isRefreshing` (flash-to-empty only shows at live stream rate).
    
        - **Faceting/pagination/histogram** are **3 independent server-side queries** — any
    
          client-side `calculateFacets(dataWindow)`/`calculateHistogram` outside the
    
          `USE_MOCK_DATA===true` branch is the Splunk anti-pattern (wildly wrong at ~300/min).
    
          Confirm `USE_MOCK_DATA` is not left `true`.
    
        - **`{#each}` keying** `(item.id)` for any sortable/filterable list (not the ~204
    
          intentionally-static menus). **Carbon:** `resizable: false` on every new Chart
    
          (ResizeObserver leak ~27 listeners/tab-switch); `on:click` on Carbon components vs
    
          `onclick` on native `<button>` (mismatch = silent dead handler).
    
        - **Monaco/Cytoscape** dynamic-import in `onMount`, dispose model **and** editor in
    
          `onDestroy`. **FormData** uploads must not set Content-Type; empty POSTs send `{}`; use
    
          the `client.ts` wrapper (raw `fetch` drops X-Org-ID/auth).
    
        - **Disconnected network (MUST).** No new runtime external URL. `cd ui && npm run build`
    
          **then** `scripts/util/audit-external-deps.sh` (it scans `ui/build` — a stale build
    
          silently passes). `fonts.css` import must precede Carbon CSS. New map/flag/geo features
    
          need a source-scan test (cf. `WorldMapHeatmap.test.ts`).
    
        - **i18n / branding / demo.** New user-facing strings are Paraglide `m.key()` present in
    
          **all 7** locale files (`en/es/fr/de/ja/ar/mi`) — a key only in `en.json` renders the raw
    
          key elsewhere with no error. No new hardcoded `WitFoo`/`witfoo.com` in customer chrome
    
          (white-label). Demo-mode features gated **server-side**, not just `{#if !isDemoMode}`.

---

        ## Phase 7 — Conductor / integrations / connectors (if touched)

        - **New parser = FOUR files.** Package dir + `connector.go` import + `parserRegistry` map
    
          entry + conductor-ui `frontend/src/lib/definitions/parsers/types.ts`. Verify the new
    
          parser name appears in **all four**:
    
          ```bash
    
          ls -d conductor_suite/signal-parser/pkg/connector/*/ | wc -l
    
          grep -c 'ParserFactory:' conductor_suite/signal-parser/pkg/connector/connector.go
    
          grep -c 'projectPrefix +' conductor_suite/conductor-ui/frontend/src/lib/definitions/parsers/types.ts
    
          ```
    
          (These counts already drift — 267/249/218 — so check the *specific* new name.) Missing
    
          the UI file = parser registers but never receives traffic (silently idle).
    
        - **Runtime KV-enable does NOT join the serial chain** — signal-parser must be **restarted**.
    
          A PR claiming "live enable" without the restart caveat (or without the validated
    
          `versionedSnapshot` read path) ships an inert parser. *(PR358h class.)*
    
        - **connector-exists ≠ parser-matches-format.** A wired, "running" connector whose parser
    
          doesn't recognize the message format drops records into the *unknown* stream silently —
    
          watch the Pipeline Sankey "Unknown" slice, not just container-up. JSON arrays decode to
    
          `[]interface{}` not `[]map[string]interface{}` (PR348 silent record-drop).
    
        - **conductor-ui dual-mode.** Proxy/AIO mode must NOT register standalone `/api/auth/*` /
    
          `/api/v1/users/*`; reverse-proxy strips inbound `X-Auth-*` before injecting;
    
          `proxy_auth.go` gates on `RemoteAddr` CIDR (not `RealIP()`) **before** reading `X-Auth-*`.
    
          AIO needs matching `JWT_SECRET`/`WF_JWT_SECRET` + `SecondaryNetworks` to conductor-net.
    
        - **FROM-scratch images** are pre-built binaries (`COPY bin/<svc>`); `conductor-build.sh`
    
          must run first; no shell-based `HEALTHCHECK` (dead). conductor images deploy via the
    
          **conductor repo's own tier branches**, not analytics CI. Stale `bin/` = container runs
    
          old code while source looks updated.
    
        - Conductor submodules are **excluded from analytics `unit-tests.sh`** — run
    
          `cd conductor_suite/<svc> && go test ./...` separately; a `common` change has cross-parser
    
          blast radius (run the **full** conductor `go test ./...`). The authoritative pipeline step
    
          list is the script's `conductor_steps=(…)` array (9 steps), not the stale 7-step doc table.

---

        ## Phase 8 — Supply chain / Docker / observability / WFA (if touched)

        - **No bare new Go `require`.** Each must have a `third_party/go/<module>` fork + a
    
          `go.work` replace (never per-module go.mod) + a `MANIFEST.md` row. Gate:
    
          `scripts/vendor/verify-vendor.sh` (GOPROXY=off build per module — but it can pass on a
    
          warm module cache; prefer clean). `scripts/vendor/check-no-external-deps.sh` on the
    
          go.mod diff. PATCHES.md↔MANIFEST.md via `check-manifest-patches-sync.sh`.
    
        - **Module-discovery scripts** (`find -name go.mod`) must exclude `*/third_party/*` and
    
          `*/_vendor/*`.
    
        - **Known doc-vs-code trap:** Dockerfiles run `go mod download` (online build) despite docs
    
          claiming a vendored offline build — don't trust an "air-gapped build" claim without
    
          actually building the image with no network.
    
        - **Observability.** A new metric needs both a `prometheus.yml` job and a registered
    
          `/metrics` endpoint; avoid high-cardinality labels (`incident_id`/`org_id`/raw path →
    
          series explosion / OOM) and duplicate names (`MustRegister` panic). A metric defined but
    
          never `.Inc()/.Observe()`d is dead. `remote_write` ships commented-out — "cloud metrics
    
          enabled" by env alone is inert. Don't regress `LOG_LEVEL=warn` / Alloy allowlist /
    
          `HEALTH_STATUS_INTERVAL` (`scripts/testing/verify-log-filtering.sh`).
    
        - **WFA.** New `WF_*` container env must flow through `EnsureContainersHaveCurrentEnv`
    
          (env-drift self-heal) or it's never reconciled. Never roll out via
    
          `systemctl restart wfad` (hangs the stack); never run `wfad --version` (spawns a rogue
    
          daemon). `analytics-nats` `store_dir` must be `/data/nats/analytics` (AIO collision).

---

        ## Phase 9 — Coverage gap analysis (always required)
    

        > **`scripts/util/coverage.sh` LIES.** Its header says it does *not* enforce thresholds; it
    
        > runs `go test … 2>/dev/null` (swallows compile errors as a soft warning) and reports a
    
        > **zero-test package as "no testable code" = PASS**. Never quote its green summary as proof.

        - **Every new logic file needs a test:**
    
          ```bash
    
          git diff --name-only --diff-filter=A main...HEAD | grep '\.go$' | grep -v _test.go \
    
            | while read f; do t="${f%.go}_test.go"; [ -f "$t" ] || echo "MISSING TEST: $t"; done
    
          ```
    
          (Schema/config/`cmd/main.go` exempt; business logic & API handlers never exempt.)
    
        - **Measure the touched module directly** (no `2>/dev/null`):
    
          ```bash
    
          cd <changed-module> && go test -short ./...                     # surfaces red/compile, not hidden
    
          go test -short -coverprofile=cov.out ./... && go tool cover -func=cov.out | grep -E '^total:|<changed-file>'
    
          ```
    
          Apply the **tiered** target, not a flat 75%: **85%** for security-critical (`pkg/sanitize`,
    
          `pkg/security`, `pkg/middleware`, `reverse-proxy/internal/{middleware,sanitize}`,
    
          `api/internal/middleware`, `artifact-ingestion/internal/middleware`); **60%** integration
    
          layers; **75%** default.
    
        - **Tests that prove nothing** (assert-nothing / test-theater):
    
          ```bash
    
          grep -rLE 'assert\.|require\.[A-Z]|t\.(Error|Fatal)' $(git diff --name-only main...HEAD | grep _test.go)
    
          git diff main...HEAD -- '*_test.go' | grep -iE 'TODO|FIXME|t\.Skip|not implemented'
    
          ```
    
          Flag: fixtures with N fields but 1 assertion (`go vet`/staticcheck unusedwrite);
    
          `assert.NoError` followed by a deref (should be `require`); E2E specs whose only outcome
    
          is `test.skip()`/`.catch(()=>false)` with no hard `expect`; Svelte `render()` component
    
          tests (known-broken under Svelte 5, quietly excluded); financial assertions via
    
          Playwright `innerText` (numbers render in SVG — false-negative-prone).
    
        - **Mutation check:** for each key new assertion, confirm it would **fail if the fix were
    
          reverted**. *(Real: PR375c's original test passed against dead code; it was moved to the
    
          live `pkg/repository` path.)*
    
        - **Known standing gaps** (don't accept as new baseline): incident-engine 28.6% /
    
          incident-engine/service 26.7% / dispatcher/subscriber 14.8% / api/handler 18.5%;
    
          graph-processor & dispatcher have **no** dedicated unit tests; PR214 deferrals
    
          (pkg/service, pkg/dns, pkg/notification, rp/proxy). A change here is unlikely to be caught
    
          by existing tests — demand explicit tests for the *changed function*.
    
        - **Workspace-wide vet** for any interface/submodule change (compiles in the edited module,
    
          breaks a mock elsewhere — PR358f):
    
          ```bash
    
          for d in . incident-engine api graph-processor conductor_suite/common; do (cd "$d" && go vet ./...); done
    
          ```

---

        ## Phase 10 — Run the full pipeline

        ```bash
    
        ./scripts/testing/full-testing-with-conductor.sh            # backgrounds; logs to logs/full-testing-with-conductor.log
    
        tail -f logs/full-testing-with-conductor.log                # monitor
    
        ```
    
        9 steps: `inspection → docker-rebuild → conductor-health → nats-bootstrap →
    
        generate-artifacts → conductor-inject → system-tests → e2e-tests → conductor-benchmarks`.
    
        Resume a failed step with `--from-step <step>`; isolate with `--only <step> --force
    
        --foreground`; `--skip-passed` re-runs only failures; view state with
    
        `scripts/testing/test-dashboard.sh`.

        - Capture **every** failure (step, exit code, the failing assertion/log excerpt) into
    
          `audit.md`. Take ownership of pre-existing failures too.
    
        - The pipeline is the **only** level that proves browser→api→incident-engine wiring and the
    
          broker→parser→filter→exporter→ingestion path — a "tests pass" claim on Go-only unit tests
    
          is not coverage-complete.
    
        - If full Docker is unavailable, run the cheaper gates and **say so** in the honesty
    
          section: `./scripts/testing/unit-tests.sh`, `./scripts/testing/security-audit.sh`,
    
          `./scripts/testing/linter.sh`.

---

        ## Calibration — avoid false positives (a credible reviewer)
    

        Don't flag these known-intentional patterns:
    

        - The literal default JWT secret (`witfoo-precinct-secret-…`) **present in source** — it's
    
          retained only to **reject** the value (verify it's in an `==` check, not a fallback
    
          assignment).
    
        - `FindByEmail` **existing** — it's intentional for local auth; the bug is only calling it
    
          from an SSO path (trace the call graph).
    
        - `if x == 0` float guards in `report_calculations.go` — intentional divide-by-zero
    
          sentinels (value is exactly 0 from integer math), not the float-compare anti-pattern.
    
        - ~204 unkeyed `{#each}` on static menus; `client.ts` skipping `/api` for `/incident-engine`
    
          / `/conductor`; the unauthenticated `/health`/`/ready`/`/metrics`/`/conductor/status`
    
          endpoints. (For `/v1/system/status`, *do* flag any **newly-added** tenant/error field.)
    
        - AstraDB mode ignoring pool-tuning env vars; the per-process AIExecutor rate limiter (not
    
          a distributed budget — flag only if a PR *treats* it as one).

        When unsure whether something is a real defect, **say so and show the evidence** rather
    
        than asserting — a finding the author can dismiss in 10 seconds discredits the whole audit.

---

        ## Output — `_prompts/audit.md`

        Write findings grouped by severity (P0 → P1 → P2), then by phase/dimension. Per finding:
    

        ```markdown
    
        ### [P0|P1|P2] <short title>

        - **Claim vs reality:** <what the PR claimed | what the code/test actually does>
    
        - **Location:** `path/to/file.go:42` (and the dead/live-path note if relevant)
    
        - **Evidence:** <the grep/command output or the missing line that proves it>
    
        - **Impact:** <failure mode — what breaks, for whom, at what scale>
    
        - **Fix:** <specific remediation>
    
        - **Reverts-would-fail-a-test?:** yes/no  (if no → unprotected)
    
        ```

        Severity: **P0** = security / cross-tenant / data-loss / claimed-but-no-op shipped as done.
    
        **P1** = WitFoo-Way MUST violation / missing self-heal migration / coverage regression on a
    
        security-critical pkg / silent perf cliff. **P2** = SHOULD/NOTE / doc-drift / cosmetic.

        End `audit.md` with:

        1. **Summary table** — counts by severity × dimension.
    
        2. **`full-testing-with-conductor.sh` result** — pass/fail per step, with failure excerpts.
    
        3. **Coverage gap analysis** — per touched module: measured % vs tier target, files missing
    
           tests, assert-nothing tests, mutation-check results.
    
        4. **Could-not-verify (honesty section)** — every claim you could not prove or disprove,
    
           and *why* (no Docker, needs a live appliance, needs soc-scale data, etc.). This
    
           section is mandatory — an adversarial review that hides its blind spots is itself
    
           over-claiming.

---

        ## Post-review
    

        - For any **P0/P1**, draft `_prompts/PR{N+1} - Audit Remediation.md` (TDD: failing test
    
          first).
    
        - If the review established a *new* recurring defect pattern, add it to
    
          [`_4 Code Audit and Review.md`](_4%20Code%20Audit%20and%20Review.md) and/or
    
          [`_0 Write PR.md`](_0%20Write%20PR.md).
    
        - Archive this run's `audit.md` into `_archives/pr{N}*/` alongside `CURRENT_WORK.md`.

Search

Adversarial Code Review Example with Claude

Tags