Local code review CLI

Reviews with receipts.

Most AI reviewers ask you to trust a paragraph. Warden starts with the boring evidence — type errors, lint findings, dependency records, source snippets — and only then lets the model rank, clarify, and phrase. Nothing reaches the comment without a cite.

Default review spine · Leverage checks · Security triage

$ warden review --base main

ok ecosystem detected: TypeScript workspace

ok detectors ran: tsc, eslint, audit, security, leverage

ok sub-agents checked: committability, libraries, security residue

ok sources verified: repo snippets, OSV, package .d.ts

.. synthesizing cited CommentSet

T2 contract 92%

packages/cli/src/index.ts:60

`warden check` still validates provider environment before any deterministic checks run.

T2 consistency 84%

README.md:41

Should the quick-start distinguish the optional semantic index from the required review path?

T1 correctness 96%

packages/core/src/banner/index.ts:53

An empty chunk store is treated as a missing index before model metadata is inspected.

Runs in your repo: no hosted review backend, no upload

Detectors before prose: type, lint, audit, dedup, leverage, security

Sub-agents ask: committability, library leverage, security residue

Verifier has veto: unsupported snippets and API claims get cut

The thesis

A reviewer should know what it can prove.

Confidence is cheap. The discipline is what makes a review worth reading. Warden splits the work into three honest layers — evidence, verification, prose — and refuses to let the model skip ahead.

01 · EVIDENCE

The facts come first.

TypeScript, ESLint, duplication checks, dependency audits, context selection, leverage patterns, and Warden-managed security lint form the evidence layer. The review starts with things that can point at a file, rule, lockfile, or advisory.

02 · TRIAGE

Sub-agents stay scoped.

Cheap-tier LLM sub-agents handle the open-ended tails: committability questions, library substitution opportunities, and security residue that deterministic security lint cannot model cleanly.

03 · VERIFICATION

Claims earn sources.

Repository snippets are substring-matched. Vulnerability claims are checked against OSV. Library API claims use installed .d.ts files. If the source does not verify, the comment does not ship.
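
To make the substring check on repository snippets concrete, here is a minimal TypeScript sketch. The `Citation` shape and the `verifySnippet` function are hypothetical names chosen for illustration, not Warden's actual API, and the sketch assumes an exact, verbatim match with no whitespace normalisation.

```typescript
// Hypothetical shape for illustration; Warden's real types may differ.
interface Citation {
  path: string;    // repo-relative file path
  snippet: string; // text the comment claims appears in that file
}

// Returns true only when the quoted snippet occurs verbatim in the file text.
// File contents are passed in as a map so the sketch needs no filesystem access.
function verifySnippet(citation: Citation, files: Map<string, string>): boolean {
  const text = files.get(citation.path);
  if (text === undefined || citation.snippet.length === 0) return false;
  return text.includes(citation.snippet);
}

// Usage with made-up file contents.
const repoFiles = new Map<string, string>([
  ["src/index.ts", "export const answer = 42;\n"],
]);
const verbatim = verifySnippet({ path: "src/index.ts", snippet: "answer = 42" }, repoFiles);
const altered = verifySnippet({ path: "src/index.ts", snippet: "answer = 43" }, repoFiles);
const missingFile = verifySnippet({ path: "src/other.ts", snippet: "answer" }, repoFiles);
```

Under these assumptions only the first citation verifies. A paraphrased or slightly altered snippet fails the check, which is the point: the model cannot quote code that is not actually in the repo.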

Default review

The review spine is explicit.

`warden review` is not the deep security harness. It is the everyday review path: deterministic producers, bounded sub-agents, one synthesizer, then the citation verifier.

  1. Diff gate

    Detect the ecosystem, choose the diff source, and prune generated or irrelevant subtrees before any runner spends time.

  2. Detectors

    Run tsc, ESLint, npm audit plus OSV, jscpd, context selection, consistency, deadcode, scalability, leverage, and security lint.

  3. Sub-agents

    Ask cheap-tier scoped questions for committability, library leverage, and security residue. Each source still has to verify.

  4. Synthesizer

    Let the boss model rank and phrase the surviving findings into a stable CommentSet.

  5. Verifier

    Drop citations that do not substring-match the repo, OSV record, or package type definition. Comments with no verified source are dropped.
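
The ordering of the steps above can be sketched as a fixed chain of stages. This is an illustrative TypeScript sketch under stated assumptions, not Warden's implementation: `Finding`, `Stage`, `runSpine`, and `makeVerifier` are hypothetical names, the diff gate is left out because it works on files rather than findings, and the stand-in stages only append or pass through data.

```typescript
// Hypothetical types; stage order follows the numbered steps above.
interface Finding {
  claim: string;
  sources: string[]; // citations attached by detectors or sub-agents
}

type Stage = (findings: Finding[]) => Finding[];

// Run stages strictly in order, each one receiving the previous stage's output.
function runSpine(stages: Stage[], initial: Finding[] = []): Finding[] {
  return stages.reduce((acc, stage) => stage(acc), initial);
}

// Stand-in verifier: keep only sources the predicate accepts, then drop
// any finding that is left with no verified source.
function makeVerifier(isVerified: (source: string) => boolean): Stage {
  return (findings) =>
    findings
      .map((f) => ({ ...f, sources: f.sources.filter(isVerified) }))
      .filter((f) => f.sources.length > 0);
}

// Usage with made-up stages and data.
const detectors: Stage = (found) => [
  ...found,
  { claim: "unused export", sources: ["src/a.ts:3"] },
];
const subAgents: Stage = (found) => [
  ...found,
  { claim: "maybe use a library", sources: ["unverifiable"] },
];
const synthesizer: Stage = (found) => [...found]; // ranking and phrasing omitted
const verifier = makeVerifier((source) => source.startsWith("src/"));

const surviving = runSpine([detectors, subAgents, synthesizer, verifier]);
```

In this sketch the sub-agent finding is dropped because its only source fails the stand-in check, so `surviving` holds just the detector finding. Placing the verifier last means nothing the synthesizer phrases can bypass it.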

Deep security is not the default surface.

The deep security harness is design-locked for `warden security` and `warden review --deep`, but it is intentionally deferred until the everyday review and init loops earn more dogfood signal.

Anatomy

What lands in a comment.

Every Warden comment is a small receipt: a tiered claim, a category, a confidence number, and the sources that earned it the right to interrupt you.

  • Tier
    Block · Fix · Consider

    Three steps of actionability, never inflated. Tier 1 is what would block a merge.

  • Category
    correctness, security, scalability, …

    Reading order, not severity. Leverage now sits before dedup because a library or stdlib swap can dissolve repeated code.

  • Sources
    Path, line, snippet — verified

    Every external claim must point at a real artifact. The verifier substring-matches before the comment ships.

  • Degraded
    What didn't run, said out loud

    If a worker fails, the run admits it. Silence is the failure mode Warden refuses.
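
The fields listed above could be modelled roughly as follows. This is a hypothetical TypeScript shape assembled from the names on this page, not Warden's actual schema: the field names, the 0 to 1 confidence scale, and the mapping of tiers 2 and 3 to Fix and Consider are assumptions (the page only states that Tier 1 blocks a merge).

```typescript
// Hypothetical shapes for illustration; not Warden's published schema.
type Tier = 1 | 2 | 3;

// Assumed mapping: only "Tier 1 blocks a merge" is stated on the page.
const TIER_LABEL: Record<Tier, string> = { 1: "Block", 2: "Fix", 3: "Consider" };

interface Source {
  path: string;
  line: number;
  snippet: string; // must substring-match the cited artifact
}

interface WardenComment {
  tier: Tier;
  category: string;   // e.g. "correctness", "security", "scalability"
  confidence: number; // assumed to be a fraction between 0 and 1
  claim: string;
  sources: Source[];
}

interface CommentSet {
  comments: WardenComment[];
  degraded: string[]; // workers that failed or did not run, reported openly
}

// Render the one-line header seen in the sample output, e.g. "T1 correctness 96%".
function formatHeader(c: WardenComment): string {
  return `T${c.tier} ${c.category} ${Math.round(c.confidence * 100)}%`;
}

// Usage with made-up data.
const example: WardenComment = {
  tier: 1,
  category: "correctness",
  confidence: 0.96,
  claim: "Example claim for illustration.",
  sources: [{ path: "src/example.ts", line: 10, snippet: "return cache.size === 0;" }],
};
const header = formatHeader(example);
const label = TIER_LABEL[example.tier];
```

Keeping `degraded` on the set rather than on individual comments is one way to let a run report what did not execute even when no comment was produced.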

Design record

Personal tool, public craft bar.

Warden is being built for one person first, but not as a throwaway. The decisions are written down. The vocabulary is named. The review page follows the shipped spine before it talks about deferred harnesses or future bot surfaces.