Software that's smarter every month
than the day you shipped it.

Dendra is a classification primitive that wraps any decision site in your codebase with a self-improving lifecycle: rule today, LLM in shadow tomorrow, a sub-millisecond ML head when evidence has earned it. The rule stays behind everything as the safety floor.

Python ships in v1.0. TypeScript, Go, Rust, and Java clients are v1.1 follow-ons, alongside a WASM build of the gate primitive — so every client will share one audit-chain format and one statistical contract.

  • 8 / 8: public benchmarks cleared the evidence gate.
  • < 2 ms · ~$0: per call, once the ML head graduates.
  • One decorator: zero rewrites; your call site never moves.
  • Open core: Apache SDK · BSL → Apache 2030 · patent-pending.

We've already audited the eight LLM-broker libraries enterprise teams build on: LangChain, LlamaIndex, Haystack, AutoGen, CrewAI, DSPy, LiteLLM, Instructor. 10,889 files scanned, 919 classification sites surfaced. To re-derive the results on any of them: pip install dendra && dendra analyze /path/to/repo.

Real analyzer output, real codebases

See the classification sites Dendra finds in real OSS Python repos. And the LLM bill they'd retire.

Pick a repo. Drag the sliders. Watch the savings move. Every row below comes from the v1.0 analyzer running on the real source.

Pick a repo (running live on any public GitHub repo ships in v1.1):

Live custom-repo analysis ships in v1.1 (Pyodide-in-browser; the analyzer fetches the repo from GitHub and runs in your tab — your code never touches our server). For now, pip install dendra && dendra analyze . runs the same scan on your machine in a few seconds.


If this repo's LLM bill looked like…

Sliders are your inputs; the savings number is what graduating to in-process ML heads recovers, weighted by how many of the detected sites are realistically Dendra-fit (score ≥ 3.0).

Per-call cost from your LLM provider

Click a provider to set the slider to a typical classifier-call cost at that rate.

Readouts: estimated annualized savings · savings per day post-graduation · sites Dendra-fit (score ≥ 3.0).
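
If you'd rather check the arithmetic than trust a slider, the model is small enough to inline. A minimal sketch (the function and names are ours, not the demo's): savings scale linearly in call volume and per-call cost, weighted by the Dendra-fit fraction.

def annualized_savings(calls_per_day: float, llm_cost_per_call: float,
                       sites_total: int, sites_fit: int) -> float:
    """Sketch of the readout math, assuming the ML head's
    post-graduation cost is ~$0/call."""
    fit_fraction = sites_fit / sites_total       # sites scoring >= 3.0
    daily_savings = calls_per_day * llm_cost_per_call * fit_fraction
    return daily_savings * 365

# e.g. 100k calls/day at $0.005/call with 7 of 10 sites Dendra-fit:
# 100_000 * 0.005 * 0.7 * 365 = $127,750/year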

Want to run this on your private repo? Same scan, same JSON, on your own machine in under a minute.

Install in 5 minutes → no upload · no signup · no GPU

5 minutes on your machine

Install Dendra and see your first classification site, locally.

The walkthrough below is the same path you'd run in your terminal right now. Steps 4 and 7 are optional (auto-lift and the MCP server); the other five are the core install path. Every command is copyable. Every output snippet is from the v1.0 CLI, not a mockup.

  1. Install

    No GPU, no torch, no service deps; sklearn-class dependencies only.

    pip install dendra
    Successfully installed dendra-1.0.0

    ~30s on a warm pip cache, up to a minute cold. uv pip install dendra works too if you've moved on.

  2. Find your classification sites

    Pure-Python AST scan. No network, no upload — your code stays on your machine.

    dendra analyze .
    Dendra static analyzer — classification sites
    ============================================================
    Root:           /Users/you/myapp
    Files scanned:  84
    Sites found:    7
    
    file:line                  function          ptn   labels   regime  fit
    ------------------------------------------------------------------------
    src/triage.py:14           triage_ticket      P1        3   narrow  5.0
    src/router.py:88           route_intent       P1        7   narrow  4.5
    src/moderation.py:31       classify_post      P1        3   narrow  4.5
    …
    
    By regime:
        narrow: 5
        high_card: 1
        unknown: 1
    
    Next step: dendra init <file>:<function> --author @you:team

    Runs in ~270 ms on 1,000 files. It emits the same JSON shape the website demo uses; pass --json if you want it programmatic.
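
    A sketch of programmatic consumption, with one caveat: the key names below (sites, file, function, fit, regime) are guesses read off the table columns above, not a documented schema. Inspect the real --json output before building on it.

    import json
    import subprocess

    raw = subprocess.run(
        ["dendra", "analyze", ".", "--json"],
        capture_output=True, text=True, check=True,
    ).stdout

    for site in json.loads(raw)["sites"]:        # assumed key
        if site["fit"] >= 3.0:                   # the Dendra-fit bar used across this page
            print(site["file"], site["function"], site["regime"])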

  3. Preview the wrapper

    See the AST-injected diff before you commit a single line.

    dendra init src/triage.py:triage_ticket --author @you:team --dry-run
    # would modify src/triage.py: 3 labels (inferred), phase=RULE
    --- src/triage.py (before dendra init)
    +++ src/triage.py (after dendra init)
    @@ -1,3 +1,10 @@
    +from dendra import ml_switch, Phase, SwitchConfig
    +
    +@ml_switch(
    +    labels=['auto_close', 'escalate', 'queue'],
    +    author='@you:team',
    +    config=SwitchConfig(phase=Phase.RULE),
    +)
     def triage_ticket(ticket: dict) -> str:
         title = (ticket.get("title") or "").lower()

    The body of your function is unchanged. Drop --dry-run when you're ready and the file is rewritten in place.

  4. Auto-lift the wrapper (optional)

    Extract per-branch handlers and hidden-state evidence into a Switch subclass, so the LLM/ML head can graduate.

    dendra init src/triage.py:triage_ticket --author @you:team --auto-lift
    # would modify src/triage.py and create __dendra_generated__/triage__triage_ticket.py
    --- src/triage.py (before)
    +++ src/triage.py (after)
    @@ -1,3 +1,5 @@
    +from __dendra_generated__.triage__triage_ticket import TriageTicketSwitch
    +
     def triage_ticket(ticket: dict) -> str:
    -    # original body unchanged
    -    ...
    +    return TriageTicketSwitch().dispatch(ticket).label
    
    +++ __dendra_generated__/triage__triage_ticket.py (new)
    @@ -0,0 +1,18 @@
    +from dendra import Switch
    +
    +class TriageTicketSwitch(Switch):
    +    def _evidence_title(self, ticket: dict) -> str:
    +        return (ticket.get("title") or "").lower()
    +
    +    def _rule(self, evidence) -> str:
    +        if "crash" in evidence.title or "outage" in evidence.title:
    +            return "escalate"
    +        if evidence.title.endswith("?"):
    +            return "queue"
    +        return "auto_close"
    +
    +    def _on_escalate(self, ticket): ...   # extracted side effect
    +    def _on_queue(self, ticket): ...      # extracted side effect
    +    def _on_auto_close(self, ticket): ... # extracted side effect

    --auto-lift adds the per-branch (_on_*) and per-evidence (_evidence_*) machinery the analyzer would otherwise flag as missing. Skip it on simple sites; reach for it once a site grows hidden state or branch-specific side effects.
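
    The extracted hooks are ordinary methods you fill in by hand. A sketch of the generated file after editing, with print calls standing in for your real pager and queue APIs:

    from dendra import Switch

    class TriageTicketSwitch(Switch):
        ...  # _evidence_title and _rule as generated above

        def _on_escalate(self, ticket: dict) -> None:
            print(f"paging on-call: {ticket.get('title')}")    # stand-in for a pager API

        def _on_queue(self, ticket: dict) -> None:
            print(f"queued for human review: {ticket.get('title')}")

        def _on_auto_close(self, ticket: dict) -> None:
            ticket["status"] = "closed"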

  5. Run your code as usual

    No command. The wrapper does the work.

    Outcomes log silently to runtime/dendra/<switch>/ every time your app calls triage_ticket(...). In the RULE phase the rule still decides; the switch just records. Hand the same record IDs to switch.record_verdict(record_id, Verdict.CORRECT) when downstream signals (CSAT, resolution code, human review) tell you whether the classification was right.
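
    Closing the loop in code, sketched against the auto-lifted switch from step 4. Two assumptions to flag: that the dispatch result exposes a record_id (the result object in step 4 suggests it carries more than .label), and that Verdict.INCORRECT exists by symmetry with Verdict.CORRECT. Check the SDK reference for both.

    from dendra import Verdict
    from __dendra_generated__.triage__triage_ticket import TriageTicketSwitch

    switch = TriageTicketSwitch()
    result = switch.dispatch({"title": "app crashes on login"})

    # later, when the downstream signal lands (CSAT, resolution code, human review):
    resolved_without_agent = True    # stand-in for your real signal
    verdict = Verdict.CORRECT if resolved_without_agent else Verdict.INCORRECT
    switch.record_verdict(result.record_id, verdict)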

  6. See your ROI

    Self-measured from your own outcome logs, with knobs exposed for your own cost assumptions.

    dendra roi runtime/dendra/
    Dendra ROI report
    ============================================================
    Sites tracked:        1
    Outcomes logged:      4,210
    Verified outcomes:    3,876  (92%)
    
    Per-site projection:
      triage_ticket   $1,200/mo low  -  $3,800/mo high
                      (eng-cost saved + LLM-bill avoided post-graduation)
    
    Total projected annual value: $14,400 - $45,600

    Override the assumption bands with --monthly-value-low / --monthly-value-high / --engineer-cost-per-week, or pipe --json into your own dashboard.

  7. Drive Dendra from your IDE (optional)

    Dendra ships an MCP server (stdio) so Claude Code, Cursor, and any MCP-aware client can drive the CLI directly.

    dendra mcp
    // ~/.config/claude-code/mcp.json (or your client's equivalent)
    {
      "mcpServers": {
        "dendra": { "command": "dendra", "args": ["mcp"] }
      }
    }

    Same 14 CLI verbs (analyze, init, roi, benchmark, report, graduate, verdict, ...), reachable as MCP tools. Your assistant calls dendra init and dendra benchmark for you, with deterministic output.

That's it. The same path runs on every classifier in your codebase.

Close the loop

Prove the savings on your data.

Dendra ships a benchmark + report harness that turns projected savings into measured savings on your repo, with your data. Run dendra benchmark <site> after every change to a switch (rule edit, label add, evidence tweak); dendra report rolls every recorded run up into a per-switch timeseries (phase transitions, cost per call, cumulative dollars saved).

The same JSON the report emits is what --json returns, so you can pipe it into whatever dashboard your team already runs. Nothing leaves your machine.

$ dendra benchmark src/triage.py:triage_ticket
3 passed in 0.04s
triage_ticket: first run (phase=RULE, cost $0.00002-$0.00032/call)

$ dendra report
Dendra report - 1 switch, 14 days of data

triage_ticket  RULE -> MODEL_PRIMARY 2026-05-04 (after 312 verdicts)
               cost: $0.0042 -> $0.00031/call (-93%)
               estimated saved this week: $128

Wrap your if/else. Walk away. Come back to a classifier paid in microseconds.

from dendra import ml_switch
from myapp.support import auto_close, queue_for_human, escalate_to_oncall

# Each label is paired with a downstream action.
# The decorator wires classification, dispatch, and outcome
# logging at the call site. The body is the exact if/else
# your team would have inlined anyway.
@ml_switch(labels={
    "auto_close": auto_close,
    "queue":      queue_for_human,
    "escalate":   escalate_to_oncall,
})
def triage_ticket(ticket: dict) -> str:
    title = (ticket.get("title") or "").lower()
    if "crash" in title or "outage" in title:
        return "escalate"
    if title.endswith("?"):
        return "queue"
    return "auto_close"

triage_ticket(ticket)  # classifies AND fires the matching handler

Zero behavior change on day one. Your team ships the exact if/else they would have inlined anyway. Dendra does the work that usually takes a six-month ML migration project: outcome logging, training, shadow evaluation, graduation. By month six, that same triage_ticket(ticket) call is paid in microseconds and fractions of a cent. Nobody touches the call site.

What you actually ship

An LLM bill that retires itself

Most production classifications eventually accumulate enough evidence to retire the LLM tier. Dendra is the substrate that earns it — automatically, with a statistical floor under every promotion. By month six you're paying microseconds and pennies, not seconds and dollars, for the same call.

Code that compounds

Every classification site is a wrapper that gathers evidence on its own clock. The decision quality improves with use. The migration to ML never blocks a sprint. The call site is permanent — only the brain underneath moves.

A rule still behind everything

Your hand-written rule is preserved structurally at every phase. When ML is uncertain, it falls through to the LLM; when the LLM is uncertain, it falls through to the rule. Safety-critical sites cap at "ML-with-fallback" — no ML-primary for authorization, ever, by construction.

How it works

The mechanism, for the curious.

None of what follows is required reading to use Dendra — pip install dendra and the decorator above are sufficient. The rest of this page is for the engineer asking "yes, but how do you actually know when to graduate?"

Six phases. Three evidence-gated graduations. One rule floor.

Each phase routes decisions the same way every time. The lifecycle only graduates to the next phase once enough outcome evidence has accumulated to prove the upgrade is real, not a coincidence. The bar is conservative by default and the math is in the paper for those who want to verify it.

  1. RULE
    Your function decides.
  2. MODEL_SHADOW
    Rule decides; LLM watches.
  3. MODEL_PRIMARY
    LLM decides; rule is fallback.
  4. ML_SHADOW
    ML trains from outcomes.
  5. ML_WITH_FALLBACK
    ML decides; on uncertainty falls through LLM, then rule.
  6. ML_PRIMARY
    ML decides; circuit breaker reverts to rule on anomaly. safety_critical=True refuses this phase at construction.

Predecessor cascade. Each phase's low-confidence fallback is its predecessor's full routing: ML head → LLM → rule, in the order each tier was earned. Promotions add tiers; uncertainty walks them back down. The rule fires when every learnable tier above it is below threshold.
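
In Python, the cascade is an ordered walk. A sketch of the shape, not Dendra's internals; tier names and thresholds are illustrative:

def cascade(evidence, tiers):
    """tiers: (classify, threshold) pairs, highest earned phase first.
    Each classify returns (label, confidence). The rule sits last with
    threshold 0.0, so it always answers: the structural safety floor."""
    for classify, threshold in tiers:
        label, confidence = classify(evidence)
        if confidence >= threshold:
            return label

# e.g. tiers = [(ml_head, 0.80), (llm, 0.60), (rule, 0.0)]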

Drift detection rides along. The same evidence test runs in reverse: if accumulated outcomes show the rule has reclaimed the lead, the lifecycle demotes by one phase. Safety-critical sites cap at ML_WITH_FALLBACK — no ML-primary for authorization decisions, ever, by construction.
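
The real gate's math is in the paper. For intuition, here is one gate of the same general shape, not Dendra's: a one-sided sign test on the records where the incumbent and the challenger disagree, plus a minimum evidence volume.

from math import comb

def sign_test_p(challenger_wins: int, incumbent_wins: int) -> float:
    """One-sided exact binomial tail: P(at least this many challenger
    wins among the disagreements, if the two tiers were actually tied)."""
    n = challenger_wins + incumbent_wins
    return sum(comb(n, k) for k in range(challenger_wins, n + 1)) / 2 ** n

def gate_clears(total_outcomes: int, challenger_wins: int,
                incumbent_wins: int, min_outcomes: int = 250,
                alpha: float = 0.01) -> bool:
    # conservative by default: enough evidence AND significance,
    # never significance alone
    if total_outcomes < min_outcomes:
        return False
    return sign_test_p(challenger_wins, incumbent_wins) < alpha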

Eight public benchmarks across four text domains. All clear the gate.

Eight-panel chart: codelangs, ATIS, TREC-6, AG News, Snips, HWU64, Banking77, CLINC150. Each panel plots the hand-written rule's flat accuracy against the ML head's rising accuracy as outcomes accumulate. A dashed vertical line marks the first checkpoint where the evidence gate cleared.

Eight text benchmarks across intent classification (ATIS, HWU64, Banking77, CLINC150, Snips), question categorization (TREC-6), news topics (AG News), and programming-language detection (codelangs, including FORTRAN sourced from NJOY2016 nuclear-data processing code). Every benchmark cleared the evidence gate at p < 0.01. Most graduated within 250 outcomes; the slowest needed 2,000 (Snips, where the rule briefly beat the ML head — the gate held until the ML head overtook it).

  • 250: outcomes to graduate on six of eight benchmarks.
  • 1,000–2,000: outcomes for the two slowest (AG News, Snips).
  • +10 → +81 pp: rule → ML accuracy gap, codelangs to CLINC150.

Three regimes by cardinality × rule-keyword affinity

Regime I — rule near optimum

codelangs (12 langs, 87.8% rule). Rigid syntax keywords (def, function, subroutine) leave the rule within ~10 points of ML. Graduation is evidence-justifiable but the lift is modest; the lifecycle's value is audit chain and drift symmetry.

Regime II — rule usable

ATIS (70.0% rule), TREC-6 (43.0% rule). Mid-cardinality with strong-to-moderate keyword affinity. The rule ships on day one. ML decisively wins (88.7%, 85.2%) and the gate clears at the first 250-outcome checkpoint.

Regime III — rule at floor

HWU64, Banking77, and CLINC150 land here by high cardinality; Snips and AG News by weak keyword affinity. The rule is at or near chance on day one. Dendra's role here is cold-start substrate: outcome logging while a zero-shot LLM runs in front and the trained head warms up underneath.

Bar chart: rule baseline (grey) vs final ML-head accuracy (blue), all eight benchmarks, sorted by rule baseline. codelangs at 87.8% rule, ATIS at 70.0%, TREC-6 at 43.0%, AG News at 25.9%, Snips at 14.3%, HWU64 at 1.8%, Banking77 at 1.3%, CLINC150 at 0.5%. Background tinting hints at the three-regime taxonomy.

The mechanism is modality-agnostic by design. The gate operates on right/wrong outcome streams that any classifier produces — text, image, audio, or structured data. Image and audio benchmarks with pretrained-embedding heads (CLIP, ViT, Wav2Vec2) ship in a companion paper.

The same evidence that lifts accuracy retires the LLM bill.

When the lifecycle reaches its final phase, every classification has earned two things at once: the right to trust the ML head's accuracy, and the right to skip the LLM tier permanently. Latency drops from hundreds of milliseconds to sub-millisecond. Per-call cost drops to essentially zero. The math, illustrative not predictive, at three scales:

Workload                   LLM tier (~$0.005/call)   ML head (post-graduation)   Annualized swing
10⁴/day · small SaaS       ~$50/day                  ~$0/day                     ~$18k/year
10⁶/day · mid-scale        ~$5,000/day               ~$0/day                     ~$1.8M/year
10⁸/day · large platform   ~$500,000/day             ~$0/day                     ~$182M/year

Linear in unit cost. A frontier-tier model at $0.05/call multiplies the savings 10×; a small-tier model at $0.0005 divides them 10×. The shape that matters is constant: once the lifecycle has graduated, the ML-head tier is essentially free per call. Workloads that started on an LLM because their rule was a non-starter see the most visible economic effect.

See it on your codebase → 5 minutes, locally, no upload.

Find the classifiers in your codebase. Free. 30 seconds.

dendra analyze ./my-repo

Runs entirely locally. No upload. No signup. Walks your Python source, identifies classification decision points via six AST patterns, scores each for Dendra-fit, and outputs a JSON artifact for CI diff tracking.

$ dendra analyze ./my-repo

Scanned 12,408 Python files; found 7 classification sites.

  src/support/triage.py:42  — 5 labels, medium cardinality
    Dendra-fit: 4.5/5
    Regime: narrow-domain rule-viable (ATIS-like)
    Estimated: rule accuracy ~70%; ML would add ~15-20pp after ~500 outcomes

  src/mod/content_score.py:88  — 3 labels, binary-ish
    Dendra-fit: 4/5
    Regime: safety-critical boundary
    Recommend: Phase 4 cap (ML_WITH_FALLBACK, never ML_PRIMARY)

  ... 5 more

Report written to .dendra/analyze-2026-04-22.json
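
To make "six AST patterns" concrete, here is a toy version of one plausible pattern (not the analyzer's actual implementation): a function that returns two or more distinct string literals is a candidate classification site, and those literals are the inferred labels.

import ast

def find_literal_classifiers(source: str):
    """Yield (function_name, labels) for functions whose returns
    include two or more distinct string literals."""
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.FunctionDef):
            continue
        labels = {
            ret.value.value
            for ret in ast.walk(node)
            if isinstance(ret, ast.Return)
            and isinstance(ret.value, ast.Constant)
            and isinstance(ret.value.value, str)
        }
        if len(labels) >= 2:          # two or more labels reads as a classifier
            yield node.name, sorted(labels)

# On the triage_ticket source from the walkthrough, this yields
# ('triage_ticket', ['auto_close', 'escalate', 'queue']).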

Get the analyzer →

Volume-based pricing. No per-seat fees. Free forever for the library.

Tier          Price                  Classifications / mo
OSS library   Free — install now →   unlimited (self-hosted)
Free hosted   $0                     10,000
Solo          $19/mo                 100,000
Team          $99/mo                 1,000,000
Pro           $499/mo                10,000,000
Scale         $2,499/mo              100,000,000
Metered       $0.01 / 1k             above Scale
Enterprise    Custom                 Custom

Every paid tier has a published price. No "contact us" gating below Enterprise. Volume-priced so adding another classifier doesn't cost you another seat. Cancel anytime.

Where Dendra has measurable impact today

  • Customer-support triage
  • Chatbot intent routing
  • LLM output moderation / PII filtering
  • Fraud and anomaly triage
  • SOC alert classification
  • Content moderation
  • Clinical coding (ICD-10, CPT)
  • RAG retrieval-strategy selection
  • Agent tool routing

Four more categories in the full list.

What this replaces

  • LLM-response caching (LiteLLM, langfuse cache). Solves: repeat-call cost on identical inputs. Where Dendra differs: it retires the LLM tier permanently when an in-process ML head clears the gate — not only on cache hits.
  • Fine-tuning the LLM. Solves: better LLM accuracy on your domain. Where Dendra differs: its terminal classifier is not a tuned LLM. It's a sklearn-class in-process head: sub-ms, ~$0/call, no GPU, no inference API.
  • Roll your own ML migration. Solves: full control over outcome plumbing, training, shadow rollout. Where Dendra differs: one decorator at every classification site, with a uniform statistical contract and audit chain — and the rule preserved as the safety floor.
  • Feature flags + manual ramp. Solves: operator-driven rollout with an off switch. Where Dendra differs: the gate is statistical, not vibes-based; graduation only fires when accumulated outcomes prove the upgrade is real.

When Dendra isn't the right fit. If your LLM bill is dominated by generation, summarization, or agent loops rather than classification, the savings story scales down — you'll get the safety-floor and audit-chain benefits but not the "$5,000/day retires itself" story. If you have abundant day-one labeled data, train a classifier directly. If your verdicts never arrive (no downstream signal, no human review, no inferred outcome), the gate has nothing to fire on.

Your AI coding assistant already knows how to install Dendra.

Dendra ships a SKILL.md that Claude Code, Cursor, and Copilot Workspaces can load as context. Just ask:

"Add Dendra to the triage function in src/support/triage.py."

Your assistant will wrap the function, add the import, infer the labels, and leave a minimal, reviewable diff.

dendra init src/support/triage.py:triage --author "@you:team"

dendra init is the deterministic CLI path that sidesteps any risk of the LLM hallucinating decorator syntax. Your assistant should reach for it by default.