Onion vs Claude · 7 Defensibility Dimensions

Phase 6.4 partial · honest scoreboard · last 2026-05-02T05:43:07Z · auto · per CLAUDE.md "เหนือ Claude 10+ มิติ" design rule

How to read this dashboard. These 7 dimensions are Onion's claimed defensibility moat per CLAUDE.md design rules · NOT a claim of overall accuracy parity. Onion metrics are measured from internal artifacts (bench snapshots · crystal.db sampling · code-level audits) via scripts/eval/moat_metrics.py. Claude metrics are estimated from public vendor docs and third-party studies (cited where possible · marked "est." otherwise). Numbers refresh as bench data lands. Honest weaknesses surfaced below — do not skip that section.

Real-Metrics Status (auto-refreshed from data/moat_snapshot_latest.json):

Honesty · status measured · source scripts/eval/snapshots/honesty_post_phase5_*.json
Trace coverage / Audit trail · status measured (static-import audit) · source pipeline/brain/v4/engine.py + traceability.py
Lifelong learning · status measured (24h KG-growth proxy) · source data/brainv4/crystal.db · Hebbian/STDP delta deferred to Phase 6.5
No forgetting · status measured (random sample >30d re-queried) · source data/brainv4/crystal.db
Private + $0/query · status measured · source docs/NO_EXTERNAL_API_AUDIT_2026_05_02.md
Determinism · status scaffold (architectural · CI gate is Phase 6.7 deliverable)

7 Defensibility Dimensions · Per-Card Detail

1 · 🛡️ Honesty / No-Hallucinate

Onion99.67%

Claude (estimated)15-30% halluc.est.

Sourceregression bench post-5.3

Onion ahead

Onion returns "no answer found" when KG lacks the fact instead of fabricating. Claude hallucination rate documented in third-party studies (Vectara HHEM · TruthfulQA) at 15-30% on factual prompts.

2 · 🔍 Trace / Audit Trail

Onion (structural)100.0%

Audit fields present100.0%

Claude (estimated)opaque

Implementationtraceability.py

Onion ahead (structural)

Every Onion answer carries source-node IDs, evidence path, and confidence per traceability.py. Numbers above = static-import audit (engine wraps outputs in TraceabilityLayer; TraceMetadata exposes source/module/evidence/confidence/trace_id). Per-answer empirical trace-rate audit is Phase 6.4 follow-up. Claude's chain-of-thought is post-hoc reasoning · not auditable retrieval — model weights are not source-attributable.

3 · 💸 Cost per Query

Onionlocal · $0/query

Claude API (vendor)$3-15 / 1M tokvendor

Onion infra costamortised · no per-query

Onion ahead

Claude Sonnet/Opus pricing per Anthropic public docs. Onion runs on owned hardware (Mac Mini M1 + MacBook M2 + VPS) · no per-query token charges · zero rate-limit risk · zero vendor lock-in.

4 · 🔒 Privacy

Onion100% local · audited

Claude (vendor)data leaves host

Audit25 sites · 0 hits

Onion ahead

Per CLAUDE.md NO_EXTERNAL_API rule + audit (NO_EXTERNAL_API_AUDIT_2026_05_02.md): no Claude/OpenAI/Gemini call paths in production · pre-commit guard active. All inference local · BrainV3 self-contained.

5 · 🌱 Lifelong Learning

OnionKG grows forever

Claude (vendor)static weights

Recent ingest+19 nodes today

Onion ahead (structural)

Onion ingests new sources continuously (ConceptNet · ATOMIC · Wikidata in flight · ThinkPost autopilot). Claude knowledge frozen at training cutoff · updates require full retraining cycle. Hebbian + STDP edge-weight updates planned in Phase 6.5 (research-grade).

6 · 🎯 Determinism

Onionsame input → same output

Claude (vendor)probabilistic (sampled)

CI gatepending Phase 6.7

Onion ahead · gate pending

Onion uses pure-symbolic deterministic walks + rule firings. Claude is sampled at temperature > 0 by default · same prompt yields different completions. Phase 6.7 will add a CI determinism gate (snapshot-replay byte-identical) — currently architecturally deterministic but unaudited.

7 · 🧠 No Forgetting

Onion (sample >30d)100.0%

Claude (vendor)finite context window

PolicyCLAUDE.md no-forgetting rule

Onion ahead (by policy)

Onion storage lossless · prune of node/edge forbidden · decay only at activation/retrieval level. The percentage above = random-sample retrieval rate on 100 nodes older than 30 days re-queried by primary key (proves nodes aren't silently dropped; semantic-recall preservation belongs to Phase 6.5). Claude's training data discarded post-training and active context window bounded (200K-1M tokens).

Composite Scoreboard

Honest Weaknesses · Where Claude Wins Today

The 7 moat dimensions above are not the whole story. On raw bench accuracy, Claude leads in several measurable areas. Phase 6 is partly designed to close these gaps · we surface them here without spin.

Citation block

#	Dimension	Onion	Claude	Verdict	Source / Status
1	Honesty / no-hallucinate	99.67%	15-30% halluc.est.	Onion ahead	regression bench (measured)
2	Trace / audit trail	100% sources	opaque	Onion ahead	`traceability.py` (structural)
3	Cost per query	$0	$3-15/Mtok	Onion ahead	Anthropic vendor docs
4	Privacy	100% local	cloud API	Onion ahead	NO_EXTERNAL_API audit
5	Lifelong learning	continuous	frozen	Onion ahead	structural · 6.5 widening
6	Determinism	deterministic	sampled	Onion ahead · gate pending	6.7 CI gate planned
7	No forgetting	lossless	windowed	Onion ahead	CLAUDE.md policy · harness 6.5

Internal sources (measured)

bench-scoreboard.html — live Phase 4 / Phase 5 bench numbers (rd 327 · regression gates · composer · KG state)
docs/MASTER_CHECKLIST.md — canonical work tracker · priority lock · change log
docs/PHASE_6_VISION_2026_05_02.md §7 — 7-dimension defensibility narrative + Onion-vs-Claude scoreboard authoritative source
docs/PHASE_4_FINAL_REPORT_2026_05_01.md — rd 327 measured 40% post-ingest+rerank
docs/NO_EXTERNAL_API_AUDIT_2026_05_02.md — privacy claim audit (25 sites · 0 production hits)
pipeline/brain/traceability.py — trace implementation (source-node + evidence + confidence per output)
CLAUDE.md — "0% pre-trained LLM architecture rules v2" · no-forgetting policy · "เหนือ Claude 10+ มิติ" target

External sources (estimated · cited where verifiable)

Anthropic vendor docs · API pricing $3-15 / 1M tokens (Sonnet / Opus tiers · accessed 2026-05)
Vectara HHEM · TruthfulQA · third-party hallucination benchmarks · 15-30% range across domains (estimate · not direct vendor figure)
BIG-Bench · ARC · reasoning subsets · Claude ~85% estimate on object-reasoning probes (third-party · approximate · not direct vendor figure)
TH composer Claude estimate · informal probes · no formal bench published — flagged as estimate

Honesty caveat. Claude metrics are estimates from public knowledge / vendor docs / third-party studies · Anthropic does not publish per-dimension equivalents of these benches. We mark "est." where we cannot link to a primary source. Numbers refresh as Phase 6.4 instrumentation lands.

#	Output	Status
a	7-dimension dashboard (this page)	partial · this page
b	Onion-vs-Claude scoreboard	partial · this page
c	Determinism CI gate	Phase 6.7 · pending
d	Lifelong-learning evidence panel	Phase 6.5 · pending
e	Adversarial honesty extension n≥500	Phase 6.4 followup
f	Trace-coverage % CI gate	Phase 6.4 followup
g	Audit-trail UI · per-query reasoning chain	Phase 6.4 followup

🧅 Onion vs Claude · 7 Defensibility Dimensions

7 Defensibility Dimensions · Per-Card Detail

1 · 🛡️ Honesty / No-Hallucinate

2 · 🔍 Trace / Audit Trail

3 · 💸 Cost per Query

4 · 🔒 Privacy

5 · 🌱 Lifelong Learning

6 · 🎯 Determinism

7 · 🧠 No Forgetting

Composite Scoreboard

Honest Weaknesses · Where Claude Wins Today

⚠️ rd 327 object-reasoning depth · 40% vs Claude ~85%

⚠️ Cross-domain transfer · untested

⚠️ TH composer ceiling · 80.3% exact vs Claude ~88% estimated

⚠️ Common-sense breadth · KG growing but partial

Citation block