🧅 Onion vs Claude · 7 Defensibility Dimensions

Phase 6.4 partial · honest scoreboard · last 2026-05-02T05:43:07Z · auto · per CLAUDE.md "เหนือ Claude 10+ มิติ" design rule

Honesty 99.33% measured Trace 100% (traceability.py) Cost $0/query Local · NO_EXTERNAL_API audited KG 9,876,561 nodes rd 327 reasoning 40% (Claude lead)
How to read this dashboard. These 7 dimensions are Onion's claimed defensibility moat per CLAUDE.md design rules · NOT a claim of overall accuracy parity. Onion metrics are measured from internal artifacts (bench snapshots · crystal.db sampling · code-level audits) via scripts/eval/moat_metrics.py. Claude metrics are estimated from public vendor docs and third-party studies (cited where possible · marked "est." otherwise). Numbers refresh as bench data lands. Honest weaknesses surfaced below — do not skip that section.
Real-Metrics Status (auto-refreshed from data/moat_snapshot_latest.json):

7 Defensibility Dimensions · Per-Card Detail

1 · 🛡️ Honesty / No-Hallucinate

Onion99.67%
Claude (estimated)15-30% halluc.est.
Sourceregression bench post-5.3
Onion ahead
Onion returns "no answer found" when KG lacks the fact instead of fabricating. Claude hallucination rate documented in third-party studies (Vectara HHEM · TruthfulQA) at 15-30% on factual prompts.

2 · 🔍 Trace / Audit Trail

Onion (structural)100.0%
Audit fields present100.0%
Claude (estimated)opaque
Implementationtraceability.py
Onion ahead (structural)
Every Onion answer carries source-node IDs, evidence path, and confidence per traceability.py. Numbers above = static-import audit (engine wraps outputs in TraceabilityLayer; TraceMetadata exposes source/module/evidence/confidence/trace_id). Per-answer empirical trace-rate audit is Phase 6.4 follow-up. Claude's chain-of-thought is post-hoc reasoning · not auditable retrieval — model weights are not source-attributable.

3 · 💸 Cost per Query

Onionlocal · $0/query
Claude API (vendor)$3-15 / 1M tokvendor
Onion infra costamortised · no per-query
Onion ahead
Claude Sonnet/Opus pricing per Anthropic public docs. Onion runs on owned hardware (Mac Mini M1 + MacBook M2 + VPS) · no per-query token charges · zero rate-limit risk · zero vendor lock-in.

4 · 🔒 Privacy

Onion100% local · audited
Claude (vendor)data leaves host
Audit25 sites · 0 hits
Onion ahead
Per CLAUDE.md NO_EXTERNAL_API rule + audit (NO_EXTERNAL_API_AUDIT_2026_05_02.md): no Claude/OpenAI/Gemini call paths in production · pre-commit guard active. All inference local · BrainV3 self-contained.

5 · 🌱 Lifelong Learning

OnionKG grows forever
Claude (vendor)static weights
Recent ingest+19 nodes today
Onion ahead (structural)
Onion ingests new sources continuously (ConceptNet · ATOMIC · Wikidata in flight · ThinkPost autopilot). Claude knowledge frozen at training cutoff · updates require full retraining cycle. Hebbian + STDP edge-weight updates planned in Phase 6.5 (research-grade).

6 · 🎯 Determinism

Onionsame input → same output
Claude (vendor)probabilistic (sampled)
CI gatepending Phase 6.7
Onion ahead · gate pending
Onion uses pure-symbolic deterministic walks + rule firings. Claude is sampled at temperature > 0 by default · same prompt yields different completions. Phase 6.7 will add a CI determinism gate (snapshot-replay byte-identical) — currently architecturally deterministic but unaudited.

7 · 🧠 No Forgetting

Onion (sample >30d)100.0%
Claude (vendor)finite context window
PolicyCLAUDE.md no-forgetting rule
Onion ahead (by policy)
Onion storage lossless · prune of node/edge forbidden · decay only at activation/retrieval level. The percentage above = random-sample retrieval rate on 100 nodes older than 30 days re-queried by primary key (proves nodes aren't silently dropped; semantic-recall preservation belongs to Phase 6.5). Claude's training data discarded post-training and active context window bounded (200K-1M tokens).

Composite Scoreboard

#DimensionOnionClaudeVerdictSource / Status
1Honesty / no-hallucinate99.67%15-30% halluc.est.Onion aheadregression bench (measured)
2Trace / audit trail100% sourcesopaqueOnion aheadtraceability.py (structural)
3Cost per query$0$3-15/MtokOnion aheadAnthropic vendor docs
4Privacy100% localcloud APIOnion aheadNO_EXTERNAL_API audit
5Lifelong learningcontinuousfrozenOnion aheadstructural · 6.5 widening
6DeterminismdeterministicsampledOnion ahead · gate pending6.7 CI gate planned
7No forgettinglosslesswindowedOnion aheadCLAUDE.md policy · harness 6.5

Honest Weaknesses · Where Claude Wins Today

The 7 moat dimensions above are not the whole story. On raw bench accuracy, Claude leads in several measurable areas. Phase 6 is partly designed to close these gaps · we surface them here without spin.

⚠️ rd 327 object-reasoning depth · 40% vs Claude ~85%

Phase 4.2 measured 40% post-ingest+rerank (from 23.3% pre-ingest baseline). Claude estimated ~85% on similar reasoning probes (third-party benches such as ARC · BIG-Bench reasoning subsets · estimated). Phase 6.1 reasoning-depth milestone targets 55-65% as honest mid-term goal — full parity not pre-committed.

⚠️ Cross-domain transfer · untested

Onion has no published cross-domain transfer bench. Claude has documented zero-shot transfer (multi-discipline benchmarks · reasoning-on-novel-tasks). Onion's KG-based architecture should generalize within KG coverage · structurally unverified · belongs to Phase 6+ research.

⚠️ TH composer ceiling · 80.3% exact vs Claude ~88% estimated

Phase 4.2 measured TH composer 80.3% / EN 86.9% exact match on canonical roundtrip. Claude TH fluency estimated ~88% on similar prompts (no formal Onion-vs-Claude TH bench yet · estimate from informal probes). Phase 4.5 polisher decision DEFERRED · re-evaluated post Phase 5.3-5.5 ingest.

⚠️ Common-sense breadth · KG growing but partial

16M nodes today · CLAUDE.md target 60-80M nodes in 2-3 months. Claude has internet-scale training corpus advantage on long-tail facts and open-domain Q&A breadth. Onion bridges via bulk ingest (ConceptNet ✅ · ATOMIC ✅ · Wikidata in flight) · honest gap remains until ingest plateau.

Citation block

Internal sources (measured) External sources (estimated · cited where verifiable)

Honesty caveat. Claude metrics are estimates from public knowledge / vendor docs / third-party studies · Anthropic does not publish per-dimension equivalents of these benches. We mark "est." where we cannot link to a primary source. Numbers refresh as Phase 6.4 instrumentation lands.

📦 Phase 6.4 milestone status

Phase 6.4 = "Defensibility-moat instrumentation + public scoreboard" per PHASE_6_VISION_2026_05_02.md. Risk L · 3-5w. Outputs: (a) 7-dimension dashboard live · (b) Onion-vs-Claude scoreboard with honest framing · (c) determinism CI gate (Phase 6.7) · (d) lifelong-learning evidence panel (Phase 6.5).

#OutputStatus
a7-dimension dashboard (this page)partial · this page
bOnion-vs-Claude scoreboardpartial · this page
cDeterminism CI gatePhase 6.7 · pending
dLifelong-learning evidence panelPhase 6.5 · pending
eAdversarial honesty extension n≥500Phase 6.4 followup
fTrace-coverage % CI gatePhase 6.4 followup
gAudit-trail UI · per-query reasoning chainPhase 6.4 followup