Phase 6.4 partial · honest scoreboard · last 2026-05-02T05:43:07Z · auto · per CLAUDE.md "เหนือ Claude 10+ มิติ" design rule
CLAUDE.md design rules · NOT a claim of overall accuracy parity. Onion metrics are measured from internal artifacts (bench snapshots · crystal.db sampling · code-level audits) via scripts/eval/moat_metrics.py. Claude metrics are estimated from public vendor docs and third-party studies (cited where possible · marked "est." otherwise). Numbers refresh as bench data lands. Honest weaknesses surfaced below — do not skip that section.
data/moat_snapshot_latest.json):
measured · source scripts/eval/snapshots/honesty_post_phase5_*.jsonmeasured (static-import audit) · source pipeline/brain/v4/engine.py + traceability.pymeasured (24h KG-growth proxy) · source data/brainv4/crystal.db · Hebbian/STDP delta deferred to Phase 6.5measured (random sample >30d re-queried) · source data/brainv4/crystal.dbmeasured · source docs/NO_EXTERNAL_API_AUDIT_2026_05_02.mdscaffold (architectural · CI gate is Phase 6.7 deliverable)traceability.pytraceability.py. Numbers above = static-import audit (engine wraps outputs in TraceabilityLayer; TraceMetadata exposes source/module/evidence/confidence/trace_id). Per-answer empirical trace-rate audit is Phase 6.4 follow-up. Claude's chain-of-thought is post-hoc reasoning · not auditable retrieval — model weights are not source-attributable.CLAUDE.md NO_EXTERNAL_API rule + audit (NO_EXTERNAL_API_AUDIT_2026_05_02.md): no Claude/OpenAI/Gemini call paths in production · pre-commit guard active. All inference local · BrainV3 self-contained.| # | Dimension | Onion | Claude | Verdict | Source / Status |
|---|---|---|---|---|---|
| 1 | Honesty / no-hallucinate | 99.67% | 15-30% halluc.est. | Onion ahead | regression bench (measured) |
| 2 | Trace / audit trail | 100% sources | opaque | Onion ahead | traceability.py (structural) |
| 3 | Cost per query | $0 | $3-15/Mtok | Onion ahead | Anthropic vendor docs |
| 4 | Privacy | 100% local | cloud API | Onion ahead | NO_EXTERNAL_API audit |
| 5 | Lifelong learning | continuous | frozen | Onion ahead | structural · 6.5 widening |
| 6 | Determinism | deterministic | sampled | Onion ahead · gate pending | 6.7 CI gate planned |
| 7 | No forgetting | lossless | windowed | Onion ahead | CLAUDE.md policy · harness 6.5 |
The 7 moat dimensions above are not the whole story. On raw bench accuracy, Claude leads in several measurable areas. Phase 6 is partly designed to close these gaps · we surface them here without spin.
Phase 4.2 measured 40% post-ingest+rerank (from 23.3% pre-ingest baseline). Claude estimated ~85% on similar reasoning probes (third-party benches such as ARC · BIG-Bench reasoning subsets · estimated). Phase 6.1 reasoning-depth milestone targets 55-65% as honest mid-term goal — full parity not pre-committed.
Onion has no published cross-domain transfer bench. Claude has documented zero-shot transfer (multi-discipline benchmarks · reasoning-on-novel-tasks). Onion's KG-based architecture should generalize within KG coverage · structurally unverified · belongs to Phase 6+ research.
Phase 4.2 measured TH composer 80.3% / EN 86.9% exact match on canonical roundtrip. Claude TH fluency estimated ~88% on similar prompts (no formal Onion-vs-Claude TH bench yet · estimate from informal probes). Phase 4.5 polisher decision DEFERRED · re-evaluated post Phase 5.3-5.5 ingest.
16M nodes today · CLAUDE.md target 60-80M nodes in 2-3 months. Claude has internet-scale training corpus advantage on long-tail facts and open-domain Q&A breadth. Onion bridges via bulk ingest (ConceptNet ✅ · ATOMIC ✅ · Wikidata in flight) · honest gap remains until ingest plateau.
docs/MASTER_CHECKLIST.md — canonical work tracker · priority lock · change logdocs/PHASE_6_VISION_2026_05_02.md §7 — 7-dimension defensibility narrative + Onion-vs-Claude scoreboard authoritative sourcedocs/PHASE_4_FINAL_REPORT_2026_05_01.md — rd 327 measured 40% post-ingest+rerankdocs/NO_EXTERNAL_API_AUDIT_2026_05_02.md — privacy claim audit (25 sites · 0 production hits)pipeline/brain/traceability.py — trace implementation (source-node + evidence + confidence per output)CLAUDE.md — "0% pre-trained LLM architecture rules v2" · no-forgetting policy · "เหนือ Claude 10+ มิติ" targetHonesty caveat. Claude metrics are estimates from public knowledge / vendor docs / third-party studies · Anthropic does not publish per-dimension equivalents of these benches. We mark "est." where we cannot link to a primary source. Numbers refresh as Phase 6.4 instrumentation lands.
Phase 6.4 = "Defensibility-moat instrumentation + public scoreboard" per PHASE_6_VISION_2026_05_02.md. Risk L · 3-5w. Outputs: (a) 7-dimension dashboard live · (b) Onion-vs-Claude scoreboard with honest framing · (c) determinism CI gate (Phase 6.7) · (d) lifelong-learning evidence panel (Phase 6.5).
| # | Output | Status |
|---|---|---|
| a | 7-dimension dashboard (this page) | partial · this page |
| b | Onion-vs-Claude scoreboard | partial · this page |
| c | Determinism CI gate | Phase 6.7 · pending |
| d | Lifelong-learning evidence panel | Phase 6.5 · pending |
| e | Adversarial honesty extension n≥500 | Phase 6.4 followup |
| f | Trace-coverage % CI gate | Phase 6.4 followup |
| g | Audit-trail UI · per-query reasoning chain | Phase 6.4 followup |