🧅 Onion Bench Scoreboard

Auto-refresh · last 2026-05-02 01:55 UTC · Phase 4 COMPLETE · Phase 5.3+5.4 LANDED · 5.5 Wikidata in flight

Phase 4 COMPLETE | Regression Gates 4/4 PASS | 0 Prod Incidents | Phase 5.3 ConceptNet +60K nodes / +154K edges | Phase 5.4 ATOMIC verified | Phase 5.5 Wikidata BG agent · ~16hr ETA

📚 Knowledge Graph (post 5.3+5.4)

Nodes 16,067,852 +89,189 today
Edges 23,575,066 +183,905 today
Frames 512
Lexicon entries 19,757
DB size · plain 26.13 GB

🎯 rd 327 Object Reasoning

Pre-ingest baseline 23.3%
Post-ingest baseline 36.7% +13.3pp
Post-ingest + rerank 40% +16.7pp
Target ceiling 82-85%

🔁 Causal Chain (n=5 sample)

Pre-ingest pass rate 0/5 (0%)
Post-ingest pass rate 3/5 (60%) +60pp
Median latency 262.5s +3.7x
Quality vs Speed NET WIN
n=80 full bench in flight
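
The +60pp figure above follows directly from the two pass counts. A minimal sketch of that arithmetic, using only the 0/5 and 3/5 counts shown on this board (the helper names `pass_rate` and `pp_delta` are illustrative, not part of the bench code):

```python
def pass_rate(passed: int, total: int) -> float:
    """Pass rate as a percentage of the sample."""
    return 100.0 * passed / total


def pp_delta(before_pct: float, after_pct: float) -> float:
    """Difference between two percentages, in percentage points (pp)."""
    return after_pct - before_pct


# Counts taken from the Causal Chain panel (n=5 sample).
pre = pass_rate(0, 5)    # pre-ingest: 0/5
post = pass_rate(3, 5)   # post-ingest: 3/5

print(f"pre {pre:.0f}% -> post {post:.0f}% ({pp_delta(pre, post):+.0f}pp)")
```

With n=5, one extra pass or fail moves the rate by 20pp, so the n=80 run is the figure to wait for.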

✅ Regression Gates 4/4

Multistep 100% 0pp
Sentiment 92% 0pp
Honesty 99.33% +0.37pp post-5.3
Causal (re-framed) pending +60pp

✍️ Composer (NLG)

EN exact match 86.9% +6.4pp
TH exact match 80.3% +1.6pp
EN top-3 92.9%
TH top-3 91.2%
Pure-rule ceiling 82-85%

💾 Resource State

VPS RAM available 241 GB
VPS page cache 230 GB +80 GB
VPS disk free 116 GB / 440 GB
Mac Mini load 1.02
macllm-api /health (snapshot) 200
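
The disk figure above can be cross-checked against the cleanup commit e9b8398 (freed 63.6GB · 88%→73%). A rough sketch, assuming the 440 GB total is the full volume size and that nothing else changed disk usage between the cleanup and this snapshot (both are assumptions, and `used_pct` is an illustrative helper):

```python
TOTAL_GB = 440.0      # from "116 GB / 440 GB"
FREE_NOW_GB = 116.0   # free space in this snapshot
FREED_GB = 63.6       # from commit e9b8398


def used_pct(free_gb: float, total_gb: float = TOTAL_GB) -> float:
    """Percentage of the volume in use, rounded to one decimal place."""
    return round(100.0 * (total_gb - free_gb) / total_gb, 1)


# Assumption: free space before cleanup = free now minus the space freed.
used_after = used_pct(FREE_NOW_GB)
used_before = used_pct(FREE_NOW_GB - FREED_GB)

print(f"used before cleanup: {used_before}%  used now: {used_after}%")
```

Under those assumptions the two values land on roughly 88% and 73%, in line with the commit message.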

🛰️ Live API Health

api.macllm.ai/health checking…
Last ingest snapshot 2026-05-02 01:52 UTC
Wikidata supervisor IN_PROGRESS

Client-side fetch · CORS may block on some browsers · falls back gracefully.
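
The fetch-then-fall-back behaviour described above might look like the sketch below. CORS only applies in a browser, so this Python version mirrors the fallback logic rather than the page's actual script. Assumptions: the `https` scheme, the 5-second timeout, and falling back to the 200 snapshot value shown under Resource State; `http_status` and `check_health` are hypothetical names.

```python
import time
from typing import Callable, Optional
from urllib.error import HTTPError
from urllib.request import urlopen

# Host and path are from the dashboard; the https scheme is an assumption.
HEALTH_URL = "https://api.macllm.ai/health"


def http_status(url: str, timeout: float = 5.0) -> int:
    """GET the URL and return its HTTP status code."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.status
    except HTTPError as err:
        # The server answered, just not with a 2xx status.
        return err.code


def check_health(fetch: Callable[[str], int] = http_status,
                 snapshot_status: Optional[int] = 200) -> dict:
    """Try a live check; on any network failure, report the snapshot value."""
    started = time.monotonic()
    try:
        status, source = fetch(HEALTH_URL), "live"
    except OSError:
        # URLError and timeouts are OSError subclasses: fall back quietly.
        status, source = snapshot_status, "snapshot"
    latency_ms = round((time.monotonic() - started) * 1000, 1)
    return {"status": status, "source": source, "latency_ms": latency_ms}
```

Calling `check_health()` with no arguments performs a real request; passing a stub `fetch` makes the fallback path easy to exercise without network access.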

📦 Phase 5 milestone status (7/7 scaffolded)

# | Phase | Status | Awaits
5.1 | Telegram bot live | runbook ready | USER · BotFather token + 4 questions
5.2 | Web composer-demo backend | runbook + nginx fix done | USER · 5 decisions remain
5.3 | ConceptNet bulk ingest | DONE · +59,887 / +153,683 | commit 9d4d641
5.4 | ATOMIC bulk ingest | DONE · re-verified +2 edges | commit 6b60bd5
5.5 | Wikidata pass1 re-run | BG agent executing · ~16hr ETA | autonomous · user pre-approved
5.6 | Cross-language pilot | audit + Lao dir scaffold | USER · partner + budget
5.7 | L1 Thai linguist review | framework + 4880 PENDING tags | USER · linguist budget

📜 Today's commits (2026-05-01 · 26+)

e9b8398 · feat(infra vps-backup-cleanup): freed 63.6GB · 88%→73%
3da3de0 · docs(checklist sync v5): Phase 4+5 closeout · 26 commits
2aeb0f3 · fix(nginx): CF-Connecting-IP real-IP · zero downtime
99c05ec · docs(phase5.7): L1 Thai · 4880 PENDING tags inventory
de74989 · docs(phase5.2): prereq verification · 3 checks
e523dc8 · docs(phase5): user decisions consolidation · 21 questions
acb6e7f · feat(phase5.6): Lao pilot directory + template
6c80605 · docs(phase5.5): Wikidata supervisor scaffold
5114ffc · docs(phase5.2): web composer runbook · 703 lines
6a15daa · docs(phase5.6): cross-language audit · 211 lines
c38b595 · docs(phase4-final): causal +60pp accuracy reframed
98c4b8f · docs(phase5.1): Telegram runbook · 632 lines
803b1cd · docs(phase5): ingest readiness audit · 5.3-5.5
8efd8b4 · docs(phase4.5): polisher DEFER · 187 lines
ed1c792 · feat(phase4-causal-perf-rerun): n=5 · 3/5 pass
ba968fd · feat(phase4-causal-perf-rerun): n=4 · partial
2d968df · docs(phase4-final): COMPLETE · 320 lines
8168a74 · docs(rd 191): encrypt-runbook preserved
d6debc6 · docs(research): Apr 29 synthesis × 4
8bdd4bd · feat(bench-rd324): KG node-degree
2314325 · test: rd 324/357 + Phase 1 commerce
0fb52c8 · feat(phase4-regression): 3/4 PASS
d35b7e3 · feat(phase4.2): rd 327 +13.3/+3.3pp
3d216d7 · feat(phase4-kg-verify): atomic ingest
c58f5ee · docs(phase4.1): COMPLETE · 5 sec actual
c79075a · docs(checklist priority lock)