Evaluation evidence

Evals

Each immutable UI generation eval run compares raw generated interfaces with JudgmentKit-guided handoff outputs. Treat these reports as historical committed evidence, not as broad benchmark claims or as an indication of the current hosted MCP version.

Latest committed eval run: 2026-07-03 / mcp-0.6.5 / run-002
Current hosted MCP release: 0.6.5
Historical MCP release: 0.6.5
Claim level: repeated_pair_signal
Result: 2/2 passed
Guided wins: 2

Qualitative paired-artifact evidence from live provider-generated artifacts only; not a statistically powered benchmark.

UI eval report · Site rebuild log · Latest committed HTML report · Latest committed JSON report · Catalog JSON

All runs

Date	Historical MCP release	Run	Claim level	Result	Reports
2026-07-03	0.6.5	run-002	repeated_pair_signal	2/2 passed	HTML · JSON
2026-07-03	0.6.5	run-001	repeated_pair_signal	2/2 passed	HTML · JSON
2026-06-18	0.4.0	run-001	repeated_pair_signal	2/2 passed	HTML · JSON
2026-06-18	0.3.0	run-001	repeated_pair_signal	2/2 passed	HTML · JSON
2026-05-15	0.1.0	run-001	repeated_pair_signal	2/2 passed	HTML · JSON
2026-05-13	0.1.0	run-001	repeated_pair_signal	2/2 passed	HTML · JSON
2026-05-12	0.1.0	run-005	repeated_pair_signal	2/2 passed	HTML · JSON
2026-05-12	0.1.0	run-004	repeated_pair_signal	2/2 passed	HTML · JSON
2026-05-12	0.1.0	run-003	repeated_pair_signal	2/2 passed	HTML · JSON
2026-05-12	0.1.0	run-002	repeated_pair_signal	2/2 passed	HTML · JSON
2026-05-12	0.1.0	run-001	repeated_pair_signal	2/2 passed	HTML · JSON
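
The "latest committed eval run" identifier can be derived from the rows of the table above. The following is a minimal sketch, not the site's actual tooling: the `EvalRun` type, the `run_id` and `latest` helpers, and the ordering rule (newest date first, then highest run number) are all assumptions made for illustration. The real Catalog JSON schema is not shown here and may differ.

```python
from typing import Iterable, NamedTuple


class EvalRun(NamedTuple):
    """Hypothetical record for one table row (not the real catalog schema)."""
    date: str         # ISO date, so string order matches chronological order
    mcp_release: str  # historical MCP release, e.g. "0.6.5"
    run: str          # run label, e.g. "run-002"


def run_id(r: EvalRun) -> str:
    # Mirrors the "date / mcp-release / run" form used in the summary.
    return f"{r.date} / mcp-{r.mcp_release} / {r.run}"


def latest(runs: Iterable[EvalRun]) -> EvalRun:
    # Assumed ordering: most recent date wins, ties broken by run number.
    return max(runs, key=lambda r: (r.date, int(r.run.split("-")[1])))


# A few rows copied from the table above.
RUNS = [
    EvalRun("2026-07-03", "0.6.5", "run-002"),
    EvalRun("2026-07-03", "0.6.5", "run-001"),
    EvalRun("2026-06-18", "0.4.0", "run-001"),
    EvalRun("2026-05-12", "0.1.0", "run-005"),
]

print(run_id(latest(RUNS)))
```

With these rows the selected run is the 2026-07-03 / mcp-0.6.5 / run-002 entry, which matches the summary at the top of the page.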