The live-facts benchmark every frozen model fails.
Ask any model these questions without tools and it fails — not because it's weak, but because the answers moved after its training cutoff. One keyless call generates a fresh, signed eval set measured from authoritative sources at that moment: the cleanest demonstration of why agents need live, verifiable grounding.
The call
Regenerate per run. Each item carries its grading rule, source and measured_at; the set is Ed25519-signed so published scores are evidence, not claims.
# a fresh signed eval set — answers measured at call time (keyless)
curl "https://dynamicfeed.ai/v1/benchmark"
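The same keyless GET can be sketched in Python with only the standard library. The function name `fetch_benchmark` and its `opener` parameter are illustrative assumptions rather than part of the API; only the URL is taken from the call above.

```python
import json
import urllib.request

# URL taken from the curl call above; no API key or auth header is sent.
BENCHMARK_URL = "https://dynamicfeed.ai/v1/benchmark"


def fetch_benchmark(url=BENCHMARK_URL, opener=urllib.request.urlopen, timeout=10):
    """Fetch one fresh eval set and return the parsed JSON as a dict.

    `opener` is a hypothetical seam so the function can be exercised
    offline; by default it performs a plain GET with urllib.
    """
    with opener(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example use (performs a network request):
#   bench = fetch_benchmark()
#   print(bench["name"], len(bench["items"]))
```

Because the answers are measured at call time, calling this once per evaluation run and storing the returned set alongside the scores keeps each published result tied to the exact questions asked.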
Sample response
{
  "schema": "benchmark/v1",
  "name": "Dynamic Feed Live-Facts Benchmark",
  "items": [
    {
      "id": "fed_funds_rate",
      "category": "rates",
      "question": "What is the current US Effective Federal Funds Rate, in percent?",
      "expected_answer": "3.62",
      "grading": "numeric",
      "tolerance": 0.15,
      "source": "Federal Reserve Bank of New York"
    }
  ],
  "signature": { "alg": "Ed25519", "key_id": "df-ed25519-…" }
}
Why live data
Every eval of a frozen model goes stale the day it's published. This one can't: the answers are measured live at call time, every item names its authoritative source, and the whole set is cryptographically signed. Score a frozen model (near 0) against your grounded agent (near 1) and the case for live data makes itself.
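One way to read "near 0" and "near 1" is as the fraction of items answered correctly. The sketch below assumes that definition, which this page does not spell out. The names `score` and `answers` are hypothetical, and `grade` stands for any caller-supplied function that applies an item's grading rule.

```python
from typing import Callable, Mapping, Sequence


def score(
    items: Sequence[Mapping],
    answers: Mapping[str, str],
    grade: Callable[[Mapping, str], bool],
) -> float:
    """Return the fraction of items answered correctly, in [0, 1].

    `answers` maps item id -> the model's answer. An item with no
    entry in `answers` counts as wrong. An empty set scores 0.0.
    """
    if not items:
        return 0.0
    correct = sum(
        1
        for item in items
        if item["id"] in answers and grade(item, answers[item["id"]])
    )
    return correct / len(items)
```

Running the same `items` through a frozen model and through a tool-using agent, then comparing the two return values, gives the side-by-side number described above.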
Use it for
- Demonstrating staleness/hallucination to stakeholders
- Regression-testing agent grounding in CI
- Comparing tool-using agents against frozen baselines
- Content and research on model drift
FAQ
Why does every frozen model fail?
By construction — the questions ask for values that changed after any fixed training cutoff: today's date, current rates, the latest CVE added to CISA KEV, the newest Python release.
How is it graded?
Each item carries a grading rule: exact (versions, CVE ids, dates), numeric with tolerance (rates, counts), or contains_place (the latest quake).
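The three rules could be applied roughly as follows. This is a minimal sketch and not the service's reference grader: the function name `grade` is hypothetical, and the matching details are assumptions (case-insensitive comparison for exact, an absolute tolerance for numeric, and a substring match for contains_place).

```python
def grade(item, answer):
    """Return True if `answer` satisfies the item's grading rule.

    Assumed semantics (not confirmed by the source):
      exact          -> case-insensitive string equality after trimming
      numeric        -> |answer - expected| <= tolerance (absolute)
      contains_place -> expected place name appears inside the answer
    """
    rule = item["grading"]
    expected = str(item["expected_answer"]).strip()
    text = str(answer).strip()

    if rule == "exact":
        return text.lower() == expected.lower()

    if rule == "numeric":
        try:
            # Tolerate a trailing percent sign and thousands separators.
            value = float(text.rstrip("%").replace(",", ""))
        except ValueError:
            return False  # a non-numeric answer cannot pass a numeric item
        return abs(value - float(expected)) <= float(item.get("tolerance", 0.0))

    if rule == "contains_place":
        return expected.lower() in text.lower()

    raise ValueError(f"unknown grading rule: {rule!r}")


# The item from the sample response, reduced to the fields the grader reads.
sample_item = {
    "id": "fed_funds_rate",
    "expected_answer": "3.62",
    "grading": "numeric",
    "tolerance": 0.15,
}
print(grade(sample_item, "3.70"))  # within 0.15 of 3.62
print(grade(sample_item, "5.33"))  # outside the tolerance
```

Raising on an unknown rule, rather than returning False, is a deliberate choice here: a newly introduced grading rule should surface as an error instead of silently lowering a score.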
Why is it signed?
So a published score references a verifiable set — anyone can check the Ed25519 signature against /.well-known/keys and see exactly what was asked and what was true.
Keyless?
Yes — GET /v1/benchmark, no signup.