signed eval · live

The live-facts benchmark every frozen model fails.

Ask any model these questions without tools and it fails. That is not because it is weak, but because the answers changed after its training cutoff. One keyless call generates a fresh, signed eval set whose answers are measured from authoritative sources at that moment: a direct demonstration of why agents need live, verifiable grounding.

The call

Regenerate the set on every run. Each item carries its grading rule, its source and a measured_at timestamp. The whole set is Ed25519-signed, so a published score is verifiable evidence rather than an unverifiable claim.

# a fresh signed eval set — answers measured at call time (keyless)
curl "https://dynamicfeed.ai/v1/benchmark"

Sample response (abbreviated)

{ "schema": "benchmark/v1", "name": "Dynamic Feed Live-Facts Benchmark",
  "items": [ { "id": "fed_funds_rate", "category": "rates",
      "question": "What is the current US Effective Federal Funds Rate, in percent?",
      "expected_answer": "3.62", "grading": "numeric", "tolerance": 0.15,
      "source": "Federal Reserve Bank of New York" } ],
  "signature": { "alg": "Ed25519", "key_id": "df-ed25519-…" } }
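The same keyless call can be made from Python with only the standard library. This is a minimal sketch that assumes the `benchmark/v1` shape shown in the sample response: a top-level `items` array whose entries carry `id`, `grading` and `question`. The helper names `fetch_benchmark` and `summarize` are hypothetical, and the sketch does not verify the Ed25519 signature.

```python
import json
import urllib.request

# Endpoint taken from the curl example; no API key is sent.
BENCHMARK_URL = "https://dynamicfeed.ai/v1/benchmark"


def fetch_benchmark(url: str = BENCHMARK_URL, timeout: float = 10.0) -> dict:
    """GET a fresh benchmark set and decode the JSON body (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def summarize(doc: dict) -> list[tuple[str, str, str]]:
    """Return (id, grading rule, question) for every item in the set.

    Assumes the benchmark/v1 layout from the sample response. It raises on any
    other schema tag, because field names may differ between versions.
    """
    if doc.get("schema") != "benchmark/v1":
        raise ValueError(f"unexpected schema: {doc.get('schema')!r}")
    return [(item["id"], item["grading"], item["question"]) for item in doc["items"]]
```

Usage: `summarize(fetch_benchmark())` lists every item in the set just generated. Call it once per run, because the expected answers are only valid for the moment they were measured.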

Why live data

A static eval goes stale the day it's published. This one can't, for three reasons: the answers are measured live at call time, every item names its authoritative source, and the whole set is cryptographically signed. Score a frozen model (expect near 0) against your grounded agent (expect near 1), and the case for live data makes itself.

Use it for

Demonstrating model staleness and hallucination to stakeholders
Regression-testing agent grounding in CI
Comparing tool-using agents against frozen baselines
Content and research on model drift
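For the CI use case, a minimal gate computes the fraction of items your agent answered correctly and fails the build when that fraction drops below a threshold. The sketch assumes per-item pass/fail results come from your own grader, keyed by item `id`. The `score` and `ci_gate` names and the 0.9 default threshold are hypothetical.

```python
def score(results: dict[str, bool]) -> float:
    """Fraction of benchmark items graded correct (0.0 for an empty set)."""
    return sum(results.values()) / len(results) if results else 0.0


def ci_gate(results: dict[str, bool], threshold: float = 0.9) -> int:
    """Return a process exit code: 0 if the agent clears the threshold, else 1.

    results maps item id -> whether the agent's answer passed grading.
    """
    s = score(results)
    failed = sorted(item_id for item_id, ok in results.items() if not ok)
    print(f"live-facts score: {s:.2f} (threshold {threshold:.2f})")
    if failed:
        # Listing the failing ids shows which live facts the agent missed.
        print("failed items: " + ", ".join(failed))
    return 0 if s >= threshold else 1
```

In a CI job, call `sys.exit(ci_gate(results))` so that a drop in grounding fails the build. Running the same gate on a frozen baseline should show it scoring near 0.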


FAQ

Why does every frozen model fail?

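By construction. The questions ask for values that keep changing, so any fixed training cutoff leaves the model with an outdated answer. Examples include today's date, current interest rates, the most recent CVE added to CISA's Known Exploited Vulnerabilities (KEV) catalog and the newest Python release.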

How is it graded?

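Each item's grading field names one of three rules. The first is exact, used for versions, CVE IDs and dates. The second is numeric within the item's tolerance, used for rates and counts. The third is contains_place, used for the location of the latest earthquake.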
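A minimal Python sketch of the three rules follows. The matching semantics are assumptions, not documented service behavior:

- exact compares trimmed strings without regard to case.
- numeric reads the first number in the answer and passes if it lies within an absolute `tolerance` of `expected_answer`. This assumes the agent is prompted to answer with a bare number.
- contains_place passes if the expected place name appears in the answer, ignoring case.

The `grade` helper is hypothetical.

```python
import re

# First signed integer or decimal in a string; commas are stripped beforehand.
_NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def grade(item: dict, answer: str) -> bool:
    """Grade one answer against one benchmark item (hypothetical helper).

    Assumed semantics, not taken from the service's documentation:
      exact          - trimmed, case-insensitive string equality
      numeric        - first number in the answer within an absolute tolerance
      contains_place - expected place name appears in the answer (case-insensitive)
    """
    rule = item["grading"]
    expected = str(item["expected_answer"])
    if rule == "exact":
        return answer.strip().casefold() == expected.strip().casefold()
    if rule == "numeric":
        match = _NUMBER.search(answer.replace(",", ""))
        if match is None:
            return False
        tolerance = float(item.get("tolerance", 0.0))  # treated as absolute
        return abs(float(match.group()) - float(expected)) <= tolerance
    if rule == "contains_place":
        return expected.strip().casefold() in answer.casefold()
    raise ValueError(f"unknown grading rule: {rule!r}")
```

With the sample item (expected 3.62, tolerance 0.15), `grade(item, "3.58")` passes and `grade(item, "5.33")` fails. An absolute tolerance fits the sample, which gives a rate in percentage points.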

Why is it signed?

So a published score references a verifiable set. Anyone can check the Ed25519 signature against the public keys at /.well-known/keys and see exactly what was asked and what the authoritative sources reported at measurement time.

Keyless?

Yes. GET /v1/benchmark needs no API key and no signup.
