# answer-eval/1 and answer-conf/1

The published, deterministic evaluation semantics behind `POST https://dynamicfeed.ai/v1/answer`
(schema `answer/v1`). Same evidence in, same verdicts out: everything on this page can be
recomputed from a receipt alone, offline. The signature proves integrity, not truth; `verified`
exists only inside each fact's verification block (2+ independent sources agreeing while fresh).
Everything is advisory evidence: tamper-evident, never a certification.

Engine version: 1.0.0. Rule ids are pinned in every receipt at `reproduce.evaluator`.

## 1. The check unit

```json
{"id": "py", "tool": "software_version", "args": {"product": "python"},
 "expect": {"path": "results.0.latest_version", "op": "semver_gte", "value": "3.13"},
 "required": true}
```

* `expect` is one expectation object or an all-of array (max 8). Omitting it, or setting
  `observe: true`, records the premise with verdict `value` and no adjudication.
* `path` addresses the RAW tool result with keys and non-negative integer indices only
  (`results.0.magnitude`), max depth 8, no wildcards. When omitted, the operator evaluates the
  fact's primary value as extracted by fact/v1.
* If a required tool argument is missing, the check is `not_evaluable` and names the missing argument.
  Silent defaults are banned: a defaulted answer would be a guess wearing a signature.
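
The `path` rules above can be sketched as a small resolver. This is a minimal illustration, not the engine's code: `resolve_path` and `PathError` are hypothetical names, and the sketch assumes that a failed lookup is what the evaluator reports as `not_evaluable`.

```python
from typing import Any

MAX_DEPTH = 8  # documented maximum path depth


class PathError(Exception):
    """Hypothetical error type; assumed to map to a not_evaluable verdict."""


def resolve_path(result: Any, path: str) -> Any:
    """Walk a dotted path of keys and non-negative integer indices."""
    segments = path.split(".")
    if len(segments) > MAX_DEPTH:
        raise PathError(f"path depth {len(segments)} exceeds {MAX_DEPTH}")
    node = result
    for seg in segments:
        if isinstance(node, list):
            # Lists accept non-negative integer indices only, so "*" is rejected.
            if not (seg.isascii() and seg.isdigit()):
                raise PathError(f"expected a list index, got {seg!r}")
            idx = int(seg)
            if idx >= len(node):
                raise PathError(f"index {idx} out of range (length {len(node)})")
            node = node[idx]
        elif isinstance(node, dict):
            if seg not in node:
                # Report the keys actually present, as the verdict text does.
                raise PathError(f"key {seg!r} missing; keys present: {sorted(node)}")
            node = node[seg]
        else:
            raise PathError(f"cannot descend into {type(node).__name__} at {seg!r}")
    return node


raw = {"results": [{"latest_version": "3.13.1", "magnitude": 4.2}]}
print(resolve_path(raw, "results.0.latest_version"))
```

A wildcard such as `results.*.magnitude` fails in this sketch because `*` is not a valid list index, which matches the no-wildcards rule.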

## 2. Operators (closed set, pure functions)

| op | semantics |
|---|---|
| `eq`, `ne` | string-first, never coercing: numeric compare only when BOTH sides are JSON numbers; otherwise strict string compare after trim, so `"3.10" != "3.1"`. When the operands look like versions, the result notes that `semver_eq` exists. |
| `lt`, `lte`, `gt`, `gte` | numeric only; a string operand must FULLY match `^-?\d+(\.\d+)?$`; anything else is `not_evaluable` |
| `between` | `value: [lo, hi]` inclusive, numeric rules as above |
| `abs_within` | `abs(observed - value) <= tol`, via `{"value": x, "tol": t}` |
| `pct_within` | `abs(observed - value)/abs(value)*100 <= tol`; expected value 0 is `not_evaluable` (use `abs_within`) |
| `in` | membership in `value` (a list), per-element `eq` rules |
| `contains` | literal substring on `str(observed)`, or list membership; no patterns |
| `starts_with`, `ends_with` | literal prefix / suffix |
| `semver_eq`, `semver_gte`, `semver_lt` | dotted numeric tuple compare |
| `semver_prefix` | expected tuple is a component-wise prefix of observed: `3.13` matches `3.13.1`, and `3.1` does NOT match `3.13` |
| `exists`, `not_exists` | path present / absent |
| `fresh_within_s` | evidence timestamp within N seconds of evaluation time; staleness fails loudly |

There is deliberately NO regex operator: Python's regex engine has no timeout and a crafted
pattern from an anonymous caller could stall the service. Literal operators cover the honest
cases.
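
A few rows of the operator table can be sketched as pure functions. This is an illustration under stated assumptions, not the reference evaluator: the function names and the `NotEvaluable` exception are hypothetical, and rendering a non-string operand with `str()` before the string compare is an assumption the table does not spell out.

```python
import re

# The documented full-match rule for numeric strings (ASCII digits only).
NUMERIC = re.compile(r"^-?\d+(\.\d+)?$", re.ASCII)


class NotEvaluable(Exception):
    """Hypothetical signal for the not_evaluable verdict."""


def is_json_number(x) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(x, (int, float)) and not isinstance(x, bool)


def op_eq(observed, expected) -> bool:
    # Numeric compare only when BOTH sides are JSON numbers.
    if is_json_number(observed) and is_json_number(expected):
        return observed == expected
    # Otherwise strict string compare after trim (str() is an assumption).
    return str(observed).strip() == str(expected).strip()


def to_number(x) -> float:
    if is_json_number(x):
        return float(x)
    if isinstance(x, str) and NUMERIC.fullmatch(x):
        return float(x)
    raise NotEvaluable(f"not numeric: {x!r}")


def op_pct_within(observed, value, tol) -> bool:
    obs, exp = to_number(observed), to_number(value)
    if exp == 0:
        raise NotEvaluable("expected value 0; use abs_within")
    return abs(obs - exp) / abs(exp) * 100 <= to_number(tol)


def version_tuple(s) -> tuple:
    parts = str(s).strip().split(".")
    if not all(p.isascii() and p.isdigit() for p in parts):
        raise NotEvaluable(f"not a dotted numeric version: {s!r}")
    return tuple(int(p) for p in parts)


def op_semver_prefix(observed, expected) -> bool:
    # Component-wise prefix: compare whole components, never characters.
    obs, exp = version_tuple(observed), version_tuple(expected)
    return obs[: len(exp)] == exp
```

Comparing tuples of integers rather than raw strings is what keeps `3.1` from matching `3.13`: the second component is `1` against `13`, not the character `1` against the prefix of `13`.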

## 3. Per-check verdicts

`supported` (every expectation held) · `contradicted` (any expectation failed) · `value`
(observe, no adjudication) · `not_evaluable` (evidence arrived but the path or type does not
fit; the explain lists the keys actually present) · `evidence_unavailable` (feed failed or
missed the hard deadline) · `outside_evidence_coverage` (unknown tool or unroutable question) ·
`not_checked` (over the 20-check cap; never silently dropped).

## 4. Composite verdict and confidence (answer-conf/1)

```
check_confidence  = min(fact.verification.confidence) over the facts the check cites
                    (fact-conf/1: the fact/v1 composite, every signal exposed in the receipt)

composite verdict, over REQUIRED checks only:
  contradicted            if any is contradicted
  insufficient_evidence   else if any is evidence_unavailable, outside_evidence_coverage,
                          not_evaluable, or not_checked
  supported               else if at least one expectation was adjudicated and all held
  evidenced               else (observe-only batch)

composite confidence:
  0.0                                            if insufficient_evidence
  max(check_confidence over contradicted checks) if contradicted
  min(check_confidence over required checks)     otherwise, rounded to 2 dp
```

The weakest premise caps a conjunction; the best evidenced counterexample sets a contradiction;
no independence assumption is smuggled in by multiplying. Optional checks (`required: false`)
report verdicts and confidences but never move the composite. `degraded: true` is set whenever any
check lost its evidence. A batch whose REQUIRED evidence is incomplete can never be `supported`,
but a contradiction already in hand stands, because it is real evidence.
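
The composite rules can be replayed in a few lines. The sketch below is a hedged illustration: the `Check` dataclass and the `composite` function are hypothetical names, each check is assumed to already carry its `check_confidence`, rounding the contradicted branch to 2 dp is an assumption, and the behavior for a batch with no required checks is not specified here, so the sketch raises.

```python
from dataclasses import dataclass

# Per-check verdicts that make a required check count as missing evidence.
INSUFFICIENT = {
    "evidence_unavailable",
    "outside_evidence_coverage",
    "not_evaluable",
    "not_checked",
}


@dataclass
class Check:
    verdict: str
    confidence: float  # check_confidence: min over the facts the check cites
    required: bool = True


def composite(checks):
    """Return (composite verdict, composite confidence) over REQUIRED checks."""
    required = [c for c in checks if c.required]
    if not required:
        raise ValueError("no required checks: case not covered by this sketch")
    verdicts = [c.verdict for c in required]

    # A contradiction in hand stands even if other evidence is missing.
    if "contradicted" in verdicts:
        best = max(c.confidence for c in required if c.verdict == "contradicted")
        return "contradicted", round(best, 2)

    if any(v in INSUFFICIENT for v in verdicts):
        return "insufficient_evidence", 0.0

    # Only supported and value remain; one adjudicated check is enough.
    verdict = "supported" if "supported" in verdicts else "evidenced"
    weakest = min(c.confidence for c in required)
    return verdict, round(weakest, 2)


print(composite([Check("supported", 0.9), Check("value", 0.7)]))
```

Taking `min` rather than a product is the point of the design: two premises at 0.9 yield 0.9, not 0.81, because the sources are not assumed independent.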

## 5. Verifying a receipt offline

1. Remove the `anchor` field. Canonicalize (JSON, keys sorted recursively, compact separators,
   UTF-8). SHA-256 it. If you hold an RFC 3161 token for this receipt, compare this digest to
   the token's message imprint and validate the token against the TSA chain.
2. Remove the `signature` field. Recompute `receipt_id` as `"ans_" + sha256(canonical(body minus
   receipt_id, signature, anchor))[:16]` and compare.
3. Verify the Ed25519 signature over the canonical bytes against the key named in
   `signature.key_id`, published at `https://dynamicfeed.ai/.well-known/keys`.
4. Evidence closure: every digest a check cites must resolve in `evidence_index`, and every
   evidence object must hash (canonical sha256 of the object minus `attestation` and `raw`)
   to its digest. The answer cannot cite anything outside its own bundle.
5. Replay: apply section 2 to the evidence and section 4 to the verdicts. The numbers must
   reproduce exactly.
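
Steps 2 and 4 above need only the standard library. The following is a minimal sketch, not a reference verifier: it assumes digests are bare lowercase hex, that `[:16]` slices the hex string, that `evidence_index` is an object keyed by digest, and that Python's `json.dumps` formatting (including `ensure_ascii=False`) matches the service's canonical form. All helper names are hypothetical, and Ed25519 and RFC 3161 validation are left out because they need a cryptography library.

```python
import hashlib
import json


def canonical(obj) -> bytes:
    # Keys sorted recursively, compact separators, UTF-8.
    # Number formatting and ensure_ascii=False are assumptions.
    text = json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def sha256_hex(obj) -> str:
    return hashlib.sha256(canonical(obj)).hexdigest()


def without(obj: dict, *fields: str) -> dict:
    # Shallow copy of obj with the named top-level fields removed.
    return {k: v for k, v in obj.items() if k not in fields}


def expected_receipt_id(receipt: dict) -> str:
    # Step 2: hash the body minus receipt_id, signature and anchor.
    body = without(receipt, "receipt_id", "signature", "anchor")
    return "ans_" + sha256_hex(body)[:16]


def evidence_closure_ok(evidence_index: dict, cited_digests) -> bool:
    # Step 4a: every cited digest must resolve inside the bundle.
    if any(d not in evidence_index for d in cited_digests):
        return False
    # Step 4b: every evidence object must hash to its own digest.
    return all(
        sha256_hex(without(obj, "attestation", "raw")) == digest
        for digest, obj in evidence_index.items()
    )
```

A verifier would compare `expected_receipt_id(receipt)` with `receipt["receipt_id"]` and stop at the first mismatch, before replaying any verdicts.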

Reference verifiers for the signature layer: `pip install dynamicfeed-verify`,
`npm i @dynamicfeed/verify`, `cargo install dynamicfeed-verify`.

## 6. Question mode (route/1)

A free-text question is either compiled to checks by an ordered, declarative route table (first
match wins) or refused with a SIGNED `outside_evidence_coverage` receipt that includes the coverage
manifest. The compiled checks and the route table's sha256 are echoed in the receipt, so old
receipts stay pinned to the routing that produced them. Question mode does no paraphrase
understanding and no multi-hop reasoning: it is provably sugar over checks mode. A declarative
text that embeds an assertable value is adjudicated; an interrogative asserts nothing and
compiles to an observe check.
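
First-match routing over a declarative table can be sketched as below. Everything here is hypothetical: the real route table's format and matching rules are not described on this page, so the `keywords` field, the literal word matching, the product names, and the function names are assumptions chosen only to show the ordering and pinning ideas.

```python
import hashlib
import json
from typing import Optional

# Hypothetical route table: a route matches when all its keywords appear.
ROUTES = [
    {"keywords": ["python", "version"], "tool": "software_version",
     "args": {"product": "python"}},
    {"keywords": ["node", "version"], "tool": "software_version",
     "args": {"product": "node"}},
]


def compile_question(question: str, routes: list) -> Optional[dict]:
    """Return an observe check for the first matching route, else None."""
    words = set(question.lower().replace("?", " ").split())
    for route in routes:  # ordered: the first match wins
        if all(k in words for k in route["keywords"]):
            # An interrogative asserts nothing, so it compiles to observe.
            return {"id": "q0", "tool": route["tool"],
                    "args": route["args"], "observe": True}
    return None  # caller refuses with outside_evidence_coverage


def route_table_sha256(routes: list) -> str:
    # Digest of the table, so a receipt can pin the routing that produced it.
    blob = json.dumps(routes, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


print(compile_question("What is the latest Python version?", ROUTES))
```

Because the table is ordered data rather than code, reordering or editing a route changes the digest, which is what lets an old receipt stay tied to the routing in force when it was issued.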

The current coverage manifest: `GET https://dynamicfeed.ai/v1/answer`.