ANCHORED GUARD VERDICTS

A signed, anchored receipt that a piece of untrusted inbound content was screened for prompt-injection and jailbreak patterns by a named, versioned ruleset, with a reproducible verdict. It commits to the content and the ruleset by hash, so the same ruleset re-run on the same input yields the identical decision and the identical rule matches. It records screening provenance, never a guarantee that the content is safe.

AGR · Shipped · 1 wire type · Apache-2.0 · CC0 spec

WIRE TYPES

ar.guard.v1 (guard.input)

Single canonical wire type for input screening. Commits to the screened content (SHA-256), names the ruleset and its hash plus the deterministic detector and version, and records the three-state decision (pass / flagged / blocked) and each matched rule's id, severity, and category (one of six — instruction_override, system_prompt_leak, delimiter_injection, encoding_obfuscation, …), at a UTC time.

WHAT IT PROVES

  • A specific piece of inbound content, pinned by hash, was screened against a named ruleset at a specific UTC time (content_commit + Merkle inclusion proof).
  • The recorded decision and matched rules are exactly as the detector produced and have not been altered (Ed25519 signature check).
  • The receipt was signed by a registered issuer key (Ed25519 signature check).
  • The verdict is reproducible — the ruleset hash commits to the rule corpus and the decision policy, so re-running the deterministic detector on the same content reproduces the decision and matches (ruleset_hash commitment).
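The reproducibility check described above, as an added sketch that is not the project's own code: it recomputes the content and ruleset commitments, re-runs a deterministic detector, and reports which receipt fields fail to reproduce. The ruleset layout, the canonical JSON form, the decision policy and the function names are assumptions, since the page names only content_commit and ruleset_hash; the Ed25519 signature and Merkle inclusion checks are not shown.

```python
import hashlib, json, re

# Constructed ruleset; the id/severity/category/pattern layout and the
# decision policy keys are assumptions, not the published ar.guard.v1 schema.
RULESET = {
    "name": "demo-rules", "version": "1",
    "policy": {"high": "blocked", "low": "flagged"},
    "rules": [
        {"id": "R1", "severity": "high", "category": "instruction_override",
         "pattern": r"(?i)ignore (all )?previous instructions"},
        {"id": "R2", "severity": "low", "category": "system_prompt_leak",
         "pattern": r"(?i)reveal your system prompt"},
    ],
}

def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def ruleset_hash(ruleset: dict) -> str:
    # Assumed canonical form: sorted-key, compact JSON covering rules and policy.
    canonical = json.dumps(ruleset, sort_keys=True, separators=(",", ":"))
    return sha256_hex(canonical.encode())

def screen(content: bytes, ruleset: dict) -> dict:
    """Deterministic detector: same content and ruleset give the same verdict."""
    text = content.decode("utf-8", errors="replace")
    matches = [{k: r[k] for k in ("id", "severity", "category")}
               for r in ruleset["rules"] if re.search(r["pattern"], text)]
    decision = "pass"  # pass means only that no rule fired
    for m in matches:
        verdict = ruleset["policy"][m["severity"]]
        if verdict == "blocked" or decision == "pass":
            decision = verdict  # blocked always wins over flagged
    return {"content_commit": sha256_hex(content),
            "ruleset_hash": ruleset_hash(ruleset),
            "decision": decision, "matches": matches}

def reproduce(receipt: dict, content: bytes, ruleset: dict) -> list:
    """Return the receipt fields that fail to reproduce (empty list = ok)."""
    fresh = screen(content, ruleset)
    return [k for k in fresh if receipt.get(k) != fresh[k]]
```

A non-empty result tells the verifier which commitment diverged: content_commit points at different input, ruleset_hash at a different rule corpus or policy, and decision or matches at an altered verdict.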

WHAT IT DOESN'T PROVE

  • That the screened content is actually safe — a pass verdict means no rule fired, not that no attack is present.
  • That a flagged or blocked verdict was a true positive — the detector matches patterns, it does not adjudicate intent.
  • That the ruleset covers the relevant threats — coverage is only as good as the named, inspectable ruleset, which the receipt does not vouch for.
  • That the downstream model or agent actually honoured the verdict — the receipt records the screening, not the enforcement.

COMPOSES WITH

AGR receipts reference other family members via body-level composition pointers — verifier-coordinated, not signature-mandated.

AActR · Anchored Action Receipts

A guard verdict screens the input that drives an agent run; the resulting AActR records the action that ran on that input — screening provenance and action provenance for the same request.

AQR · Anchored Quality Receipts

Deliberately distinct surfaces of the same run: AGR screens the input before the agent acts, AQR scores the output after — input-safety provenance versus output-quality provenance.

Verify any AGR receipt.

verify.dekimu.com

Paste any claim ID to verify a receipt, check its anchor, and inspect the issuer signature.

Anchored Guard Verdicts are cryptographic provenance and privacy-lifecycle protocols. verify.dekimu.com is a reference implementation, not a qualified trust service under Regulation (EU) No 910/2014 (eIDAS) or successor.