XAIP Protocol

Provider-neutral signed execution evidence for AI agent tool calls — open protocol, zero-auth API.

Start here → Trust Evidence Before Delegation · one-screen live demo  |  Browser playground

Receipts are the primary artifact: dual-signed Ed25519 records of AI agent tool execution (hashes only — no raw content leaves your machine), portable across MCP, LangChain.js, OpenAI-compatible tool-call loops, and other runtimes. Trust scores below are one derived view over those receipts — not a claim of absolute safety or correctness. The current public dataset is MCP-heavy because MCP was the first integration target.

Every score below is computed from real receipts and is queryable without auth:

curl https://xaip-trust-api.kuma-github.workers.dev/v1/trust/context7

trust — observed success probability for this server, based on signed execution receipts. Higher means the server has succeeded more often, across more independent callers, with fewer error patterns.
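The live v0.4 formula is not reproduced here, but the intuition — a success rate that stays uncertain with few receipts and firms up as evidence accumulates — can be sketched with simple Laplace smoothing. Everything below (function name, the 0.5 prior) is an illustrative assumption, not the production algorithm:

```typescript
// Hypothetical sketch only — NOT the live v0.4 scoring algorithm.
// Laplace-smoothed success rate: with zero receipts the score sits at
// the 0.5 prior, and it converges to the observed rate as receipts grow.
function sketchTrust(successes: number, failures: number): number {
  const total = successes + failures;
  return (successes + 1) / (total + 2);
}

// No evidence yet: score stays at the neutral prior.
console.log(sketchTrust(0, 0)); // 0.5

// 38 successes out of 100 receipts: a low score, like the warning example below.
console.log(sketchTrust(38, 62).toFixed(2)); // "0.38"
```

The point of the smoothing term is the same one the low_caller_diversity flag makes in prose: a score backed by little evidence should not read as a confident verdict.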

Live trust scores

Columns: Server · Trust · Verdict · Receipts · Flags · Observed class · Observed verifiability · Observed settlement. The table is populated live from the API.

Observed metadata is display-only and does not affect current scoring or /v1/select behavior.

How these scores get used

The xaip-claude-hook fetches this API before every MCP tool call. If a server scores below the caution threshold, you see an inline warning in Claude:

⚠ XAIP: "fetch" trust=0.38 (low_trust, 40 receipts)
        Risk: high_error_rate
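The check behind that warning is straightforward: compare the fetched score against the threshold and, if it is low, format the inline message. A minimal sketch — the field names, the 0.5 threshold, and the function are assumptions for illustration; see xaip-claude-hook for the real implementation:

```typescript
// Sketch of the pre-call check. The response shape and the threshold
// value are assumptions, not the hook's actual configuration.
interface TrustResponse {
  trust: number;
  verdict: string;
  receipts: number;
  flags: string[];
}

const CAUTION_THRESHOLD = 0.5; // assumed value

function preCallWarning(tool: string, r: TrustResponse): string | null {
  if (r.trust >= CAUTION_THRESHOLD) return null; // no warning needed
  const risk = r.flags.length ? `\n        Risk: ${r.flags.join(", ")}` : "";
  return `⚠ XAIP: "${tool}" trust=${r.trust.toFixed(2)} (${r.verdict}, ${r.receipts} receipts)${risk}`;
}
```

Feeding it the numbers from the example above — trust 0.38, verdict low_trust, 40 receipts, one high_error_rate flag — reproduces the two-line warning shown.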

After each call, the hook signs a receipt (hashes only) and posts it to the aggregator. The next caller sees a slightly more accurate score.
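The hash-only property is worth making concrete. A receipt carries SHA-256 digests of the tool's input and output — never the content itself — and the whole record is signed with the caller's Ed25519 key. The field names below are illustrative assumptions, not the XAIP wire format:

```typescript
// Sketch of a hash-only signed receipt — field names are assumed, not
// the XAIP wire format. Only SHA-256 digests leave the machine.
import { createHash, generateKeyPairSync, sign, verify } from "node:crypto";

function sha256Hex(data: string): string {
  return createHash("sha256").update(data).digest("hex");
}

const { publicKey, privateKey } = generateKeyPairSync("ed25519");

const receipt = {
  server: "context7",                           // which server handled the call
  inputHash: sha256Hex("raw tool input"),       // digest, not content
  outputHash: sha256Hex("raw tool output"),     // digest, not content
  ok: true,
  ts: 1700000000,
};

// Ed25519 signs the serialized receipt directly (null digest argument:
// Ed25519 performs its own internal hashing).
const payload = Buffer.from(JSON.stringify(receipt));
const signature = sign(null, payload, privateKey);

// Anyone holding the caller's public key can verify the receipt.
console.log(verify(null, payload, publicKey, signature)); // true
```

Tampering with any field — flipping ok to false, say — invalidates the signature, which is what makes aggregated receipts auditable evidence rather than self-reported stats.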

For programmatic decision-making across candidate servers (e.g. "pick the highest-trust server that can handle task X"), that logic lives in a sibling project — Veridict — which consumes XAIP scores as one of its inputs. XAIP is the data layer; Veridict is the decision layer.
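For flavor, the simplest possible version of that decision — "highest-trust eligible server" — looks like the sketch below. Veridict layers more inputs on top; the names and the minimum-receipts cutoff here are assumptions, not its actual logic:

```typescript
// Naive selection sketch — NOT Veridict's algorithm. Filters out servers
// with too little evidence, then picks the highest trust score.
interface Candidate {
  server: string;
  trust: number;
  receipts: number;
}

function pickHighestTrust(cands: Candidate[], minReceipts = 10): Candidate | null {
  const eligible = cands.filter((c) => c.receipts >= minReceipts);
  if (eligible.length === 0) return null;
  return eligible.reduce((best, c) => (c.trust > best.trust ? c : best));
}
```

Note the evidence cutoff: a 0.90 score on five receipts loses to a 0.80 score on forty, which is exactly why XAIP surfaces receipt counts alongside scores.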

Design notes (not live behavior)

The current live scoring is v0.4. A v0.5 spec draft introduces a tool class taxonomy (advisory · data-retrieval · computation · mutation · settlement). Current live scoring and /v1/select behavior do not use class metadata for ranking; observed metadata remains display-only. Spec draft: XAIP-SPEC-v0.5-DRAFT.md.

For settlement-class tools the spec describes an optional anchorTxHash for ledger-backed verification — a reference demo anchors to an XRPL testnet transaction (source). This is one optional integration; XAIP is not a payment rail and not a settlement guarantee.

Known limitations. ~3,100 receipts across 10 servers — still a small dataset, and MCP-heavy for the reasons noted above. A single aggregator node (BFT quorum is the next milestone). The low_caller_diversity flag means one caller dominates the dataset — trust is uncertain because there aren't enough independent sources yet. The multi-caller mechanism itself has been verified (method + before/after data), but in practice the active graph is still populated mostly by one operator; closing that gap is what the contributor paths below are for. Error classification is heuristic, so false positives are possible.

How to contribute

Adding one more independent caller to the graph is the single most useful thing someone outside this repo can do. There are three paths, from shortest to most involved:

One-off run (≈30 seconds, no clone, no signup):

npx xaip-caller

Generates an Ed25519 caller key on first run, makes five real HTTP tool calls (GitHub API, httpbin, this project's own trust API), signs an execution receipt for each, POSTs them to the aggregator, and exits. No state outside ~/.xaip/caller-keys.json. Source: clients/caller.
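The generate-on-first-run pattern is simple enough to sketch. The actual layout of ~/.xaip/caller-keys.json is not documented here, so the file shape, field names, and PEM encodings below are assumptions:

```typescript
// Sketch of first-run key handling — the real caller-keys.json format is
// assumed, not documented. Reuses an existing key file, else creates one.
import { generateKeyPairSync } from "node:crypto";
import { existsSync, mkdirSync, readFileSync, writeFileSync } from "node:fs";
import { dirname } from "node:path";

interface CallerKeys {
  publicKeyPem: string;
  privateKeyPem: string;
}

function loadOrCreateCallerKey(path: string): CallerKeys {
  if (existsSync(path)) {
    // Subsequent runs: reuse the persisted key so the caller identity is stable.
    return JSON.parse(readFileSync(path, "utf8"));
  }
  // First run: generate a fresh Ed25519 keypair and persist it as PEM.
  const { publicKey, privateKey } = generateKeyPairSync("ed25519", {
    publicKeyEncoding: { type: "spki", format: "pem" },
    privateKeyEncoding: { type: "pkcs8", format: "pem" },
  });
  const keys: CallerKeys = { publicKeyPem: publicKey, privateKeyPem: privateKey };
  mkdirSync(dirname(path), { recursive: true });
  writeFileSync(path, JSON.stringify(keys));
  return keys;
}
```

Keeping the key stable across runs is what lets the aggregator count you as one continuing independent caller rather than a parade of one-off identities.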

Continuous contribution via Claude Code:

npm install -g xaip-claude-hook && xaip-claude-hook install

Before each MCP tool call, the hook fetches the current score and warns inline if it is low; after each call, it posts a signed receipt.

Full contributor path (real MCP servers, ~5 minutes): see run-a-caller.md. Or run your own aggregator node — the spec and reference implementation are MIT-licensed.