XAIP Protocol

Evidence Before Delegation

Wrap an agent's tool calls once. Use the resulting receipt history as a local reliability signal today; share the same signed receipts later as portable, independently verifiable evidence.

Provider-neutral open protocol · Public API, no account required

Start here → Trust Evidence Before Delegation · one-screen live demo  |  Browser playground

Spec → Initial Internet-Draft: draft-xkumakichi-xaip-receipts (individual, not IETF-approved)

Receipts are the primary artifact: Ed25519-signed records of AI agent tool execution, co-signed by both the executing agent and the caller so that neither side can unilaterally fabricate one. Receipts contain hashes only, so no raw content leaves your machine, and they are portable across MCP, LangChain.js, OpenAI-compatible tool-call loops, and other runtimes. The trust scores below are one derived view over those receipts, not a claim of absolute safety or correctness. The current public dataset is MCP-heavy because MCP was the first integration target.
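As a rough illustration of what a co-signed, hash-only receipt involves, the sketch below uses Node's built-in crypto module. The receipt field names, the canonical serialization, and the helper names (sha256Hex, canonicalBytes, verifyReceipt) are assumptions made for this example and are not taken from the XAIP spec; only the general idea (hash the payloads, have both parties sign the same bytes with Ed25519) comes from the description above.

```typescript
import { createHash, generateKeyPairSync, sign, verify } from "node:crypto";

// Hypothetical receipt body: field names are illustrative, not the XAIP wire format.
interface ReceiptBody {
  tool: string;
  inputHash: string;  // SHA-256 of the tool input, hex-encoded
  outputHash: string; // SHA-256 of the tool output, hex-encoded
  success: boolean;
  timestamp: string;
}

const sha256Hex = (data: string): string =>
  createHash("sha256").update(data).digest("hex");

// Serialize fields in a fixed order so both signers sign identical bytes.
function canonicalBytes(body: ReceiptBody): Buffer {
  const ordered = [body.tool, body.inputHash, body.outputHash, body.success, body.timestamp];
  return Buffer.from(JSON.stringify(ordered), "utf8");
}

// In practice each party holds its own key; both are generated here for the demo.
const agentKeys = generateKeyPairSync("ed25519");
const callerKeys = generateKeyPairSync("ed25519");

// Only hashes of the input and output enter the receipt, never the raw content.
const body: ReceiptBody = {
  tool: "fetch",
  inputHash: sha256Hex('{"url":"https://example.com"}'),
  outputHash: sha256Hex("<html>...</html>"),
  success: true,
  timestamp: "2025-01-01T00:00:00Z", // example value
};

const bytes = canonicalBytes(body);
// Ed25519 takes no separate digest algorithm, so the first argument is null.
const agentSig = sign(null, bytes, agentKeys.privateKey);
const callerSig = sign(null, bytes, callerKeys.privateKey);

// A receipt counts as valid only if BOTH signatures verify over the same bytes.
function verifyReceipt(candidate: ReceiptBody): boolean {
  const data = canonicalBytes(candidate);
  return (
    verify(null, data, agentKeys.publicKey, agentSig) &&
    verify(null, data, callerKeys.publicKey, callerSig)
  );
}
```

Because both signatures cover the same bytes, changing any field (for example flipping success) invalidates the receipt unless both parties sign again.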

Every score below is computed from real receipts and is queryable without auth:

curl https://xaip-trust-api.kuma-github.workers.dev/v1/trust/context7

trust — observed success probability for this server, based on signed execution receipts. Higher means the server has succeeded more often, across more independent callers, with fewer error patterns.
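The same lookup can be made from code. The sketch below assumes a runtime with a global fetch (for example Node 18+) and treats the response as opaque JSON, because the response fields are not specified here. trustUrl and fetchTrust are illustrative helper names, not part of xaip-sdk.

```typescript
const API_BASE = "https://xaip-trust-api.kuma-github.workers.dev";

// Minimal structural type for the parts of fetch this sketch uses.
type FetchLike = (url: string) => Promise<{
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}>;

// Build the per-server trust URL, escaping the server name.
function trustUrl(server: string): string {
  return `${API_BASE}/v1/trust/${encodeURIComponent(server)}`;
}

// Fetch the trust record for one server. The result stays `unknown`
// because no response schema is assumed in this sketch.
async function fetchTrust(
  server: string,
  // Cast avoids depending on DOM typings; a stub can be injected for offline use.
  fetchImpl: FetchLike = (globalThis as any).fetch,
): Promise<unknown> {
  const res = await fetchImpl(trustUrl(server));
  if (!res.ok) {
    throw new Error(`trust lookup for "${server}" failed: HTTP ${res.status}`);
  }
  return res.json();
}
```

Calling fetchTrust("context7").then(console.log) prints whatever JSON the API returns. Passing a stub as the second argument lets the function run without network access.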

Use it in your own agent

Before your agent delegates, look up what execution evidence exists for the candidates:

import { precheck } from "xaip-sdk";

const result = await precheck({
  task: "Summarize this document",
  candidates: ["context7", "fetch", "unknown-server"],
});
// result.ranked   — execution evidence per candidate
// result.selected — the eligible candidate, or null
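
Since precheck() only reports evidence, the caller still has to decide what to do when nothing is eligible. A minimal sketch of that step follows. It assumes result.selected is a server name string or null and declares a local PrecheckResultLike type instead of importing the SDK's real types, which are not shown here; decide and Decision are hypothetical names.

```typescript
// Local stand-in for the SDK result; the real type may carry more fields.
interface PrecheckResultLike {
  ranked: unknown[];       // per-candidate evidence, left opaque here
  selected: string | null; // assumed: server name, or null if none is eligible
}

type Decision =
  | { action: "delegate"; server: string }
  | { action: "hold"; reason: string };

// Delegate only when precheck found an eligible candidate; otherwise hold
// and surface the reason instead of silently picking a server.
function decide(result: PrecheckResultLike): Decision {
  if (result.selected !== null) {
    return { action: "delegate", server: result.selected };
  }
  return {
    action: "hold",
    reason: `no eligible candidate among ${result.ranked.length} ranked`,
  };
}
```

Returning an explicit "hold" keeps the no-evidence case visible to the calling code, which can then ask a human, retry later, or fall back to a default.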

Or start emitting receipts from an existing MCP server with one wrapper: withXAIP(server). Receipts are signed locally — nothing extra is required to read them back. The live scores below are the same evidence, aggregated across independent callers.

Live trust scores

Server | Trust | Verdict | Receipts | Flags | Observed class | Observed verifiability | Observed settlement
(rows load live from the public API)

Observed metadata is display-only and does not affect current scoring or /v1/select behavior.

How these scores get used

The xaip-claude-hook fetches this API before every MCP tool call. If a server scores below the caution threshold, a warning appears inline in Claude:

⚠ XAIP: "fetch" trust=0.38 (low_trust, 40 receipts)
        Risk: high_error_rate
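
The check behind a warning like this can be sketched as a threshold comparison plus string formatting. This is not the hook's actual code: the TrustInfo field names and the threshold value of 0.5 are assumptions, since neither the API response shape nor the real caution threshold is stated here.

```typescript
// Hypothetical shape of a trust lookup result; field names are assumptions.
interface TrustInfo {
  server: string;
  trust: number;
  verdict: string;
  receipts: number;
  flags: string[];
}

// Assumed value: the real caution threshold is not given in this document.
const CAUTION_THRESHOLD = 0.5;

// Return a warning string for low-trust servers, or null when no warning is needed.
function formatWarning(
  info: TrustInfo,
  threshold: number = CAUTION_THRESHOLD,
): string | null {
  if (info.trust >= threshold) return null;
  const head =
    `⚠ XAIP: "${info.server}" trust=${info.trust.toFixed(2)} ` +
    `(${info.verdict}, ${info.receipts} receipts)`;
  // Risk flags, if any, go on an indented second line.
  const risk =
    info.flags.length > 0 ? `\n        Risk: ${info.flags.join(", ")}` : "";
  return head + risk;
}

const example = formatWarning({
  server: "fetch",
  trust: 0.38,
  verdict: "low_trust",
  receipts: 40,
  flags: ["high_error_rate"],
});
```

With the inputs shown, example holds the same two lines as the warning above.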

After each call, the hook signs a receipt and posts it to the aggregator. As with all receipts, it contains hashes only, so no raw content leaves your machine. The next caller sees a slightly more accurate score.

For programmatic decision-making across candidate servers (e.g. "pick the highest-trust server that can handle task X"), the SDK exposes precheck() — an evidence lookup that ranks candidates using available execution evidence and leaves the delegation decision to the caller. See the precheck() guide.

Design notes (not live behavior)

The current live scoring is v0.4. A v0.5 spec draft introduces a tool class taxonomy (advisory · data-retrieval · computation · mutation · settlement) as a design note. Current live scoring and /v1/select behavior do not use class metadata for ranking; observed metadata is display-only. Spec draft: XAIP-SPEC-v0.5-DRAFT.md.

For settlement-class tools the spec describes an optional anchorTxHash for ledger-backed verification — a reference demo anchors to an XRPL testnet transaction (source). This is one optional integration; XAIP is not a payment rail and not a settlement guarantee.
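
To make the design note concrete, the five classes and the optional anchor could be modeled as below. This is a sketch only: apart from the class names and anchorTxHash, every identifier (ObservedMetadata, parseToolClass, hasLedgerAnchor) is hypothetical, and nothing here reflects live scoring behavior.

```typescript
// The five classes named in the v0.5 draft design note.
const TOOL_CLASSES = [
  "advisory",
  "data-retrieval",
  "computation",
  "mutation",
  "settlement",
] as const;

type ToolClass = (typeof TOOL_CLASSES)[number];

// Hypothetical metadata record; anchorTxHash is optional, as in the note.
interface ObservedMetadata {
  toolClass: ToolClass;
  anchorTxHash?: string;
}

// Narrow an arbitrary string to a known tool class, or return null.
function parseToolClass(value: string): ToolClass | null {
  const idx = (TOOL_CLASSES as readonly string[]).indexOf(value);
  return idx === -1 ? null : TOOL_CLASSES[idx];
}

// True only for settlement-class metadata that actually carries an anchor.
function hasLedgerAnchor(meta: ObservedMetadata): boolean {
  return meta.toolClass === "settlement" && typeof meta.anchorTxHash === "string";
}
```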

Known limitations. The dataset is still small: about 4,500 receipts across 10 servers. It is also MCP-heavy, because MCP was the first integration target. There is a single aggregator node; BFT quorum is the next milestone. The low_caller_diversity flag means one caller dominates the dataset, so trust is uncertain because there are not yet enough independent sources. The multi-caller mechanism itself has been verified (method + before/after data), but in practice the active graph is still populated mostly by one operator. Closing that gap is what the contributor paths below are for. Error classification is heuristic, so false positives are possible.
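
One way to picture the low_caller_diversity condition is as the share of receipts contributed by the single most active caller. The sketch below illustrates that idea and is not the aggregator's actual rule: the 0.8 cutoff and the function names are assumptions.

```typescript
// Fraction of receipts that come from the most frequent caller (0 if empty).
function topCallerShare(callerIds: string[]): number {
  if (callerIds.length === 0) return 0;
  const counts: Record<string, number> = {};
  let max = 0;
  for (const id of callerIds) {
    counts[id] = (counts[id] || 0) + 1;
    if (counts[id] > max) max = counts[id];
  }
  return max / callerIds.length;
}

// Assumed cutoff for illustration; the real threshold is not stated in this document.
const DOMINANCE_THRESHOLD = 0.8;

// Flag a receipt set when one caller contributes more than the cutoff share.
function hasLowCallerDiversity(
  callerIds: string[],
  threshold: number = DOMINANCE_THRESHOLD,
): boolean {
  return callerIds.length > 0 && topCallerShare(callerIds) > threshold;
}
```

Under this reading, each receipt from a new independent caller lowers the dominant share, which is why one more contributor can clear the flag.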

How to contribute

Adding one more independent caller to the graph is the single most useful thing someone outside this repo can do. There are three paths, from shortest to most involved:

One-off run (≈30 seconds, no clone, no signup):

npx xaip-caller

On first run it generates an Ed25519 caller key. It then makes five real HTTP tool calls (GitHub API, httpbin, this project's own trust API), signs an execution receipt for each, POSTs them to the aggregator, and exits. It keeps no state outside ~/.xaip/caller-keys.json. Source: clients/caller.

Continuous contribution via Claude Code:

npm install -g xaip-claude-hook && xaip-claude-hook install

Before each MCP tool call the hook fetches the current score and warns inline if low; after each call it posts a signed receipt.

Full contributor path (real MCP servers, ~5 minutes): see run-a-caller.md. Or run your own aggregator node — the spec and reference implementation are MIT-licensed.