Put an agent into a real workflow and one question arrives on a schedule: what did it actually do? It might arrive from a security reviewer before launch, an auditor after an incident, or a customer in a dispute. The honest answer, for most teams today, is "let me check the logs" — followed by an hour of grep.

Logs feel like they should be the record. They aren't. Here's the gap.

A log is an implementation detail

Logs are written by whoever wrote the code, in whatever format they chose, at whatever points they remembered. They're optional, inconsistent across services, and easy to lose. Crucially, a log line is a side effect of execution, not a contract of it. Nothing guarantees the log exists, that it's complete, or that it hasn't been edited.

So when you try to reconstruct "what the agent did," you're stitching together partial, untrusted fragments across systems — and hoping the important moment was logged at all.

Evidence is part of the contract

Evidence is different in four specific ways:

It's mandatory, not optional. Every attempt to do something emits a structured event — started, completed, failed, or denied — because the boundary requires it, not because someone remembered to log.
Denials are first-class. "The action was blocked by policy" is a recorded outcome, not a swallowed exception or a missing line.
It's correlated. One id ties the whole session together, across hosts, so you can replay a causal trace instead of guessing.
It's tamper-evident. The record is hash-chained, so an altered or missing event is detectable.
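
To make those four properties concrete, here is a minimal sketch of a hash-chained event record. It is an illustration under assumptions, not CHP's actual schema: the field names, the all-zero genesis hash, and the `append_event` and `verify_chain` helpers are all hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first event


def _digest(body):
    # Hash a canonical JSON encoding so the same content always hashes the same.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def append_event(chain, correlation_id, action, outcome, reason=None):
    """Append one event whose hash covers its content and the previous hash."""
    body = {
        "seq": len(chain),
        "correlation_id": correlation_id,  # one id for the whole session
        "action": action,
        "outcome": outcome,  # started | completed | failed | denied
        "reason": reason,    # reason code, e.g. for a denial
        "prev_hash": chain[-1]["hash"] if chain else GENESIS,
    }
    event = {**body, "hash": _digest(body)}
    chain.append(event)
    return event


def verify_chain(chain):
    """Return the index of the first broken event, or None if the chain is intact."""
    prev_hash = GENESIS
    for i, event in enumerate(chain):
        body = {k: v for k, v in event.items() if k != "hash"}
        if (
            event["seq"] != i
            or event["prev_hash"] != prev_hash
            or event["hash"] != _digest(body)
        ):
            return i
        prev_hash = event["hash"]
    return None


chain = []
append_event(chain, "sess-1", "read_file", "started")
append_event(chain, "sess-1", "read_file", "completed")
append_event(chain, "sess-1", "delete_file", "denied", reason="policy.no_delete")
```

In this sketch, editing any field of an earlier event, or deleting an event from the middle, makes `verify_chain` return the index where the chain breaks. Dropping the final event is only detectable if the latest hash is also recorded somewhere outside the chain.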

| | Application logs (a byproduct) | CHP evidence (the record) |
| --- | --- | --- |
| Completeness | Whatever someone remembered to log | Every attempt at the boundary, by contract |
| Integrity | Editable text; you trust the writer | SHA-256 hash-chained; alteration is detectable |
| Denials | Usually an error or an absence | A first-class outcome with a reason code |
| Reconstruction | Stitched together after the fact | Replayed in order by correlation id |

A log is something you write. Evidence is something you can be held to.

A log says something probably happened. Evidence says this happened, here's the ordered trail, and you can prove it wasn't changed.

Why it has to be a protocol

You can't bolt trustworthy evidence onto logging after the fact, because the thing that makes evidence trustworthy is that it's produced at the boundary, the same way, every time, independently of the application. That's a protocol concern — a contract about how execution is observed — not a library you sprinkle in.
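
One way to picture "produced at the boundary, independently of the application" is a wrapper that every invocation must pass through. The sketch below is an assumption-laden illustration, not the Capability Host Protocol's API: `Boundary`, `invoke`, `replay`, and the policy function are hypothetical names. The point is that the tools contain no logging code at all; the boundary emits the event for every attempt, including denials.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

# A policy returns a reason code to deny the action, or None to allow it.
Policy = Callable[[str, Dict[str, Any]], Optional[str]]


class Boundary:
    """Hypothetical invocation boundary: tools never write evidence themselves."""

    def __init__(self, policy: Policy) -> None:
        self.policy = policy
        self.tools: Dict[str, Callable[..., Any]] = {}
        self.evidence: List[Dict[str, Any]] = []

    def register(self, name: str, fn: Callable[..., Any]) -> None:
        self.tools[name] = fn

    def _emit(self, correlation_id: str, action: str, outcome: str,
              reason: Optional[str] = None) -> None:
        self.evidence.append({
            "seq": len(self.evidence),
            "correlation_id": correlation_id,
            "action": action,
            "outcome": outcome,
            "reason": reason,
        })

    def invoke(self, correlation_id: str, action: str, **args: Any) -> Any:
        reason = self.policy(action, args)
        if reason is not None:
            # A denial is a recorded outcome, not a missing line.
            self._emit(correlation_id, action, "denied", reason)
            raise PermissionError(reason)
        self._emit(correlation_id, action, "started")
        try:
            result = self.tools[action](**args)
        except Exception as exc:
            self._emit(correlation_id, action, "failed", type(exc).__name__)
            raise
        self._emit(correlation_id, action, "completed")
        return result


def replay(evidence: List[Dict[str, Any]], correlation_id: str) -> List[Tuple[str, str]]:
    """Return (action, outcome) pairs for one session, in recorded order."""
    ordered = sorted(evidence, key=lambda e: e["seq"])
    return [(e["action"], e["outcome"]) for e in ordered
            if e["correlation_id"] == correlation_id]


def deny_deletes(action: str, args: Dict[str, Any]) -> Optional[str]:
    return "policy.no_delete" if action == "delete_file" else None


boundary = Boundary(deny_deletes)
boundary.register("read_file", lambda path: f"contents of {path}")
boundary.register("delete_file", lambda path: None)

boundary.invoke("sess-1", "read_file", path="notes.txt")
try:
    boundary.invoke("sess-1", "delete_file", path="notes.txt")
except PermissionError:
    pass  # the denial is already in the evidence
```

Calling `replay(boundary.evidence, "sess-1")` then yields the read's started and completed events followed by the denied delete, without either tool having written a line of logging.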

That's what the Capability Host Protocol (CHP) does: it makes structured, replayable, tamper-evident evidence part of the invocation itself. And the place it's already real today is agents: one command captures every tool call your AI agent makes as evidence you can replay.

If "we can't prove what the agent did" is the thing blocking your rollout, that's the gap this closes.
