Writing

Notes

On AI engineering, LLM evaluation, retrieval, and agent tooling.

June 8, 2026 · 8 min read
Ground-truth leakage in agentic evals
Why RAG and tool-using agent evals leak the target answer into retrieved context, the four channels the leak travels through, and two checks (a deterministic static scan and a blind baseline) to detect and fix it. Includes a zero-dependency open-source skill.