Agent Judge: Solving Long-Context Evals for Production Agents
Rishi Gujjar, Andrew Li·May 27, 2026
Andrew Li, Alex Shan·Oct 7, 2025
Why production agent evals need agentic judges that can search, verify, and adapt.
Infrastructure for improving AI agents from production data.
Exploring the challenges with current evaluation methods and proposing a new approach grounded in production data.