Research
Agent Judge: Solving Long-Horizon Evals for Production Agents
Rishi Gujjar, Andrew Li·May 27, 2026
Andrew Li, Alex Shan·Oct 7, 2025
Why long-horizon agents need agentic judges that can search, verify, and adapt.
Infrastructure for improving AI agents from production data.
Exploring the challenges with current evaluation methods and proposing a new approach grounded in production data.