Research · Oct 7, 2025
Climbing the Hills That Matter
Exploring the challenges with current evaluation methods and proposing a new approach grounded in production data.
Andrew Li, Alex Shan