Research Engineer
We are Judgment. We build infrastructure for Agent Behavior Monitoring (ABM): surfacing silent behavioral issues, understanding how agents behave in production, and turning interaction data into actionable signals.
Hundreds of teams building autonomous agents rely on Judgment to understand how their systems are behaving post-deployment. When something breaks, they’re not stuck in reactive incident triage. They can see which behaviors are trending, which configurations caused regressions, and what to actually fix.
We've raised $30M+ across two rounds in the past five months. Our investors include Lightspeed, SV Angel, Valor Equity Partners, Nova Global, Chris Manning, Michael Ovitz, Michael Abbott, Cory Levy, Kevin Hartz, and others.
The Role:
We are looking for Research Engineers to build AI systems that use agent interaction data to help us understand how agents behave, evaluate them at scale, and improve them through learning and feedback.
Your research will not live on a whiteboard. You'll work directly with real-world agent data, apply frontier methods in production, and see your work ship immediately into the product. By making agent behavior measurable and debuggable, your systems will support teams deploying agents across finance, legal, operations, and other high-stakes workflows. You will own projects end-to-end, with significant autonomy, and work closely with the team to build self-improving agent systems.
What You'll Do:
Build systems to aggregate, index, and analyze large-scale agent interaction data to extract meaningful evaluation signals
Develop agent-based systems for analyzing and evaluating complex, long-running behaviors
Design and implement post-training and optimization workflows to improve agent behavior
Build internal tools and infrastructure to support rapid experimentation, analysis, and training
What We're Looking For:
You identify with at least one of the following:
You care about data quality, evaluation, and benchmarking, and are comfortable working hands-on with messy data
You have experience building agent systems and working with them in real-world or production settings
You have a strong background in reinforcement learning, agents, or machine learning fundamentals
You are comfortable working across infrastructure and systems, spanning training, data pipelines, and model serving
You are comfortable working across teams to translate research into product, balancing real-world customer constraints and tradeoffs
You enjoy turning ambiguous problems into clear, well-designed plans
Why Judgment?
The problem matters. Today’s agents hallucinate, drift, and break in production. We’re building the infrastructure that fixes this: the monitoring layer that makes agents self-improving.
Work with the best. We’re a small, highly collaborative team that ships far beyond what our size would suggest. We take pride in building a category-defining product and value engineers who are persistent, creative, and ownership-driven.
Grow with us. Ideas win on merit, not hierarchy. You’ll work with people who stay curious, navigate ambiguity, and execute relentlessly. We look for people who are bold in their ideas, comfortable taking thoughtful risks, and motivated by solving hard problems. We reward impact, expect ownership, and are intentional about growing talent from within as we scale.
We take care of our people. Competitive salary and equity, full benefits, chef-cooked meals daily, gym access, and whatever tools or resources you need to do your best work.
We work in person in San Francisco.