Production agents for People Ops

    Most "agents" in People Ops are demos with ambition. The ones that survive contact with production share a pattern: data first, structured context, exception handling, observability, human escalation.

    Matthew Bradburn

    Most "agents" in People Ops today are demos with ambition. They work on a clean test case and fall over the moment they meet a real employee, a real policy edge, a real Friday afternoon.

    This is fixable. The agents that survive contact with production share a pattern, and the ones that fail share the opposite one. Telling them apart is what separates an automation programme that compounds from one that quietly gets retired.

    Start with the data, not the app

    A common reflex in People Ops: there is a problem, so buy a tool. The market is happy to oblige. There is now an AI tool for every workflow on the org chart.

    The reflex is wrong. Most People functions are not held back by missing features. They are held back by data that sits in silos. The HRIS knows the employee. The ATS knows the candidate. Payroll knows the comp. Performance knows the rating. Engagement knows the sentiment. None of them knows what the others know.

    Buying a tenth tool adds a tenth silo. The leverage is in the integration and orchestration layer.

    This is why a workflow tool like n8n, or Make, or Zapier at the lower end, often produces more value than the latest specialised HR app. It connects what you already have. It lets a small AI step run inside a workflow that touches the systems where the truth actually lives. It is auditable, fixable, and visible.

    A useful test: when something breaks, can you see which step failed? With a connected workflow built in n8n, the answer is yes. With a black-box vendor product, often no. Visibility is the price of being able to fix things at speed, which is the price of being able to scale.
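
    To make that test concrete, here is a minimal sketch of a workflow run as named steps, so a failure points at the step that caused it. It is plain Python, not n8n's API. The step names, the `run_workflow` helper, and the simulated payroll timeout are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class StepResult:
    step: str
    ok: bool
    error: Optional[str] = None


def run_workflow(
    steps: list[tuple[str, Callable[[dict], dict]]], payload: dict
) -> tuple[dict, list[StepResult]]:
    """Run steps in order; stop at the first failure and record where it happened."""
    results: list[StepResult] = []
    for name, fn in steps:
        try:
            payload = fn(payload)
            results.append(StepResult(name, ok=True))
        except Exception as exc:  # broad on purpose: surface the failing step
            results.append(StepResult(name, ok=False, error=f"{type(exc).__name__}: {exc}"))
            break
    return payload, results


# Hypothetical steps standing in for real system calls.
def fetch_employee(p: dict) -> dict:
    return {**p, "employee": {"id": p["employee_id"]}}


def fetch_payroll(p: dict) -> dict:
    raise TimeoutError("payroll API did not respond")  # simulated failure


def draft_summary(p: dict) -> dict:
    return {**p, "summary": "..."}


_, results = run_workflow(
    [
        ("fetch_employee", fetch_employee),
        ("fetch_payroll", fetch_payroll),
        ("draft_summary", draft_summary),
    ],
    {"employee_id": "E-001"},
)
failed = [r for r in results if not r.ok]
```

    Here `failed` holds a single entry naming `fetch_payroll` and its error, and `draft_summary` never runs. That is the visibility a black-box product often cannot give you.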

    What an agent actually is

    The word "agent" has been so abused it now means almost anything. A clean working definition:

    An AI agent is a system that can take a goal, decide a next action, use tools, persist state, and move work forward with auditability.

    That is more than a chatbot. More than a workflow with an LLM step. It is closer to a junior operator with superpowers, and like a junior, it needs:

    • A clear job.
    • Access to the right systems.
    • Rules about what it can and cannot do.
    • Memory.
    • A way to recover when something breaks.
    • A human escalation path.

    If any of those are missing, you do not have an agent. You have a roulette wheel with a UI.
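
    The definition above can be sketched as a small loop. This is an illustration under stated assumptions, not a reference implementation: `decide_next_action` is a hard-coded stand-in for a model call, the two tools are stubs, and the policy text is invented for the example.

```python
import json
from dataclasses import dataclass, field


@dataclass
class AgentState:
    goal: str
    done: bool = False
    facts: dict = field(default_factory=dict)      # persisted memory
    audit_log: list = field(default_factory=list)  # every decision, in order


def decide_next_action(state: AgentState) -> str:
    """Stand-in for the model: choose the next action from the current state."""
    if "policy" not in state.facts:
        return "lookup_policy"
    if "answer" not in state.facts:
        return "draft_answer"
    return "finish"


# Stub tools. Each takes the state and returns new facts to remember.
TOOLS = {
    "lookup_policy": lambda s: {"policy": "hypothetical policy text"},
    "draft_answer": lambda s: {"answer": f"Per policy: {s.facts['policy']}"},
}


def run(state: AgentState, max_steps: int = 5) -> AgentState:
    """Decide, act, remember, and log until the goal is met or steps run out."""
    for step in range(max_steps):
        action = decide_next_action(state)
        if action == "finish":
            state.done = True
            state.audit_log.append({"step": step, "action": action})
            break
        output = TOOLS[action](state)
        state.facts.update(output)
        state.audit_log.append({"step": step, "action": action, "output": output})
    return state


def save(state: AgentState) -> str:
    """Serialise state so a run can be resumed or inspected later."""
    return json.dumps(state.__dict__)


final = run(AgentState(goal="Answer an employee's policy question"))
```

    If the loop exhausts `max_steps`, `done` stays false. In a real system that is the natural point to hand the case to a human rather than keep trying.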

    The context stack

    The single biggest reason agents fail in production is missing context. Most teams build an agent like this: "here is a model, here are some tools, go figure it out." That fails because the agent does not know what matters, what happened five steps ago, which policy applies, or what "good" looks like in your business. So it guesses. And in business, a guessing system is worse than no system, because humans assume it is reliable until it burns them.

    The fix is to think in a stack, not a blob.

    Task context. What triggered this. What the user asked. What "done" means. Constraints (time, budget, permissions).

    Process context. Which workflow this sits inside. Required steps, approvals, SLAs. Exception rules. Ownership.

    Organisational context. Policies (comp philosophy, hiring bar, expense rules). Risk tolerance. Your systems (HRIS, ATS, finance, CRM). Your language (job levels, team names, cost centres).

    If you only give task context, the agent acts like a goldfish with a keyboard. If you give the full stack, it starts behaving like someone who actually works there.
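
    One way to keep the three layers separate is to give each its own typed structure. The sketch below is an assumption about how that might look. The field names are derived from the lists above, and the example values (workflow name, SLA, policy ID) are invented.

```python
from dataclasses import dataclass, field


@dataclass
class TaskContext:
    trigger: str
    request: str
    done_means: str
    constraints: dict = field(default_factory=dict)


@dataclass
class ProcessContext:
    workflow: str
    required_steps: list
    approvers: list
    sla_hours: int
    owner: str


@dataclass
class OrgContext:
    policies: dict     # policy name -> policy ID
    systems: list
    vocabulary: dict   # internal term -> meaning


@dataclass
class ContextStack:
    task: TaskContext
    process: ProcessContext
    org: OrgContext

    def render(self) -> str:
        """Flatten the three layers into labelled sections for a prompt."""
        sections = {
            "TASK": vars(self.task),
            "PROCESS": vars(self.process),
            "ORGANISATION": vars(self.org),
        }
        lines = []
        for label, values in sections.items():
            lines.append(f"[{label}]")
            lines.extend(f"{key}: {value}" for key, value in values.items())
        return "\n".join(lines)


stack = ContextStack(
    task=TaskContext("ticket created", "Can I expense a standing desk?", "answer cites a policy"),
    process=ProcessContext("expense-query", ["retrieve policy", "answer"], ["line manager"], 48, "People Ops"),
    org=OrgContext({"expenses": "POL-017"}, ["HRIS", "finance"], {"CC": "cost centre"}),
)
prompt_context = stack.render()
```

    Keeping the layers as separate objects means each can be owned, versioned, and refreshed on its own schedule. Organisational context changes rarely, while task context changes on every run.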

    Context is not just "more text." It must be:

    • Structured. Clean payloads, not messy email threads. IDs, links, owners, timestamps.
    • Retrievable. The agent can fetch what it needs, not just what fits in the prompt.
    • Verifiable. Every claim it makes can be checked against the source.
    • Relevant. Filtered to what matters for this task.
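
    "Verifiable" is the property that is easiest to check mechanically. As a rough sketch, assume the agent returns each claim with a source ID and the exact quote it relied on. The `SourceDoc` and `Claim` shapes, the document IDs, and the policy wording are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceDoc:
    doc_id: str
    owner: str
    updated: str  # ISO date
    text: str


@dataclass(frozen=True)
class Claim:
    statement: str
    source_id: str
    quote: str  # exact span the claim relies on


def unverifiable_claims(claims: list[Claim], sources: list[SourceDoc]) -> list[Claim]:
    """Return claims whose source is missing or whose quote is not in that source."""
    by_id = {doc.doc_id: doc for doc in sources}
    bad = []
    for claim in claims:
        doc = by_id.get(claim.source_id)
        if doc is None or claim.quote not in doc.text:
            bad.append(claim)
    return bad


sources = [
    SourceDoc("POL-017", "People Ops", "2024-01-15",
              "Expenses above the approval limit need a manager's sign-off."),
]
claims = [
    Claim("Large expenses need manager sign-off.", "POL-017", "need a manager's sign-off"),
    Claim("Receipts are optional.", "POL-099", "receipts are optional"),
]
flagged = unverifiable_claims(claims, sources)
```

    Here the second claim is flagged because it cites a document that was never retrieved. An exact-quote match is deliberately strict. It will not catch a misread policy, but it does catch a citation that does not exist.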

    Tooling contracts and exception handling

    An agent without explicit tooling contracts is a hazard. Each tool the agent can call needs a contract: what it does, what inputs it expects, what outputs it returns, what side effects it has, when it can be called. Without contracts, the agent invents. With contracts, the agent stays inside the rails you set.
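
    A tooling contract can be as simple as a record that is checked before and after every call. The sketch below is one possible shape, not a standard. The `ToolContract` fields mirror the list above, and `update_cost_centre` is a hypothetical stub that calls no real system.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ToolContract:
    name: str
    description: str
    required_inputs: dict   # field name -> expected type
    output_fields: tuple
    side_effects: str       # "none", "read", or "write"
    allowed_when: Callable[[dict], bool]


class ContractViolation(Exception):
    pass


def call_tool(contract: ToolContract, fn: Callable[..., dict], inputs: dict, run_ctx: dict) -> dict:
    """Enforce the contract around a tool call: permission, inputs, then outputs."""
    if not contract.allowed_when(run_ctx):
        raise ContractViolation(f"{contract.name}: not allowed in this context")
    for field_name, expected in contract.required_inputs.items():
        if not isinstance(inputs.get(field_name), expected):
            raise ContractViolation(f"{contract.name}: '{field_name}' must be {expected.__name__}")
    extra = sorted(set(inputs) - set(contract.required_inputs))
    if extra:
        raise ContractViolation(f"{contract.name}: unexpected inputs {extra}")
    output = fn(**inputs)
    missing = [f for f in contract.output_fields if f not in output]
    if missing:
        raise ContractViolation(f"{contract.name}: output missing {missing}")
    return output


def update_cost_centre(employee_id: str, cost_centre: str) -> dict:
    # Hypothetical stand-in for an HRIS write.
    return {"employee_id": employee_id, "cost_centre": cost_centre, "updated": True}


CONTRACT = ToolContract(
    name="update_cost_centre",
    description="Change an employee's cost centre in the HRIS.",
    required_inputs={"employee_id": str, "cost_centre": str},
    output_fields=("employee_id", "cost_centre", "updated"),
    side_effects="write",
    allowed_when=lambda ctx: ctx.get("approved_by") is not None,
)

result = call_tool(
    CONTRACT, update_cost_centre,
    {"employee_id": "E-001", "cost_centre": "CC-42"},
    {"approved_by": "M-007"},
)
```

    The same call without `approved_by` in the run context raises `ContractViolation` before the tool is touched. The refusal happens in code, not in the prompt.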

    Exception handling is the second pillar. What happens when the model returns nothing useful? When a tool call fails? When the policy is ambiguous? In production, these happen constantly. Agents that work in production have explicit retry logic, fallback paths, and a clear point at which they stop and escalate.

    The escalation path is not optional. Every production agent has a moment when it should hand off to a human. The agent that knows when to stop is more valuable than the agent that powers through.
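
    Retry, fallback, and escalation fit in one small wrapper. This is a minimal sketch under assumptions: the attempt count, the backoff delays, and the `Outcome` shape are illustrative choices, and `flaky_lookup` simulates a system that recovers on the third call.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Outcome:
    status: str                   # "ok", "fallback", or "escalated"
    value: Optional[str] = None
    reason: Optional[str] = None
    attempts: int = 0


def run_with_recovery(
    primary: Callable[[], str],
    fallback: Optional[Callable[[], str]] = None,
    max_attempts: int = 3,
    base_delay: float = 0.5,
) -> Outcome:
    """Retry the primary path with backoff, try a fallback, then escalate."""
    last_error = "no attempts made"
    for attempt in range(1, max_attempts + 1):
        try:
            value = primary()
            if value:  # an empty result counts as a failure
                return Outcome("ok", value=value, attempts=attempt)
            last_error = "empty result"
        except Exception as exc:
            last_error = f"{type(exc).__name__}: {exc}"
        if attempt < max_attempts:
            time.sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff
    if fallback is not None:
        try:
            value = fallback()
            if value:
                return Outcome("fallback", value=value, reason=last_error, attempts=max_attempts)
            last_error = "fallback returned an empty result"
        except Exception as exc:
            last_error = f"fallback failed: {type(exc).__name__}: {exc}"
    # Nothing worked: stop and hand off to a human with the reason attached.
    return Outcome("escalated", reason=last_error, attempts=max_attempts)


calls = {"n": 0}


def flaky_lookup() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("HRIS unavailable")  # simulated outage
    return "policy text"


recovered = run_with_recovery(flaky_lookup, base_delay=0.0)
gave_up = run_with_recovery(lambda: "", base_delay=0.0)
```

    Every path returns an `Outcome`, including the escalated one, so the caller never has to guess whether the agent stopped on purpose or simply failed.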

    Observability and audit

    If you cannot see what the agent did, you cannot trust it. If you cannot trust it, you cannot scale it.

    Observability for People Ops agents has three components:

    • Trace. Every action the agent took, in order, with the inputs and outputs at each step.
    • Metrics. How often it succeeds, how often it escalates, how often it fails silently, what it costs.
    • Audit. Decisions tied to identifiable inputs, retained for the period your governance demands.

    The trace is the part most teams skip. It is also the part that matters most when something goes wrong, which it will. Without a trace, debugging an agent failure is guesswork. With one, it is a five-minute conversation.
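
    A trace does not need special tooling to start with. The sketch below records ordered events per run and derives simple metrics from them. The event fields, the status values, and the unitless `cost` field are assumptions. Silent failures cannot be counted this way, because by definition they leave no error status; catching them takes sampling and human review.

```python
import json
import time
import uuid
from dataclasses import asdict, dataclass, field


@dataclass
class TraceEvent:
    run_id: str
    seq: int
    action: str
    inputs: dict
    outputs: dict
    status: str   # "ok", "error", or "escalated"
    cost: float   # in whatever unit you track
    ts: float


@dataclass
class Trace:
    run_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    events: list = field(default_factory=list)

    def record(self, action: str, inputs: dict, outputs: dict,
               status: str = "ok", cost: float = 0.0) -> None:
        """Append one step, numbered in the order it happened."""
        self.events.append(
            TraceEvent(self.run_id, len(self.events), action, inputs, outputs,
                       status, cost, time.time())
        )

    def to_jsonl(self) -> str:
        """One JSON object per line, ready to ship to a log store."""
        return "\n".join(json.dumps(asdict(e)) for e in self.events)


def metrics(traces: list) -> dict:
    """Run-level rates: a run's outcome is the status of its last event."""
    outcomes = [t.events[-1].status for t in traces if t.events]
    total = len(outcomes)
    return {
        "runs": total,
        "success_rate": outcomes.count("ok") / total if total else 0.0,
        "escalation_rate": outcomes.count("escalated") / total if total else 0.0,
        "total_cost": sum(e.cost for t in traces for e in t.events),
    }


t1 = Trace()
t1.record("retrieve_policy", {"query": "standing desk"}, {"doc_id": "POL-017"}, cost=0.002)
t1.record("draft_answer", {"doc_id": "POL-017"}, {"answer": "..."}, cost=0.001)

t2 = Trace()
t2.record("retrieve_policy", {"query": "relocation"}, {}, status="escalated")

summary = metrics([t1, t2])
```

    Because every event carries the run ID and a sequence number, the audit question "what did the agent see when it decided this?" becomes a lookup rather than a reconstruction.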

    Where to start

    Not with agents. Start with workflows you can document and automations you can stabilise. Make sure the data the agent will rely on is clean and centralised. Set up the AI workspace so context is not invented from scratch every time.

    Then pick one well-bounded use case. A few that work in practice:

    • A policy FAQ agent that fields employee questions, retrieves the relevant policy, and escalates anything ambiguous.
    • An onboarding orchestrator that triggers the next step in a documented workflow when the previous one completes.
    • A comp benchmarking research agent that pulls market data, structures it, and hands a manager a comparison table for a hiring decision.

    Each of these has the same structure: bounded scope, clear escalation, observable trace, real exception handling. Each one earns the right to expand. None of them try to be the agent that does everything.

    The pattern across all of them is the move from prompts to systems: a domain map tells you where to point the work, and an operating model makes the work survive a quarter.

    Production agents are not magicians. They are operators. Build them like operators, and they will work like operators.

    Common questions

    What is the actual definition of an AI agent in a business context?
    A system that can take a goal, decide a next action, use tools, persist state, and move work forward with auditability. Not a chatbot that browses the web. Not a workflow with an LLM step. Closer to a junior operator with superpowers, who needs a clear job, access to the right systems, rules about what it can and cannot do, memory, a way to recover when things go wrong, and a human escalation path.
    Why do most agents fail in production?
    Lack of context, lack of state, lack of guardrails. Most teams spend 80% of their time choosing the model and 20% on what the agent actually needs to succeed. The ratio should be reversed. The model is a tuning decision. Context, state, tooling contracts, exception handling, observability, and human escalation are the difference between an agent worth real money and one worth nothing.
    Why does data matter more than apps for HR automation?
    Buying another HR app adds another silo. The bottleneck for most People functions is not features, it is the fact that data lives across HRIS, ATS, payroll, performance, and engagement systems that do not talk to each other. Building the integration and orchestration layer (often with n8n or similar) creates more leverage than buying a tenth tool. Apps add more layers. Data and orchestration remove them.
    Where should we start with agents in People Ops?
    Not with agents. Start with workflows you can document, automations you can stabilise, and clean data the agent can rely on. Then pick one well-bounded use case (a policy FAQ agent, an onboarding orchestrator, or a comp benchmarking research agent) with explicit guardrails and a clear escalation path. Build small, audit hard, expand only when the small thing has earned trust.