How do you prioritise AI workflows in a People function?

Score every candidate on four dimensions: value, frequency, fit and risk. One to five each, done with the people who actually do the work, not just the ones who lead it. A team of six can score twenty candidate workflows in under two hours. The ones that score high on value and frequency, suit AI well, and carry manageable risk get built first. Everything else waits.

Which AI workflows should a People team build first?

The highest-scoring workflow the team can build to a working state in four to six weeks with what it already has. Not the biggest, not the loudest, not the most impressive on a slide. Recruiter coordination, first-draft documents and scorecard summarising clear that bar on almost every backlog. Calibration and comp decisions almost never do, so they wait until the team has built confidence on simpler work.

What is an AI workflow assessment?

A short, structured scoring exercise, not a full audit. It turns a wishlist of AI ideas into a sequenced 90-day plan using the same four dimensions for every candidate, so the decisions are comparable and defensible. A full audit runs for weeks. This takes an afternoon and produces a ranked list, an owner per item and a defined first build.

How often should you re-score the automation backlog?

Once a quarter is about right. More often and workflows in build never get a stable run. Less often and you miss what changed in the world and in what AI can now do well. Re-scoring is also when you retire ideas that scored well twice and still were not built, because that is the backlog telling you something the score did not.

A workflow assessment framework for People Ops

A financial-data company, about 600 people, put us in front of a whiteboard covered in AI ideas: recruiter coordination, survey theming, policy drafts, calibration support, onboarding sequences, a dozen in all. Everyone in the room had a favourite and everyone's favourite was different. The instinct was to argue it out and start with whichever champion argued hardest. We scored it instead, and seven weeks later five of those workflows were in production. The other seven were not, and nobody was upset about it, because the reason was written down. A workflow assessment framework is what turns that whiteboard into a sequenced plan: you score every candidate on the same four dimensions (value, frequency, fit and risk) and let the numbers order the build.

Why teams pick the wrong workflow first

Left to instinct, most teams choose badly, and they choose badly in a predictable way. They pick the workflow someone is loudest about, or the one that sounds most impressive on a slide, or the one a vendor demoed last Tuesday. None of those signals correlate with value. Volume of opinion correlates with seniority and confidence. Slide-worthiness correlates with novelty. A vendor demo correlates with whatever the vendor happens to sell.

Three months later the build is half-finished, the team is sceptical, and the next workflow is harder to fund because the last one did not land. This is the pilot trap wearing a People Ops coat. The demo worked. The rollout did not. The fix is not more enthusiasm or a better vendor. It is a way to compare unlike things on the same axis before anyone writes a line of automation, so the first build is the one most likely to ship and compound, not the one that shouted.

That comparison is the whole job of the framework. It is not clever. Its value is entirely in the discipline of applying it to everything, honestly, in a room with the right people.

The four dimensions: value, frequency, fit, risk

Every candidate workflow gets scored on four dimensions, one to five each. None of them are novel. The discipline is scoring honestly and never skipping one because it is inconvenient.

Score every candidate on these four, one to five

Value: how much time, money or quality lift does a working version deliver per cycle?

Fails when: You can only describe it in the abstract, like 'saves time on hiring', instead of 'removes 4 hours of recruiter coordination per role, across 60 roles a year'.

Frequency: how often does this work actually happen?

Fails when: It is annual or quarterly, so the team forgets the system between cycles and never builds the muscle.

Fit: how well-suited is this work to what AI does well today?

Fails when: It is high-stakes judgement, a sensitive ER conversation, or anything where being wrong is far worse than being slow.

Risk: what is the cost of being wrong, and can you fix it fast?

Fails when: A wrong output lands in a real decision or a real conversation before a human can catch it.

Score risk before value and fit, not after. People inflate the things they already want to build.

A note on each. Value has to be specific or it is not a score. Not "improves manager experience" but "saves 20 minutes per manager per quarter on prep for performance conversations, across 35 managers". If you cannot make a candidate that specific, it is not ready to be scored. Frequency is the one teams underweight. A 30-minute weekly task beats a 4-hour quarterly one, even though the quarterly one feels bigger. The weekly build frees 26 hours a year and gives you a team that touches the system every week. The quarterly build frees 16 hours and leaves a team that forgets the system between cycles.

Fit is where the expensive mistakes hide. Drafting, summarising, classifying, structured extraction and first-pass analysis are a natural fit. High-stakes judgement is not. Score it honestly, because the costliest failures come from forcing AI into work it is bad at and then blaming the tool. Risk gets scored separately and first, because high-risk workflows can still be built. They just need different guardrails: a human in the loop, a narrower scope, a slower rollout. You want to know that before you start, not after the first bad output. The combined score is not a perfect ranking. It is a far better starting point than the loudest voice in the room.
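The weekly-versus-quarterly comparison is simple arithmetic, and it is worth running for every candidate before scoring value and frequency. A minimal sketch, assuming a hypothetical helper called `annual_hours_freed` (the name and signature are illustrative, not part of the framework), using only the figures quoted above:

```python
def annual_hours_freed(minutes_per_cycle: float, cycles_per_year: int, people: int = 1) -> float:
    """Hours freed per year: minutes per cycle x cycles per year x people, converted to hours.

    Hypothetical helper for illustration only.
    """
    return minutes_per_cycle * cycles_per_year * people / 60


# The 30-minute weekly task from the text.
weekly_task = annual_hours_freed(30, 52)

# The 4-hour quarterly task from the text.
quarterly_task = annual_hours_freed(4 * 60, 4)

# "20 minutes per manager per quarter, across 35 managers".
manager_prep = annual_hours_freed(20, 4, people=35)

print(weekly_task)             # 26.0
print(quarterly_task)          # 16.0
print(round(manager_prep, 1))  # 46.7
```

If a candidate cannot supply the three inputs (minutes per cycle, cycles per year, people affected), that is usually a sign it is not yet specific enough to score.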

Where AI fits, and where it does not

Before the scores mean anything, the room has to agree on what "good fit" honestly looks like. This is the fit dimension made concrete, and it is worth getting opinionated about, because it settles half the arguments before they start.

Work AI fits today	Work to keep human-led
Recruiter and hiring-manager kickoff docs from a short brief	Performance calibration and rating decisions
Summarising interview scorecards into a hiring recommendation	Compensation and promotion calls
Theming free-text survey comments into named patterns	Sensitive employee-relations conversations
First-draft policies, FAQs and internal comms	Anything legally or ethically consequential if wrong
First-pass triage of routine people queries	Judgement where being wrong is far worse than being slow

The right-hand column is not off-limits forever. AI can assist it later, under guardrails. It just should not be the first thing you build.

The line between the columns is not permanent. It moves as models improve and as your team gets better at scoping. But on day one it is a reliable filter, and the workflows on the left are where the compounding lives: high frequency, low risk, genuinely suited to the technology. Start there, build the team's confidence and the underlying plumbing, then let the harder work earn its place.

A worked example

Take a real People Ops backlog from a 250-person company. Six candidates went on the board and through the same filter. Risk is scored so that five means safe, so every column reads the same direction and the four simply add up.

Workflow	Value	Freq	Fit	Risk (5 = safe)	Score
Recruiter kickoff doc generation	4	5	5	5	19
Interview scorecard summarisation	4	5	5	4	18
Onboarding first-week Slack sequence	3	5	5	4	17
Engagement survey free-text theming	5	2	5	4	16
New policy first-draft generation	3	2	4	4	13
Performance calibration recommendations	5	1	2	1	9

Three workflows separate from the pack: kickoff docs, scorecard summarisation and the onboarding sequence. All three are weekly or near-weekly, score well on fit, and carry manageable risk. They are the first cluster.

The calibration workflow, which was, predictably, the one most loudly requested, scores a nine. It has real value, a five. It is also annual, a poor fit for current AI, and high-risk, so it waits and comes back later with proper guardrail design, once the team has shipped simpler work and earned the right to attempt it. This is the conversation the framework forces, and it forces it without making it personal. Nobody is overruling the sponsor. The scores are.
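The table above can be reproduced in a few lines, which is handy once the backlog outgrows a whiteboard. A minimal sketch, assuming a hypothetical `Candidate` class (the class and field names are illustrative); the scores are the ones from the worked example, with risk scored so that five means safe:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One backlog item scored one to five on each dimension (hypothetical structure)."""
    name: str
    value: int
    frequency: int
    fit: int
    risk: int  # scored so that 5 = safe, as in the table

    def __post_init__(self) -> None:
        # Reject anything outside the one-to-five scale.
        for dimension in ("value", "frequency", "fit", "risk"):
            score = getattr(self, dimension)
            if not 1 <= score <= 5:
                raise ValueError(f"{dimension} must be 1-5, got {score}")

    @property
    def total(self) -> int:
        # All four columns read the same direction, so they simply add up.
        return self.value + self.frequency + self.fit + self.risk


backlog = [
    Candidate("Performance calibration recommendations", 5, 1, 2, 1),
    Candidate("New policy first-draft generation", 3, 2, 4, 4),
    Candidate("Recruiter kickoff doc generation", 4, 5, 5, 5),
    Candidate("Engagement survey free-text theming", 5, 2, 5, 4),
    Candidate("Interview scorecard summarisation", 4, 5, 5, 4),
    Candidate("Onboarding first-week Slack sequence", 3, 5, 5, 4),
]

# Highest combined score first.
ranked = sorted(backlog, key=lambda c: c.total, reverse=True)

for c in ranked:
    print(f"{c.total:>2}  {c.name}")
```

Run as written, the order matches the table: kickoff docs first on 19, calibration last on 9.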

Scoring honestly is the hard part

The maths is trivial. The honesty is not. Three rules, all learned the hard way.

Score with the people who do the work, not just the people who lead it. A Head of People scores value high and frequency low, because they see the strategic version of the work. The Talent Partner doing the actual coordination scores value moderate and frequency very high, and they are usually right about the shape. Both readings matter. Only one of them is in the room by default.

Score risk first, and imagine the failure out loud. People inflate value and fit on workflows they already want. Scoring risk first, and saying plainly what a wrong output would do, keeps the rest of the scoring honest.

Be ruthless about specificity. If a candidate cannot be made specific to the level of "20 minutes, 35 managers, per quarter", it is not a workflow yet. It is an ambition. Park it until someone can name the cycle, the volume and the person who owns it.

Most teams find, after their first honest scoring pass, that roughly a third of the backlog quietly disappears. It was never a set of workflows. It was wishful thinking wearing the costume of a plan.

From score to roadmap

A score gives you a ranking. A ranking is not yet a roadmap. Three more moves turn one into the other, and they are the difference between a spreadsheet and something the CFO will fund.

Rank (the sheet): Order the backlog by combined score. This is the raw list, not the plan.

Sequence by buildability (first build): Pick the highest-scoring workflow the team can ship to working state in four to six weeks with what it has now. A slightly lower score that ships beats a top score still being scoped in week ten.

Cluster by shared plumbing (compounding): Group workflows that need the same integration, say the ATS wired to the workspace, and build them together. The second and third in a cluster cost a fraction of the first.

Reserve for the unscoreable (20% held): Ring-fence a fifth of build capacity for what did not exist when you scored: a reorg, a regulatory change, an experiment a champion brings back from a course.

The reserve is the move teams skip and regret. Some of the highest-return work in a year comes from things that were not on the board when you scored: a tool deprecation, a January budget review that reshapes priorities, an opportunistic build a champion returns with. Commit every hour to the scored backlog and none of that gets done. Hold a fifth back and the team keeps the ability to respond. A 90-day roadmap that comes out of this usually looks like one cluster of three shipped, one harder solo workflow underway, and one opportunistic build that surprised everyone. Not glamorous. It compounds, quarter after quarter. If you want that first cluster scoped and built end to end rather than left as a spreadsheet, that is exactly the shape of a Grain Audit: one process, a ranked plan, a 90-day sequence you keep.
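The moves from ranking to roadmap can be sketched in code as well. This is an illustration under stated assumptions, not a prescribed implementation: the combined scores come from the worked example, but the `weeks_to_ship` estimates, the `plumbing` tags, the 400-hour capacity figure and every function name here are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    name: str
    score: int          # combined score from the worked example
    weeks_to_ship: int  # hypothetical estimate, not from the article
    plumbing: str       # hypothetical integration tag, not from the article


items = [
    Item("Recruiter kickoff doc generation", 19, 8, "ats"),
    Item("Interview scorecard summarisation", 18, 5, "ats"),
    Item("Onboarding first-week Slack sequence", 17, 4, "chat"),
    Item("Engagement survey free-text theming", 16, 6, "survey"),
]


def rank(backlog):
    """Order the backlog by combined score, highest first."""
    return sorted(backlog, key=lambda i: i.score, reverse=True)


def first_build(ranked, max_weeks=6):
    """Highest-scoring item that can ship within the window, or None."""
    return next((i for i in ranked if i.weeks_to_ship <= max_weeks), None)


def clusters(ranked):
    """Group item names by the integration they share, keeping rank order."""
    groups = defaultdict(list)
    for i in ranked:
        groups[i.plumbing].append(i.name)
    return dict(groups)


def plannable_capacity(total_hours, reserve_pct=20):
    """Build hours left for the scored backlog after holding the reserve back."""
    return total_hours * (100 - reserve_pct) / 100


ranked = rank(items)
print(first_build(ranked).name)  # Interview scorecard summarisation
print(clusters(ranked)["ats"])
print(plannable_capacity(400))   # 320.0
```

With these made-up estimates the top scorer needs eight weeks, so the first build is the second-ranked item: a slightly lower score that ships.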

When to re-score, and what the framework is really for

Re-score once a quarter. Not more, because workflows in build need a stable run. Not less, because the world moves and so does what AI does well. Re-scoring is also when you retire what has not been built. If a workflow has scored well twice and still not shipped, that is a signal: either the team does not actually want it, or it is harder than the score suggested, or there is a sponsorship problem nobody has named. Pretending the backlog is still live is worse than removing the line.
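The retirement rule is easy to make mechanical at each quarterly pass. A minimal sketch, assuming a hypothetical function `needs_retirement_review` and an assumed cut-off of 16 for "scored well"; the article does not define a threshold, so pick your own:

```python
def needs_retirement_review(quarterly_scores: list[int], shipped: bool, threshold: int = 16) -> bool:
    """Flag a workflow that has scored well at least twice and still has not shipped.

    `threshold` is an assumed cut-off for "scored well", not a value from the framework.
    """
    strong_passes = sum(1 for score in quarterly_scores if score >= threshold)
    return not shipped and strong_passes >= 2


print(needs_retirement_review([17, 18], shipped=False))  # True
print(needs_retirement_review([17, 18], shipped=True))   # False
print(needs_retirement_review([17, 12], shipped=False))  # False
```

A flag is a prompt for the conversation described above, not an automatic deletion: the team still has to name which of the three causes applies.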

The real job of the framework, though, has almost nothing to do with rankings. It gives the team a shared language for deciding which AI work is worth doing. Without one, every conversation about a new workflow collapses into enthusiasm or doubt, both exhausting, neither scalable. With one, the conversation gets shorter and better. Someone proposes a workflow. The team scores it together. The score either confirms what everyone suspected or surfaces a disagreement worth having. Either outcome moves the work forward.

That is the test of a good framework: not perfect rankings, but a conversation that gets easier every time. It is the same discipline behind identifying the efficiency gaps AI can actually fill, and it is the reason a scored backlog rarely falls into the pattern that stalls AI pilots at production. Where a scoring pass ends, a deeper automation audit begins, and both feed the same AI workspace your People team runs on. Once the first cluster ships, measuring the value it returns is what keeps the next quarter funded.

Score the backlog. Sequence the build. Ship the first cluster. Re-score, and do it again. Three quarters in you will have an automation portfolio the team understands, the CFO can defend, and the function genuinely depends on. It is built from a spreadsheet, not a strategy deck.

