Agentforce in production: a field guide

Agentforce shipped in late 2024 as Salesforce's response to a question every enterprise software vendor was suddenly being asked: where are the agents? By the middle of 2026 the answer on a demo stage is confident and the answer in a production org is, far too often, a chatbot that cannot finish a task. The gap between the two is not a gap in the technology. It is a gap in how the work is framed before anyone opens the Agent Builder.

We have delivered four Agentforce implementations across the past six months, in financial services, subscription retail, and professional services. The pattern that separates the engagements that paid back from the ones that stalled is small enough to state in a sentence, and it has nothing to do with model quality.

01 / DEFINITION: What the product actually is

It helps to strip the marketing language away and describe Agentforce as the three mechanical parts it is built from. The first is a reasoning loop: a language model that, given a goal and a set of available tools, decides which tool to call next and when the task is complete. The second is a library of actions you expose to that loop, which are nothing more exotic than Flows, Apex methods, prompt templates, and Data Cloud retrievers that you have marked as agent callable. The third is the Einstein Trust Layer, which sits between the org and the model, redacting sensitive fields before a prompt leaves your boundary and logging every exchange that returns.

None of this is magic, and treating it as magic is the first mistake. It is a capable reasoning engine with carefully fenced access to your data and your existing automation. The quality of the result is governed almost entirely by how precisely you fence that access.
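To make the reasoning loop concrete, here is a minimal sketch in Python. It is not Agentforce code. The planner is a hard-coded stub standing in for the language model, and every name in it (`Action`, `choose_next`, `run_agent`, `lookup_account`, `draft_reply`) is hypothetical. It only shows the shape of the loop: choose an action, run it, stop when the goal is met or the turn limit is reached.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Action:
    name: str
    description: str  # the text a planner would read when choosing
    run: Callable[[dict], dict]


def lookup_account(state: dict) -> dict:
    # Stand-in for a Flow or Apex method exposed to the agent.
    return {"account": {"id": state["account_id"], "plan": "annual"}}


def draft_reply(state: dict) -> dict:
    # Stand-in for a prompt template.
    return {"reply": f"Your plan is {state['account']['plan']}."}


ACTIONS = {
    a.name: a
    for a in (
        Action("lookup_account", "Fetch the account record by id.", lookup_account),
        Action("draft_reply", "Draft a reply from the fetched account.", draft_reply),
    )
}


def choose_next(state: dict) -> Optional[str]:
    # Stub planner: a real deployment would ask a language model here.
    if "account" not in state:
        return "lookup_account"
    if "reply" not in state:
        return "draft_reply"
    return None  # nothing left to do


def run_agent(goal: dict, max_turns: int = 5) -> dict:
    state = dict(goal)
    state["trace"] = []
    for _ in range(max_turns):  # hard cap so the loop cannot run away
        name = choose_next(state)
        if name is None:
            break
        state.update(ACTIONS[name].run(state))
        state["trace"].append(name)
    return state


result = run_agent({"account_id": "001-demo"})
print(result["trace"])
print(result["reply"])
```

The fence is the `ACTIONS` dictionary: the loop can only ever call what has been placed in it.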

02 / THE MISTAKE: Building the agent before the work

The common failure is procedural, not technical. A team licences Agentforce, opens the builder, and constructs a general assistant that can answer questions about accounts and opportunities. It demonstrates beautifully. Three weeks into production it is abandoned, because a general assistant that can do a little of everything cannot reliably finish any single thing a person actually needs done.

Reverse the order. Pick one workflow that is repetitive, has clearly bounded inputs, and produces a clearly bounded output. Build the agent around that one workflow. Ship it. Only then add the second.

The workflows that fit this shape are easy to recognise once you look for them. A renewals manager reviewing eighty contracts a quarter, where the agent reads the contract terms, pulls renewal data from the opportunity, and drafts a tailored renewal note. A tier-one service representative, where the agent reads the inbound case, checks the knowledge base, drafts a response, and asks for confirmation before sending. A sales operations analyst running a weekly pipeline review, where the agent flags stalled deals and proposes the next action on each.

The workflows that do not fit are equally recognisable. Anything that turns on creative judgement. Anything where a wrong answer carries legal, financial, or clinical consequence. Anything whose inputs change shape every week, because the agent's instructions cannot keep pace with a moving target.

03 / SEQUENCE: The rollout that holds up

Across the four projects, the same sequence has proven durable. The durations below assume a single business unit and a single first workflow, not an enterprise transformation programme.

[Figure: the sequence we use for a first workflow: discovery, action design, prompt engineering, trust and safety, pilot, rollout.] Each phase ends with a definite handoff, which is what stops the project from quietly sliding.

Discovery is not a kickoff meeting. Pick the workflow, then sit with the people who perform it today and watch them do it. Write down the inputs, the outputs, and every point where they pause to think. Those pauses are where the agent earns its cost.

Action design is production engineering, not labelling. The agent calls your Flows and Apex methods, and it reads their descriptions to decide which to use. If a competent human cannot understand an action from its description alone, the agent will use it wrongly. Treat each description as part of the contract.
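One way to hold descriptions to that standard is a small review check run before an action is exposed. The sketch below is an illustration under our own assumptions, not a Salesforce feature. The function name `review_description`, the word list, and the twelve-word minimum are all hypothetical and should be tuned to your own conventions.

```python
import re

# Illustrative list of words that usually signal a description nobody thought about.
VAGUE_WORDS = {"stuff", "things", "misc", "various", "helper", "etc"}

# Each key is a word stem we expect to see; the value is how we report its absence.
REQUIRED_MENTIONS = {"input": "inputs", "return": "return value"}


def review_description(description: str, min_words: int = 12) -> list[str]:
    """Return a list of problems; an empty list means the description passes."""
    problems = []
    words = re.findall(r"[a-z']+", description.lower())
    if len(words) < min_words:
        problems.append(f"too short ({len(words)} words)")
    vague = sorted(VAGUE_WORDS.intersection(words))
    if vague:
        problems.append("vague wording: " + ", ".join(vague))
    for stem, label in REQUIRED_MENTIONS.items():
        if not any(w.startswith(stem) for w in words):
            problems.append(f"does not mention {label}")
    return problems


bad = "Helper for account stuff."
good = (
    "Returns the renewal date and contract value for one opportunity. "
    "Input: an opportunity id. Does not modify any record."
)

print(review_description(bad))
print(review_description(good))
```

A check like this cannot judge whether a description is accurate. It only catches the ones that were never really written.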

Prompt engineering is the most underrated phase. The default system prompt is built for demonstrations. Yours needs to specify tone, the conditions under which the agent should ask a clarifying question, and the things it must refuse outright.

Trust and safety means configuring the Trust Layer, deciding who may audit transcripts, and deciding what the agent is permitted to put in writing to a customer.

Pilot means five users, daily check-ins, and a failure log that drives improvements first to the prompt and then to the actions.

Rollout means adding the next workflow and repeating.

04 / EVIDENCE: One implementation, measured

The clearest result we have on record came from a subscription retailer's service team. The workflow was deliberately narrow: a customer asks a billing question, the agent retrieves the account, checks the contract, drafts a response, and either sends it or routes to a human for anything outside its remit.

[Image: a service representative reviewing data at a laptop.] The agent was not faster at writing than the representative. It was faster at the lookup, and most service work is lookup.

Before the rollout, the team averaged roughly eight minutes per ticket, and close to a third of tickets needed a follow-up message because the first reply missed something. Eight weeks into the pilot, average handle time had fallen to about three and a half minutes, and the follow-up rate sat near eleven percent. The agent did not write faster than a skilled human. It retrieved the relevant facts faster, and retrieval was the bulk of the work. That is the lesson worth carrying into the next project.
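For readers who want the relative change rather than the raw figures, the arithmetic is short. The inputs below are the rounded numbers quoted above (eight minutes, three and a half minutes, a third, eleven percent), so the results are approximate. The helper name `relative_reduction` is ours.

```python
def relative_reduction(before: float, after: float) -> float:
    """Fraction by which a metric fell, relative to its starting value."""
    return (before - after) / before


# Rounded figures from the pilot described above.
handle_time_cut = relative_reduction(8.0, 3.5)   # minutes per ticket
follow_up_cut = relative_reduction(1 / 3, 0.11)  # share of tickets needing a follow-up
minutes_saved_per_ticket = 8.0 - 3.5

print(f"handle time reduction: {handle_time_cut:.2%}")
print(f"follow-up rate reduction: {follow_up_cut:.2%}")
print(f"minutes saved per ticket: {minutes_saved_per_ticket}")
```

On those figures, handle time fell by a little over half and the follow-up rate by about two thirds.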

Field note

If you cannot describe your candidate workflow in two sentences, it is not ready for an agent. The model can only operate inside the boundary you are able to articulate, and a vague boundary produces a vague, untrustworthy result.

05 / OPERATIONS: Three things to watch once it is live

Cost. Every conversation consumes tokens. Set a per-user budget alert early. We have watched an agent stretch from five turns to thirty-five because it was given a vaguely described Apex method and kept probing it.
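A budget alert does not need to be sophisticated to catch that kind of runaway conversation. The sketch below assumes you can observe each turn and its token count from your own logging. The class `UsageMonitor`, its thresholds, and the identifiers are hypothetical, and nothing here calls a Salesforce API.

```python
from collections import defaultdict


class UsageMonitor:
    """Counts turns per conversation and tokens per user; thresholds are illustrative."""

    def __init__(self, max_turns: int = 10, max_user_tokens: int = 50_000):
        self.max_turns = max_turns
        self.max_user_tokens = max_user_tokens
        self.turns = defaultdict(int)
        self.tokens = defaultdict(int)

    def record_turn(self, user: str, conversation: str, tokens: int) -> list[str]:
        """Record one turn and return any alerts it triggers."""
        self.turns[conversation] += 1
        self.tokens[user] += tokens
        alerts = []
        # Alert once, on the first turn past the limit, not on every later turn.
        if self.turns[conversation] == self.max_turns + 1:
            alerts.append(f"{conversation}: exceeded {self.max_turns} turns")
        if self.tokens[user] > self.max_user_tokens:
            alerts.append(f"{user}: over token budget ({self.tokens[user]})")
        return alerts


monitor = UsageMonitor(max_turns=5, max_user_tokens=10_000)
alerts = []
for _ in range(6):
    alerts += monitor.record_turn("user-a", "conv-1", tokens=1_000)
print(alerts)
```

The turn alert is usually the more useful of the two, because a conversation that keeps growing points at a specific action description to fix.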

Drift. The model behind Agentforce is updated by Salesforce on its own schedule. A prompt that works today can quietly degrade in six months. Add regression tests for your critical agent flows and treat them like unit tests.
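A regression suite for an agent can be as plain as a list of golden cases: for this utterance, these actions must be called, these must not, and the reply must mention these facts. The sketch below shows that shape only. `invoke_agent` is a hypothetical stub with canned answers, to be replaced with whatever harness you use to run the agent in a sandbox, and the action names are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class AgentResult:
    actions_called: list
    reply: str


@dataclass
class GoldenCase:
    utterance: str
    must_call: list
    must_mention: list
    must_not_call: list = field(default_factory=list)


def invoke_agent(utterance: str) -> AgentResult:
    # Hypothetical stub with canned behaviour; swap in your own test harness.
    if "invoice" in utterance.lower():
        return AgentResult(["get_account", "get_invoice"], "Your March invoice was 42.00 EUR.")
    return AgentResult(["route_to_human"], "I am passing this to a colleague.")


def check(case: GoldenCase, result: AgentResult) -> list:
    """Return the list of failed expectations for one case."""
    failures = []
    for action in case.must_call:
        if action not in result.actions_called:
            failures.append(f"missing action: {action}")
    for action in case.must_not_call:
        if action in result.actions_called:
            failures.append(f"forbidden action: {action}")
    for phrase in case.must_mention:
        if phrase.lower() not in result.reply.lower():
            failures.append(f"reply does not mention: {phrase}")
    return failures


CASES = [
    GoldenCase("Why was my March invoice higher?", ["get_account", "get_invoice"],
               ["invoice"], ["route_to_human"]),
    GoldenCase("I want to make a legal complaint.", ["route_to_human"], [], ["get_invoice"]),
]

report = {case.utterance: check(case, invoke_agent(case.utterance)) for case in CASES}
for utterance, failures in report.items():
    print("PASS" if not failures else f"FAIL {failures}", "|", utterance)
```

Asserting on which actions were called, not on exact wording, keeps the suite stable when the model's phrasing shifts but its behaviour does not.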

Permissions. An agent runs with the permissions of the user who invoked it. That is usually the right behaviour, but a carelessly phrased instruction can still surface data that ordinary field security should have kept from the user. Audit transcripts monthly for the first quarter after launch.
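Part of that audit can be mechanical. If you can extract the field names that appeared in a transcript and the set of fields the invoking user is allowed to read, the check is a set difference. Both inputs below are hypothetical sample data, and the field names are illustrative only. How you obtain them depends on your own logging and permission exports.

```python
def fields_outside_access(surfaced: list[str], readable: set[str]) -> list[str]:
    """Return fields that appeared in a transcript but are not readable by the user."""
    return sorted(set(surfaced) - readable)


# Hypothetical audit input: these names are examples, not a real schema.
readable_by_user = {"Account.Name", "Account.Plan", "Case.Subject"}
transcript_fields = ["Account.Name", "Account.CreditLimit", "Case.Subject"]

flagged = fields_outside_access(transcript_fields, readable_by_user)
print(flagged)
```

Anything the check flags still needs a person to read the transcript. The script only narrows down where to look.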

06 / RESTRAINT: When not to reach for an agent

If a task can be handled by a Flow with three branches, use the Flow. It is deterministic, costs no model tokens at runtime, and is far easier to debug at three in the morning. Agents earn their cost only when a task requires reading genuinely unstructured input (a customer email, a contract clause, a meeting note) and producing a structured outcome from it. When the input is already structured, an agent is a language-model tax with no return.
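For comparison, this is what a three-branch decision looks like when the input is already structured. It is written in Python as a stand-in for a Flow. The function name and the thresholds (ninety days, fifty currency units) are invented for illustration.

```python
def route_billing_request(amount: float, days_since_invoice: int) -> str:
    """Three fixed branches; the thresholds are illustrative assumptions."""
    if days_since_invoice > 90:
        return "reject: outside dispute window"
    if amount <= 50:
        return "approve: automatic credit"
    return "escalate: human review"


print(route_billing_request(20, 10))
print(route_billing_request(20, 120))
print(route_billing_request(500, 10))
```

The same inputs always give the same answer, and every branch can be read and tested in isolation. That is the property you give up when you hand the decision to a model.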

The teams that succeed with Agentforce are unglamorous about it. They pick one narrow, repetitive, well-bounded workflow, instrument it carefully, ship it to five people, and expand only once it has earned the right. The teams that struggle build an assistant that can do anything and therefore finishes nothing. Start narrow. Earn the next step.
