
Agentic Workflows Are Just Automated Specifications

Ryan Haney

Everyone wants agents. Nobody wants to write the spec.

The pitch is compelling: deploy an AI agent that handles your customer support tickets, processes invoices, triages bug reports, or manages your deployment pipeline. The agent works 24/7, doesn’t take breaks, and gets faster as the model improves. What’s not to want?

So businesses are rushing to build agentic workflows. And most of them are skipping the step that determines whether the agent will actually work: defining precisely what it should do.

An AI agent is not magic. It’s automation. And automation is only as good as the specification it executes. If the spec is vague, the agent will be unpredictable. If the spec is wrong, the agent will be confidently, consistently wrong. If there is no spec, the agent is making it up as it goes.

It’s the same problem software has always had, just running at machine speed.

An agent is a spec executor

Strip away the hype and look at what an AI agent actually does. It receives an input (a support ticket, an email, a webhook, a scheduled trigger). It follows a set of instructions to process that input. It produces an output (a reply, an action, a decision, a handoff).

That’s a specification: inputs, processing rules, expected outputs. The agent is executing a spec either way; the only question is whether that spec is explicit or implicit.

When the spec is explicit, structured, validated, and unambiguous, the agent behaves predictably. It does what the spec says. You can test it against the spec. You can audit it against the spec. When the agent does something wrong, you can trace the error to a specific clause in the specification and fix it.

When the spec is implicit, buried in a prompt, scattered across system messages, or just assumed, the agent is interpreting vibes. It’s doing its best to infer what you want based on patterns. Sometimes it gets it right. Sometimes it doesn’t. And when it doesn’t, there’s no specification to debug against. You just tweak the prompt and hope.

The prompt is not a specification

This is the most common mistake in agentic workflow design: treating the prompt as the specification.

A prompt is a set of instructions to an LLM. It’s useful. It’s necessary. But it’s not a specification in any meaningful sense. Prompts are fragile: small changes in wording can produce dramatically different behavior. They’re opaque: there’s no systematic way to verify that a prompt produces correct behavior across all input conditions. And they’re not portable: a prompt optimized for one model may behave differently on another.

A specification says: “When a customer submits a refund request for an order placed within the last 30 days, and the order status is ‘delivered,’ approve the refund and send a confirmation email. If the order is older than 30 days, escalate to a human agent with the order details attached.”

A prompt says: “You are a helpful customer service agent. Process refund requests appropriately.”

Both can power an agent. One is verifiable. The other is a suggestion.
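To make the difference concrete, the refund clause above can be written as an explicit, testable rule. This is a minimal sketch; the function name, the return values, and the policy constant are illustrative, not a real system’s API.

```python
from datetime import date, timedelta

REFUND_WINDOW_DAYS = 30  # the 30-day window from the spec, as a named constant


def route_refund_request(order_date: date, order_status: str, today: date) -> str:
    """Encode the refund clause as an explicit rule.

    Returns the action the agent should take: 'approve_and_confirm'
    for delivered orders inside the refund window, otherwise
    'escalate_to_human' (with order details attached, per the spec).
    """
    within_window = today - order_date <= timedelta(days=REFUND_WINDOW_DAYS)
    if within_window and order_status == "delivered":
        return "approve_and_confirm"
    return "escalate_to_human"
```

Unlike the prompt version, this rule can be exercised against every input condition before the agent ever touches a real customer.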

Garbage spec, garbage agent

The agent inherits every flaw in its specification.

If the spec doesn’t address what happens when the customer provides an invalid order number, the agent will improvise. Maybe it asks for clarification. Maybe it hallucinates a response. Maybe it processes the request against the wrong order. You won’t know until it happens in production, because the spec never defined the expected behavior for that case.

If the spec says “escalate complex issues” without defining what “complex” means, the agent will use its own judgment. That judgment will be inconsistent across interactions, will shift when the underlying model is updated, and will be impossible to audit because there’s no standard to audit against.

Every ambiguity in the specification becomes unpredictable behavior in the agent. And unpredictable behavior in an autonomous system that handles real business processes (money, customer data, operational decisions) is not a feature. It’s a liability.

The parallel to code

This is exactly the problem we described in “The Study of Apps”, applied to a new domain.

In traditional software, implicit specifications produce unpredictable code. The developer’s mental model is the spec, and when that model is incomplete or wrong, the code is incomplete or wrong. The specification problem is a code quality problem.

In agentic workflows, implicit specifications produce unpredictable agents. The prompt or system message is the spec, and when it’s vague or incomplete, the agent is vague or incomplete. The specification problem is an agent quality problem.

The solution is the same in both cases: make the specification explicit. Define the expected behavior in structured, verifiable terms. Test the implementation (whether code or agent) against the specification. Lock the spec when it’s validated.
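The “test the implementation against the specification” step might look like the sketch below: a table of cases drawn from the written spec, run against whatever handles the process. The `handle` callable, the case shapes, and the expected outputs are all hypothetical.

```python
# A minimal sketch of validating an implementation against an explicit spec.
SPEC_CASES = [
    # (input, expected output) pairs transcribed from the specification
    ({"age_days": 10, "status": "delivered"}, "approve_and_confirm"),
    ({"age_days": 45, "status": "delivered"}, "escalate_to_human"),
    ({"age_days": 10, "status": "shipped"}, "escalate_to_human"),
]


def validate(handle) -> list[str]:
    """Run every spec case against `handle`; return one message per violation."""
    failures = []
    for case_input, expected in SPEC_CASES:
        actual = handle(case_input)
        if actual != expected:
            failures.append(f"{case_input}: expected {expected}, got {actual}")
    return failures
```

An empty result means the implementation (code or agent) conforms to the spec as written; a non-empty one points at the exact clause that was violated.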

Different domain, same problem.

Auditable agents

Explicit specifications make agents auditable. This matters more than most teams realize.

When an agent handles a customer interaction incorrectly, the first question is: what went wrong? With an explicit specification, the answer is traceable. The spec says the agent should have done X. The agent did Y instead. Either the spec was wrong (in which case, update the spec) or the implementation was wrong (in which case, fix the implementation). There’s a clear trail from expected behavior to actual behavior.
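That expected-versus-actual trail is cheap to record once the spec is explicit. A sketch of one audit entry, with an illustrative (not standard) field layout:

```python
import json
from datetime import datetime, timezone


def audit_record(spec_clause: str, expected: str, actual: str) -> str:
    """Serialize one expected-vs-actual comparison as a JSON audit entry."""
    return json.dumps({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "spec_clause": spec_clause,  # which clause of the spec applied
        "expected": expected,        # what the spec says should happen
        "actual": actual,            # what the agent actually did
        "conforms": expected == actual,
    })
```

Each non-conforming entry points the investigation at either the clause (update the spec) or the implementation (fix the agent).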

Without a specification, the investigation is: “Why did the agent do that?” And the answer is usually: “We’re not sure. Let’s try adjusting the prompt.” This is debugging by intuition, and it doesn’t scale.

For businesses in regulated industries (finance, healthcare, insurance), auditability isn’t optional. Regulators want to know what the system did and why. “The AI decided” is not an acceptable answer. A locked specification that defines the expected behavior, plus a validation record that confirms the agent follows it: that’s an audit trail.

The real work is the specification

Most agentic workflow projects spend 90% of their effort on the agent implementation (choosing the model, tuning the prompts, building the tool integrations) and 10% on defining what the agent should actually do. It should be the other way around.

The model is a commodity. The tool integrations are plumbing. The specification is the intellectual property. It’s the precise definition of how the business wants this process to work, under all conditions, including edge cases, failure modes, and handoff criteria.

Getting that specification right is the hard part. It requires understanding the business process deeply, interviewing the people who currently do the work, cataloguing the edge cases they’ve learned to handle over years of experience, and encoding all of it in a form that’s precise enough for an agent to execute reliably.

That’s specification work. It’s the same work that software has been skipping for decades. And now that agents are executing business processes autonomously, the cost of skipping it has never been higher.

Agents make the spec problem visible

Agentic workflows make the specification problem impossible to ignore.

When a human handles a process, they compensate for vague specifications with judgment and experience. They know that “escalate complex issues” really means “escalate anything involving more than $5,000 or a repeat complaint.” The implicit spec works because the human fills in the gaps.

When an agent handles the same process, the gaps become visible. The agent doesn’t have twenty years of tacit knowledge. It has the specification, and only the specification. If the spec doesn’t define what “complex” means, the agent’s behavior will expose that gap immediately.

This is actually useful. Deploying an agent against a business process is the fastest way to find out where your specifications are incomplete. The agent is a specification stress test. Every failure is a missing clause.

The teams that embrace this, that treat agent failures as specification gaps rather than AI limitations, build better agents and better specifications. The ones that keep tweaking prompts without formalizing the spec are playing an endless game of whack-a-mole.