If 2023 was the year the world met the chatbot, the years since have been about a more ambitious idea: software that does not just answer, but acts. These systems are usually called AI agents, and "agentic AI" has become one of the most-used — and most-stretched — phrases in technology.
Strip away the marketing and the concept is straightforward.
What an AI agent actually is
An AI agent is a system built around a large language model (LLM) that can plan and take actions to achieve a goal, rather than producing a single block of text and stopping.
In practice, an agent combines three things:
- A model that can reason about what to do next.
- Tools it is allowed to use — a web search, a code interpreter, a calculator, a database, or any software with an API.
- A loop that lets it act, observe the result, and decide on the next step until the goal is met or it gives up.
That loop is the heart of the idea. A chatbot produces one reply per request. An agent takes many steps per request, most of them exchanges between the model and its tools rather than with you, checking its own progress as it goes.
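The model, tools and loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular agent framework works: `scripted_model` is a hypothetical stand-in for a real LLM call, `search` and `calculator` are toy tools, and `max_steps` is an assumed limit that gives the loop a way to "give up".

```python
from dataclasses import dataclass
from typing import Callable


# Toy tools (hypothetical): plain functions the agent may call by name.
def search(query: str) -> str:
    return f"results for {query!r}"


def calculator(expression: str) -> str:
    # Deliberately tiny: only understands "a + b".
    left, right = expression.split("+")
    return str(int(left) + int(right))


TOOLS: dict[str, Callable[[str], str]] = {"search": search, "calculator": calculator}


@dataclass
class Step:
    """The model's decision: call a tool, or finish with an answer."""
    action: str    # a tool name, or "finish"
    argument: str  # tool input, or the final answer


def scripted_model(goal: str, history: list) -> Step:
    """Hypothetical stand-in for an LLM: use the calculator once, then report the result."""
    if not history:
        return Step("calculator", "2 + 3")
    _, last_result = history[-1]
    return Step("finish", f"The answer is {last_result}")


def run_agent(goal: str, model: Callable, tools: dict, max_steps: int = 5) -> str:
    history: list[tuple[Step, str]] = []
    for _ in range(max_steps):
        step = model(goal, history)          # 1. reason: decide the next step
        if step.action == "finish":
            return step.argument             # goal met
        tool = tools.get(step.action)
        if tool is None:
            result = f"error: unknown tool {step.action!r}"
        else:
            result = tool(step.argument)     # 2. act
        history.append((step, result))       # 3. observe the result, then loop
    return "gave up: step limit reached"     # exit when the goal is never met


print(run_agent("What is 2 + 3?", scripted_model, TOOLS))  # The answer is 5
```

The step cap is the simplest way to make the loop stop; a real system may also stop on cost or time budgets.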
A concrete example
Suppose you ask an agent to "find three suppliers for recycled packaging, compare their minimum order sizes, and put it in a table."
A plain chatbot would answer from memory, relying on whatever its training data happened to contain. An agent would instead search the web, open several supplier pages, and extract the relevant numbers. If a page failed to load, it would try another; if something were ambiguous, it would pause to ask you. Then it would assemble the comparison and format the table. The difference is the ability to gather new information and respond to it.
Where agents work well today
Agents are most reliable when a task is well-defined and verifiable — when there is a clear way to check whether the work is correct.
- Software development. Writing code, running it, reading the error, and fixing it is a natural loop with a built-in check: does the code run, and do its tests pass?
- Research and synthesis. Pulling facts from many sources into a structured summary, with citations.
- Data wrangling. Cleaning, reformatting and cross-checking structured data.
- Routine digital workflows. Filling forms, moving information between systems, and triaging requests.
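The software-development loop is easy to see in miniature. In this sketch, `candidate_fixes` is a hypothetical list standing in for a model's successive attempts; each attempt is executed, checked against one small test, and any error message becomes the feedback the next attempt would be based on. Running generated code with `exec` is acceptable only in a toy like this; a real agent would run it in a sandbox.

```python
from typing import Optional


def run_candidate(source: str) -> Optional[str]:
    """Execute candidate code and test it. Return None on success, else the error text."""
    namespace: dict = {}
    try:
        exec(source, namespace)                       # does it run?
        if namespace["add"](2, 3) != 5:               # does it pass the test?
            raise AssertionError("add(2, 3) should be 5")
    except Exception as exc:                          # SyntaxError, KeyError, AssertionError...
        return f"{type(exc).__name__}: {exc}"
    return None


def fix_until_passing(attempts: list) -> tuple:
    """Try each attempt in turn, collecting the error feedback a model would read."""
    feedback: list = []
    for source in attempts:
        error = run_candidate(source)
        if error is None:
            return source, feedback
        feedback.append(error)
    return None, feedback


# Hypothetical successive attempts, as a model might produce after reading each error.
candidate_fixes = [
    "def add(a, b) return a + b",           # syntax error
    "def add(a, b):\n    return a - b",    # runs, but fails the test
    "def add(a, b):\n    return a + b",    # passes
]

working, errors = fix_until_passing(candidate_fixes)
print(errors)  # the two error messages that drove the "fixes"
```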
Where they struggle
The same qualities that make agents powerful make them risky in the wrong context.
An agent is only as trustworthy as your ability to check its work. If you cannot tell whether the output is right, autonomy is a liability, not a feature.
Three weaknesses stand out:
- Compounding errors. In a long chain of steps, a small early mistake can snowball. More steps mean more chances to go wrong.
- Confident wrongness. Models can state incorrect things fluently. Without a verification step, an agent may "complete" a task incorrectly.
- Costly actions. Sending money, deleting files, or emailing customers are hard to undo. These deserve explicit confirmation and tight permissions.
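A back-of-the-envelope calculation shows why chain length matters. Assume, purely for illustration, that every step succeeds independently with the same probability. Real steps are neither independent nor equally reliable, so treat the numbers as intuition rather than measurement. Under that assumption, the chance of a clean run is the per-step success rate raised to the number of steps.

```python
def chain_success(per_step: float, steps: int) -> float:
    """Probability that every step succeeds, assuming independent, equally reliable steps."""
    return per_step ** steps


for steps in (1, 5, 10, 20):
    print(f"{steps:>2} steps at 95% each: {chain_success(0.95, steps):.0%} chance of a clean run")
```

Under these assumptions, a step that works 95% of the time, chained 20 times, produces a fully correct run only about 36% of the time.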
How to deploy agents responsibly
Organisations getting value from agents tend to follow a few rules:
- Least privilege. Give the agent access only to what the task requires.
- Human-in-the-loop for high stakes. Require confirmation before irreversible actions.
- Verification built in. Where possible, give the agent a way to test its own output, and a way for you to audit it.
- Logging. Keep a record of what the agent did and why, so failures can be diagnosed.
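Three of these rules (least privilege, human confirmation, and logging) can be enforced in one place: the code that sits between the agent and its tools. The `ToolGate` class below is a hypothetical sketch of that idea, not any product's API. The tool names `read_file` and `send_email` are illustrative toys, and the `confirm` callback stands in for asking a real person.

```python
import logging
from typing import Callable

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("agent")


class ToolGate:
    """Hypothetical wrapper that checks permissions, asks for approval, and logs every call."""

    def __init__(self, allowed: dict, irreversible: set, confirm: Callable[[str, str], bool]):
        self.allowed = allowed            # least privilege: only these tools can be called
        self.irreversible = irreversible  # tools that need a human's approval first
        self.confirm = confirm            # human-in-the-loop callback

    def call(self, name: str, argument: str, reason: str) -> str:
        log.info("request tool=%s arg=%r reason=%r", name, argument, reason)
        if name not in self.allowed:
            log.warning("denied: %s is not permitted for this task", name)
            return f"denied: {name} not permitted"
        if name in self.irreversible and not self.confirm(name, argument):
            log.warning("blocked: human declined %s", name)
            return f"blocked: {name} not confirmed"
        result = self.allowed[name](argument)
        log.info("result tool=%s -> %r", name, result)
        return result


gate = ToolGate(
    allowed={
        "read_file": lambda path: f"contents of {path}",
        "send_email": lambda to: f"email sent to {to}",
    },
    irreversible={"send_email"},
    confirm=lambda name, arg: False,  # stand-in for a person who declines
)
print(gate.call("read_file", "report.txt", "need the figures"))         # contents of report.txt
print(gate.call("send_email", "customer@example.com", "share report"))  # blocked: send_email not confirmed
print(gate.call("delete_file", "report.txt", "clean up"))               # denied: delete_file not permitted
```

Putting the checks in the gate rather than in the model's instructions means they hold even when the model is confidently wrong.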
The realistic outlook
The honest summary is that agents are genuinely useful and genuinely immature. They shine on bounded, checkable tasks and stumble on open-ended, high-stakes ones. The most productive way to think about them is not as autonomous employees but as fast, tireless assistants that still need a manager.
For now, the winning pattern is collaboration: let the agent do the legwork, and keep a human on the decisions that matter.