Last verified: May 16, 2026. Vendor pricing and benchmarks refreshed quarterly.
An AI agent is an LLM equipped with tools and placed inside an autonomous loop: the model decides what action to take, executes it, observes the result, then decides again, repeating until the task is done or stopped. That is the complete definition. In 2026, vendors have been applying the word “agent” to chatbots, scheduled prompt scripts, and rule-based workflows. The practice has a name: agent-washing. The actual bar is straightforward. For something to qualify as an AI agent, four things must be present: an LLM capable of tool use, at least one real tool it can call via function calling, an autonomous loop where the model directs its own next step based on what the tool returned, and a stopping condition. Claude Code meets that bar. A Salesforce Agentforce implementation built on routing trees does not, at least not in its baseline configuration.
A Chatbot Is Not an Agent (This Is the Most Important Distinction)
A chatbot is a read-only system. You send a prompt, it generates text, the exchange ends. Nothing happened in the world outside that conversation. The chatbot cannot file the support ticket it described. It cannot run the code it wrote. It cannot check whether the API call it suggested actually worked. A chatbot produces text about actions.
An AI agent is read-write-and-act. When you give it a goal, the agent takes actions in external systems, observes what those actions returned, and makes its next decision based on that observation. The agent changes external state.
The clearest test: when you ask a chatbot to file a support ticket, it tells you how. When an AI agent gets the same instruction, it files the ticket.
Two confusions worth clearing up before going further.
Chain-of-thought reasoning is not agency. A model that reasons through twelve steps before generating a response is doing impressive cognitive work inside a single call. No autonomous loop, no tool use, no action in the world. Chain-of-thought is a prompting technique, not an agent architecture.
RAG is not an agent either. A RAG (retrieval-augmented generation) system retrieves documents and passes them as context to an LLM in a single pass. Useful, widely deployed, not agentic. RAG becomes part of an agentic AI system only when the model can decide to retrieve, evaluate what it got, and retrieve again with a refined query, adding the iterative observe-and-decide character that defines the autonomous loop. RAG alone is not that.
The Minimum-Component Test: What Must Be Present to Qualify
Understanding what an LLM can and cannot do on its own is where this starts. An LLM is a reasoning engine. It takes inputs and produces outputs. What an LLM cannot do, on its own, is take actions that persist beyond the conversation window. An AI agent is what you build when that limitation matters.
Lilian Weng, in her canonical June 2023 breakdown of LLM-powered autonomous agents, identified four components: planning (decomposing a goal into sub-tasks), memory (short-term in context, long-term in an external store), tool use (calling external APIs, running code, reading files), and action execution (actually doing something in the world). Anthropic’s December 2024 “Building Effective Agents” essay added the minimal framing practitioners actually need: the basic building block is an LLM augmented with retrieval, tools, and memory, embedded in an environment where the agent can take actions.
Here is the minimum-component test in plain language. To qualify as an AI agent, a system must have all four of these:
- An LLM capable of tool use. The model must be able to call external functions, not just generate text about them. Function calling (OpenAI’s term) or tool use (Anthropic’s term) is the API mechanism: the model returns a structured signal that says “call function X with argument Y,” and the host application executes that call and returns the result.
- At least one real tool it can invoke. A real tool is a callable function that executes and returns a result to the model. A model generating text about filing a ticket is not a tool call. The application must actually execute the function and hand the output back to the LLM.
- An autonomous loop where the model directs its own next step. This is the defining criterion. After receiving a tool result, the AI agent decides what to do next, without a human confirming each step. The path through the task is model-directed, not pre-programmed.
- A stopping condition. A defined end state or step limit. Without a stopping condition, the autonomous loop runs until it hits the API provider’s limits or until someone’s credit card gets a surprise. Hard step limits are required architecture, not an optional safety feature.
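To make the function-calling mechanism in the first two items concrete, here is a minimal sketch of the host-application side of one round trip. The JSON shape, the `get_ticket_status` tool, and the `TOOLS` registry are assumptions made for illustration; each provider defines its own request and response format.

```python
import json

def get_ticket_status(ticket_id: str) -> dict:
    """A 'real tool': a callable that executes and returns a result."""
    # Hypothetical lookup; a production tool would query a ticketing system.
    return {"ticket_id": ticket_id, "status": "open"}

# Registry of tools the host application is willing to execute.
TOOLS = {"get_ticket_status": get_ticket_status}

def execute_tool_call(raw_call: str) -> str:
    """Parse the model's structured request, run the tool, return the result as JSON."""
    call = json.loads(raw_call)
    name = call["name"]
    if name not in TOOLS:
        # Hand errors back to the model instead of raising, so it can pick another step.
        return json.dumps({"error": f"unknown tool: {name}"})
    result = TOOLS[name](**call["arguments"])
    return json.dumps(result)

# What a model's tool-call output might look like (shape is illustrative only).
model_output = '{"name": "get_ticket_status", "arguments": {"ticket_id": "T-100"}}'
tool_result = execute_tool_call(model_output)
print(tool_result)
```

The model never runs anything itself. It emits the request, the host executes it, and the serialized result goes back into the model's context for the next decision.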
The practical operator test from Anthropic’s essay: can the system take an action it was not pre-programmed to take, in response to an intermediate result it could not have predicted before the task started? If yes, the system is operating agentically. If the execution path was fixed in advance by a developer, it is a workflow.
The ReAct Loop: What Agentic AI Architecture Actually Looks Like
The foundational architecture for modern agentic AI systems is called ReAct, introduced by Yao et al. in a 2022 paper at Google and Princeton (arXiv:2210.03629). ReAct stands for Reasoning + Acting. The insight: LLMs generate better actions when they also write a reasoning trace before each action.
The autonomous loop runs: Thought, then Action, then Observation, then Thought again.
Walk through a realistic business task to see what that looks like. Say the AI agent is asked to “summarize competitive pricing for three cloud storage vendors.”
Thought: I need to find current pricing for three major cloud storage providers. I’ll start by searching for the most recent pricing pages.
Action: web_search(“cloud storage pricing 2026 comparison”)
Observation: Results returned five articles. The top three link to actual vendor pricing pages.
Thought: I should read the actual vendor pages rather than the comparison articles, since vendor pages will have current numbers.
Action: fetch_url(“vendor1.com/pricing”)
Observation: Returns a pricing table with three tiers.
Thought: I have pricing for vendor 1. I need the same for vendors 2 and 3 before I can write the summary.
And so on. The AI agent writes its reasoning before each action, which reduces hallucination relative to models that act without explaining themselves, and gives the autonomous loop a self-correction mechanism when a tool returns something unexpected.
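The Thought, Action, Observation cycle above can be sketched in a few lines. This is a toy under stated assumptions, not the implementation from the ReAct paper: `scripted_model` is a hard-coded stand-in for an LLM call, and `web_search` and `fetch_url` are stubs that return canned strings.

```python
from typing import Callable

# Stub tools with canned results; real tools would hit the network.
def web_search(query: str) -> str:
    return "Found vendor1.com/pricing"

def fetch_url(url: str) -> str:
    return "Pricing table with three tiers"

TOOLS: dict[str, Callable[[str], str]] = {"web_search": web_search, "fetch_url": fetch_url}

def scripted_model(history: list[str]) -> dict:
    """Stand-in for an LLM call: picks the next step from what it has observed so far."""
    observations = [h for h in history if h.startswith("Observation:")]
    if not observations:
        return {"thought": "I need pricing pages.", "action": "web_search",
                "input": "cloud storage pricing 2026 comparison"}
    if "vendor1.com/pricing" in observations[-1]:
        return {"thought": "Read the vendor page directly.", "action": "fetch_url",
                "input": "vendor1.com/pricing"}
    return {"thought": "I have the pricing table.", "action": "finish",
            "input": "Vendor 1 offers three tiers."}

def react_loop(model: Callable[[list[str]], dict], max_steps: int = 5) -> tuple[str, list[str]]:
    """Run Thought -> Action -> Observation until the model finishes or steps run out."""
    history: list[str] = []
    for _ in range(max_steps):                      # hard step limit
        step = model(history)
        history.append(f"Thought: {step['thought']}")
        if step["action"] == "finish":              # model-declared end state
            return step["input"], history
        history.append(f"Action: {step['action']}({step['input']!r})")
        observation = TOOLS[step["action"]](step["input"])
        history.append(f"Observation: {observation}")
    return "Stopped: step limit reached.", history

answer, trace = react_loop(scripted_model)
print("\n".join(trace))
print(answer)
```

Note that both stopping conditions are present: the model can declare the task finished, and the loop refuses to run past `max_steps` regardless of what the model wants.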
On the ALFWorld interactive task benchmark, the ReAct framework outperformed imitation learning by 34 percentage points. On WebShop, by 10 percentage points (Yao et al., 2022). Those benchmarks are why the ReAct loop pattern is the conceptual backbone of every major agent framework today.
The infrastructure that standardizes tool calling across different LLM providers is called MCP: the Model Context Protocol. Anthropic introduced MCP in November 2024 and later donated it to the Linux Foundation’s Agentic AI Foundation. OpenAI, Google, and Microsoft adopted MCP over the course of 2025. Model Context Protocol functions as a USB-C standard for AI apps: instead of each provider having its own proprietary tool-calling format, MCP gives AI agents a universal way to connect to any tool or data source. Frameworks like LangChain and CrewAI abstract the ReAct loop machinery so developers do not wire it manually, but underneath, MCP is increasingly what connects agents to the tools they use. OpenClaw and Hermes are two self-hosted platforms worth knowing on the framework side. The full breakdown of frameworks lives in the agent platform landscape piece coming next.
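For a sense of what “a universal way to connect” looks like on the wire: MCP messages are JSON-RPC 2.0, and a client invokes a tool with the `tools/call` method. The sketch below only builds that request. The tool name `fetch_url` and its argument are hypothetical, and the transport, initialization handshake, and response handling are all left out.

```python
import json

def build_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Serialize a JSON-RPC 2.0 request for MCP's tools/call method."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    })

# "fetch_url" and its argument are hypothetical; a real server advertises
# its own tool names and input schemas via the tools/list method.
message = build_tool_call(1, "fetch_url", {"url": "vendor1.com/pricing"})
print(message)
```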
Agent-Washing: Why Everything Is Being Called an Agent in 2026
Agent-washing is what happens when vendors relabel chatbots, workflows, and scheduled prompt scripts as “AI agents” because the term commands higher perceived value in 2026. The vendor incentive is straightforward: if you define “AI agent” broadly enough that your product fits, you can charge agent prices and make agent claims. Definitional precision is against their commercial interest. Gartner found that of thousands of vendors now calling their product an “AI agent,” approximately 130 are architecturally agentic. That is the scale of the gap.
Here is a label decoder for the marketing language you will encounter:
| What they call it | What it likely is |
|---|---|
| “AI agent” | Chatbot with a richer UI |
| “Agentic AI platform” | SaaS wrapper around a foundation model with limited tool access |
| “Autonomous agent” | Single-prompt script that runs on a schedule |
| “AI agent suite” | Rule-based workflow with conditional branching |
| Actual agent | LLM + tools + autonomous loop + can recover from an unexpected result |
The gap between agent-washing and real deployment shows up in the numbers. Alvarez & Marsal found in 2025 that less than 10% of organizations have successfully scaled AI agents in any single business function. The PwC 2025 CEO Survey found 44% of business leaders reporting workforce efficiency gains from AI, but only 24% seeing measurable profit impact. That 20-point gap is largely explained by the confusion between AI-assisted execution (a human using an AI tool to work faster) and autonomous action (an AI agent doing the work through an autonomous loop without per-step human direction).
When evaluating any platform with “agent” in the name, run the minimum-component test. Does the model direct its own next step? Does it execute real tools via function calling and receive real results? Is there an autonomous loop? If the answer to any of those is no, the product does not qualify as an AI agent regardless of what it is called.
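If it helps to keep vendor evaluations honest, the test can be written down as a checklist in code. This is only a bookkeeping sketch: the field names are my own labels for the four criteria, and the answers still have to come from inspecting the specific configuration.

```python
from dataclasses import dataclass

@dataclass
class SystemProfile:
    """Answers to the four questions for one specific configuration."""
    llm_supports_tool_use: bool
    executes_real_tools: bool
    model_directs_next_step: bool
    has_stopping_condition: bool

def failed_criteria(profile: SystemProfile) -> list[str]:
    """Return the names of the criteria this configuration fails."""
    return [name for name, passed in vars(profile).items() if not passed]

def is_agent(profile: SystemProfile) -> bool:
    """All four criteria must hold; one failure disqualifies."""
    return not failed_criteria(profile)

# Hypothetical configuration: LLM steps on a fixed, code-defined route.
routing_tree = SystemProfile(True, True, False, True)
print(is_agent(routing_tree), failed_criteria(routing_tree))
```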
Workflow vs. Agent: The Line Nobody Draws
Anthropic’s “Building Effective Agents” essay draws this line more cleanly than anyone else has: in a workflow, the sequence of steps is fixed in code. In an AI agent, the LLM decides which step comes next. That binary distinction matters for operator decision-making.
Take the same task run two ways.
Workflow version: Code says step 1, search Google. Step 2, read the top three results. Step 3, summarize. The LLM executes each step competently, but the developer pre-programmed the path. If the top three search results are press releases with no useful pricing data, the workflow still reads them and summarizes, because that is what the code says to do.
Agent version: The AI agent receives “research competitor pricing” and decides which search queries to run, how many results to read, whether the results actually contain pricing information, whether to try a more specific query, and when it has enough data to write the summary. The path is model-directed. If the first search returns unhelpful results, the agent recognizes that and tries again.
This is a binary distinction, not a spectrum. A workflow can use an LLM at every single step and still not be an AI agent if the routing logic is in the code. What matters is who controls the path: the code or the model.
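The same task run two ways can be sketched side by side. Everything here is a stand-in: `search` is a stub that returns canned results, and `stub_policy` plays the role of the LLM's judgment. The point is only where the routing decision lives.

```python
from typing import Callable

def search(query: str) -> list[str]:
    """Stub search tool: only the specific query returns pricing data."""
    if "pricing page" in query:
        return ["Vendor A: 10 USD per TB"]
    return ["Press release: Vendor A announces new logo"]

def summarize(results: list[str]) -> str:
    return " | ".join(results)

def workflow(topic: str) -> str:
    """Fixed path: search once, then summarize whatever came back."""
    return summarize(search(topic))

def agent(topic: str, choose_next: Callable[[str, list[str]], str], max_steps: int = 4) -> str:
    """Model-directed path: choose_next (standing in for the LLM) picks each step."""
    query = topic
    results: list[str] = []
    for _ in range(max_steps):               # stopping condition: bounded steps
        results = search(query)
        decision = choose_next(query, results)
        if decision == "done":
            break
        query = decision                     # the model supplied a refined query
    return summarize(results)

def stub_policy(query: str, results: list[str]) -> str:
    """Hard-coded stand-in for the model's judgment about the results."""
    if any("USD" in r for r in results):
        return "done"
    return query + " pricing page"

print(workflow("Vendor A"))
print(agent("Vendor A", stub_policy))
```

The workflow summarizes the press release because the code told it to. The agent version notices the results lack pricing and searches again, because the decision about the next step sits with `choose_next` rather than with the surrounding code.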
The operator consequence: workflows are predictable, auditable, and cheaper to run. AI agents are flexible and can handle unexpected intermediate states, but they are more expensive, harder to audit, and require more thoughtful scoping before deployment. Well-defined tasks with known paths belong in workflows. Tasks where the right path depends on intermediate results belong in agents.
For a vendor-by-vendor breakdown of agent capabilities across the major platforms, the comparison article covers ChatGPT, Claude, Gemini, and Grok side by side.
Examples That ARE Agents (And What Makes Them Qualify)
Claude Code (Anthropic) is the AI agent I use most at Alameda Internet Marketing. Claude Code runs in a terminal, reads and writes files, executes shell commands, runs test suites, and iterates on failures. When I hand Claude Code a task, it decides which file to open first, runs a command, reads the output, and decides what to do next. The model directs the path. I have watched Claude Code loop through 40-plus steps on a complex content pipeline task: read a brief, write a draft, run a validation script, read the errors, revise the draft, run validation again. Every step was model-directed based on the previous step’s output. That is the autonomous loop operating in practice.
ChatGPT Operator (OpenAI) is available to Pro subscribers at $200 per month. ChatGPT Operator controls a real browser via pixel-level vision, operating mouse and keyboard actions. The model sees the screen, decides what to click, clicks it, observes the result, and decides next. It performs multi-step actions in web applications without human confirmation at each step. On the OSWorld benchmark for full computer use tasks, ChatGPT Operator achieved a 38.1% success rate. That number means it completes fewer than four in ten full computer use tasks autonomously; the remaining 61.9% still require human involvement. Real agent architecture, real limitations.
Claude Computer Use (Anthropic) applies the same vision-based computer use architecture to desktop and browser environments via the API. A related tool I use at Alameda Internet Marketing is agent-browser: an open-source CLI that drives a browser via a snapshot-action cycle. Agent-browser takes a screenshot, identifies the relevant elements, fills forms, clicks through multi-step flows. I use it for browser automation tasks where no API exists, including WordPress settings changes on sites where platforms block datacenter IP requests.
Gemini Agent Mode (Google) in Android Studio takes a goal described in natural language and formulates an execution plan across multiple project files, then executes that plan. The path is model-directed across the codebase, not a predefined script. The Gemini Enterprise Agent Platform (2026) extends agentic AI to enterprise environments with Agent Studio and a governance registry.
Grok Build (xAI) deserves a mention as the newest entrant. Released in May 2026 at the SuperGrok Heavy tier, Grok Build runs up to eight parallel agents via an agentic CLI. Real agentic capability, though behind OpenAI and Anthropic in ecosystem maturity.
OpenClaw connects AI agents to 24-plus messaging platforms and has 350K GitHub stars. Hermes (Nous Research, February 2026) has a self-improving autonomous loop where the agent creates and refines its own skills from completed tasks. As noted in the ReAct section, the full comparison lives in the upcoming /agent-platform-landscape/ piece.
Examples That Are NOT Agents (Despite the Marketing)
Basic Zapier flows are workflows. The pattern: when an email arrives (trigger), extract data (one LLM call), create a CRM record (action). The sequence is fixed in code. The LLM does not direct its own next step based on what the previous step returned. It executes the step it was told to execute. This is a workflow, and a well-designed one. Zapier Agents (a separate product line) is attempting a genuinely agentic architecture. Context matters when evaluating.
Salesforce Agentforce at its baseline is often a routing tree with an LLM at a single step. A customer inquiry arrives, a rule routes it to the right flow, an LLM generates a response. In many Agentforce implementations, there is no autonomous loop, no AI agent directing its own next step based on what a tool returned. Run the minimum-component test before assigning Agentforce to tasks that require adaptation to unexpected intermediate states. Some Agentforce configurations do implement a genuine autonomous loop; the product is not monolithic. “Run the test” is the right answer, not “assume it qualifies.”
A multi-step prompt chain where you ask the model to “first summarize, then classify, then recommend” in a single prompt is not an AI agent. The model reasons through multiple steps in a single call. No tool executes between them, no external state changes, no autonomous loop. This is a structured prompt, not agentic AI. It can produce excellent results. It is not architecturally an agent.
These are not bad tools. They are the right tools for tasks with predictable paths. They fail the minimum-component test, which is exactly why they also fail at tasks that require the AI agent to adapt to what it found at step three.
Is ChatGPT an AI Agent? The Honest Answer
Base ChatGPT: no. ChatGPT Operator: yes, for the tasks it is built for.
The base ChatGPT assistant takes a prompt and generates text. Even with browsing or code interpreter enabled, a single tool call without an autonomous loop does not qualify. Using one tool and stopping is not an agent loop. Base ChatGPT is a tool-augmented chatbot, useful and distinct from a basic chatbot, but it does not pass the minimum-component test.
ChatGPT Operator is a different product. It clears all four points of the minimum-component test on the tasks it is designed for: it carries a vision model, exposes a set of browser tools, runs in a loop bounded by a stopping condition, and chooses each next click based on screen state rather than a predetermined script.
The honest qualifier: even ChatGPT Operator fails on tasks that require complex judgment or long-horizon planning across many steps. The 38.1% success rate on OSWorld means the other 61.9% of full computer use tasks still require human intervention. ChatGPT Operator is agentic in architecture. It is not unlimited in capability.
One point worth keeping: “Is X an AI agent?” is always a question about a specific configuration and a specific task, not just a product name. A product that is agentic for browser navigation tasks may not be agentic for tasks requiring database write access. The minimum-component test applies to the configuration in question, not the product family.
How We Use Real Agents at Alameda Internet Marketing
I run real AI agents in production. Most of what gets called an AI agent in 2026, even by smart people who should know better, is not one by the definition above.
Here is what actually using an agent looks like at Alameda Internet Marketing.
Claude Code runs on our content pipeline work several times a week. On a typical keyword research task, the autonomous loop will read a brief, pull keyword data via a script, filter against a set of rules, draft section headings, run a validation check for duplicate coverage, revise, recheck. Every decision about what to do next comes from what the previous step returned. Claude Code regularly runs follow-up scripts I had not pre-defined when I wrote the task, because the intermediate results made them the obvious next step. That is the autonomous-decision characteristic of agentic AI in practice.
Agent-browser handles browser automation for client work where no API exists. I use agent-browser for WordPress settings changes on sites hosted on platforms that block datacenter IP requests. The AI agent takes a snapshot of the live admin page, identifies the right fields, fills and submits. For GoLogin profile management tasks, the same pattern: snapshot, identify, act, verify the result.
What I do not use AI agents for: single-query tasks where a direct API call is faster, tasks where a wrong action is irreversible without a human-in-the-loop approval step, and latency-sensitive workflows where waiting for the autonomous loop to complete makes the response time unacceptable. The Anthropic essay’s recommendation is right: start with the simplest tool-augmented call that works. Add loop complexity only when the task actually requires it.
Where Agents Go Wrong (The $47K Problem)
AI agents running in autonomous loops make API calls repeatedly. Each call costs tokens. A stuck or misbehaving agent can run up a large bill before anyone notices.
In 2025, a multi-agent research tool ran for 11 days without anyone checking it. The AI agents were talking to each other, burning tokens in an autonomous loop with no meaningful stopping condition. The final bill was $47,000 (TechStartups, 2025). A separate incident involved a data enrichment AI agent that misread an API error code as a success signal and made 2.3 million API calls over a weekend before someone caught it.
Gartner found in 2025 that only 44% of organizations deploying AI agents have financial guardrails in place. Estimated collective cloud overspend from agent sessions running without per-session cost ceilings hit roughly $400 million across the Fortune 500.
The four risk categories business operators need to understand before deploying agentic AI:
- Runaway loops. An AI agent that retries indefinitely on a failed tool call, or that does not have a hard stopping condition, will run until it is stopped externally. Hard step limits and cost alerts at the API layer are required architecture.
- Unintended actions. An AI agent with write access to a system will take write actions. If the goal is underspecified, the agent’s interpretation of “clean up the database” may be more aggressive than intended. Read-only tool permissions wherever possible; write access scoped to the minimum needed.
- Cascading errors. In multi-agent systems, a wrong intermediate result does not stay wrong in isolation. It passes through the next agent as a ground-truth input. The error compounds. This is one reason the “unnecessary distributed system” anti-pattern is real: five AI agents coordinating adds five compounding failure surfaces.
- Alignment in action. A chatbot’s hallucination produces a wrong answer. An AI agent’s hallucination produces a wrong action, and wrong actions change external state that may be difficult or impossible to reverse. How hallucination compounds when agents take action is covered in the hallucination piece; the short version is that the stakes are meaningfully higher when agentic AI controls write access.
The mitigations are not complicated: hard step limits per session, human-in-the-loop approval gates for high-impact actions (send, delete, publish), read-only tools where the task allows it, explicit stopping conditions defined before the autonomous loop starts, and cost alerts at the API layer. An AI agent with read-only tools and a 20-step limit is low-risk. Risk climbs sharply once you grant write access without ceiling checks.
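Two of those mitigations, the hard step limit and the cost ceiling, fit in a small session budget object. This is a minimal sketch with made-up defaults (20 steps, $5.00) and a flat per-step cost; a real deployment would meter actual token usage and pair this with provider-side billing alerts.

```python
from dataclasses import dataclass

class BudgetExceeded(RuntimeError):
    """Raised when a session would cross its step or cost ceiling."""

@dataclass
class SessionBudget:
    max_steps: int = 20            # illustrative default, not a recommendation
    max_cost_usd: float = 5.00     # illustrative default, not a recommendation
    steps: int = 0
    cost_usd: float = 0.0

    def charge(self, step_cost_usd: float) -> None:
        """Approve one more loop iteration, or refuse if a ceiling would be crossed."""
        if self.steps >= self.max_steps:
            raise BudgetExceeded(f"step limit of {self.max_steps} reached")
        if self.cost_usd + step_cost_usd > self.max_cost_usd:
            raise BudgetExceeded(f"cost ceiling of ${self.max_cost_usd:.2f} reached")
        self.steps += 1
        self.cost_usd += step_cost_usd

def run_session(budget: SessionBudget, cost_per_step_usd: float) -> str:
    """Simulate a stuck agent that would otherwise loop forever."""
    try:
        while True:
            budget.charge(cost_per_step_usd)   # call before every model or tool call
    except BudgetExceeded as exc:
        return f"halted after {budget.steps} steps: {exc}"

print(run_session(SessionBudget(), 0.01))
```

The check runs before each step rather than after it, so the session never spends past the ceiling it was given.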
Frequently Asked Questions
What does an AI agent do?
From an operator’s perspective: an AI agent receives a goal and works through it as a series of tool calls and decisions until the task is complete. The agent operates across three categories of actions. It reads data: querying APIs, searching the web, reading files and database records. It writes to systems: updating records, sending messages, creating or modifying files. And it course-corrects when a step fails: if a tool call returns an error or an unexpected result, the AI agent decides what to try next rather than stopping or surfacing the error to a human. The combination of those three capabilities, operating in an autonomous loop without per-step human direction, is what makes an AI agent functionally different from a chatbot.
Is ChatGPT an AI agent?
Base ChatGPT, as a conversational assistant, is not an AI agent. It takes your input, generates a response, and stops. Even with tools like browsing or code interpreter, a single tool call that does not loop back into further model-directed decisions does not meet the minimum-component test.
ChatGPT Operator, available at the Pro tier ($200/month), is a different product. It runs a vision model against a live browser, picks each click from screen state, and continues until a stop condition fires. That is an autonomous loop. The shorthand: same label on the box, different product underneath. A name like “ChatGPT” can be agentic in one configuration and not in another.
Who are the Big 4 AI agents?
The four vendors with the most widely deployed and genuinely agentic implementations in 2026:
OpenAI: ChatGPT Operator controls a browser via pixel-level vision. The Agents SDK and Responses API are the infrastructure layer for developer-built AI agents.
Anthropic: Claude Code operates in a terminal, reading and writing files, running tests, iterating on failures. Claude Computer Use extends vision-based computer use to full desktop environments.
Google: Gemini Agent Mode in Android Studio and the Gemini Enterprise Agent Platform handle multi-step task execution across project files and enterprise environments.
Microsoft: Copilot agents integrate with Office 365 products. The depth of agentic behavior varies by configuration, particularly in the Teams and SharePoint integrations.
Agent capability varies significantly by configuration and task type. Run the minimum-component test against the specific implementation before deploying.
What is agentic AI?
“Agentic AI” describes an architectural property, not a specific product. A system is agentic when it plans, uses tools via function calling, and directs its own next step based on intermediate results. An AI agent is the individual unit that exhibits those properties.
The distinction matters because “agentic AI” is also a marketing adjective that gets applied liberally, which is agent-washing by another name. When a vendor calls their platform “agentic,” ask the property-level question: does the model in this system direct its own tool calls in an autonomous loop, or does the code direct the model’s calls? The answer tells you whether you are looking at an AI agent or a workflow wearing agentic language.
What are the types of AI agents in 2026?
The Russell-Norvig taxonomy from 1995 (simple reflex, model-based, goal-based, utility-based) describes classical AI systems, not LLM-based agents. A more useful 2026 framing:
| Agent type | What it does | Example |
|---|---|---|
| Tool-use agents | LLM + function calling + autonomous loop | Claude Code, OpenAI Agents SDK |
| Computer-use agents | Vision-based control of browser or desktop | ChatGPT Operator, Claude Computer Use |
| Orchestrators | LLMs that delegate sub-tasks to other agents in multi-agent systems | LangGraph orchestration patterns |
| Self-improving agents | Agents that create and refine their own skills from completed tasks | Hermes Agent (Nous Research) |
Most real business deployments are tool-use agents. Computer-use agents are expanding but benchmarks show real limitations (ChatGPT Operator’s 38.1% on OSWorld, for example). Orchestrators add coordination complexity that most tasks do not require. Self-improving agents like Hermes represent the newest architectural category and the fastest-growing framework type of 2026.
What is the difference between an AI agent and a chatbot?
One sentence: a chatbot is read-only; an AI agent reads, writes, and acts.
A chatbot generates text. An AI agent executes real functions in external systems via function calling and uses the results to decide its next action. The architectural difference is the autonomous loop: a chatbot’s work ends when it generates a response; an agent’s work continues through as many steps as the task requires.
The consequence difference matters for operators: a chatbot’s mistake is a wrong answer. An AI agent’s mistake can modify real-world state. A chatbot that incorrectly describes how to file a support ticket is an inconvenience. An agent with write access that incorrectly files a support ticket, or deletes one, or sends a message it should not have, changes something that may be difficult to undo. The higher capability comes with a higher cost on the control side.
Are AI agents safe to use in a business?
Yes, when properly scoped. The three controls that make agentic AI safe to deploy:
Read-only tools where possible. An AI agent that can only read data cannot accidentally delete or overwrite it. Scope write access to the minimum required for the task.
Hard step limits. Every agent session needs a defined stopping condition. This is the primary guardrail against runaway loops and unexpected API bills.
Human-in-the-loop approval gates for high-impact actions. Send, delete, publish, pay: any action with irreversible or high-consequence effects should require a human confirmation before the AI agent executes it.
Risk is proportional to scope. An AI agent with read-only tools, a 20-step limit, and human approval for write actions is low-risk and practical. Danger scales with write permissions, absent guardrails, and unbounded autonomous loops. The $47,000 incident was not a product failure. It was a configuration failure: an AI agent with no stopping condition and no cost alert running for 11 days undetected.
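The read-only and approval-gate controls can be enforced at the point where the host executes a tool call. A minimal sketch, assuming a hypothetical `Tool` record with a `high_impact` flag and an `approve` callback that stands in for a human reviewer:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Tool:
    name: str
    run: Callable[[str], str]
    high_impact: bool = False   # send, delete, publish, pay

def call_tool(tool: Tool, arg: str, approve: Callable[[str, str], bool]) -> str:
    """Run read-only tools directly; ask a human before any high-impact action."""
    if tool.high_impact and not approve(tool.name, arg):
        return f"blocked: {tool.name} was not approved"
    return tool.run(arg)

# Hypothetical tools for illustration.
read_ticket = Tool("read_ticket", lambda ticket_id: f"ticket {ticket_id}: open")
delete_ticket = Tool("delete_ticket", lambda ticket_id: f"ticket {ticket_id} deleted",
                     high_impact=True)

def deny_all(tool_name: str, arg: str) -> bool:
    """Stand-in for a human reviewer; a real gate would prompt a person."""
    return False

print(call_tool(read_ticket, "T-1", deny_all))
print(call_tool(delete_ticket, "T-1", deny_all))
```

The blocked message goes back to the agent as an ordinary tool result, so the loop can continue with a different step instead of crashing.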
Where This Leaves You
If you are evaluating agent tools for real business work and want a practitioner’s take on whether they fit your workflow, that is what we do at Alameda Internet Marketing. We build and run AI agents in production for client work: content pipelines, browser automation via agent-browser, data enrichment, and SEO workflows using Claude Code. We have also watched agents go wrong in the ways described above, which is part of what makes the evaluation useful.
The next piece in this series covers the agent platform landscape: OpenClaw, Hermes, LangChain, CrewAI, and which frameworks actually ship in production.
This is article #5 in the Homme Plus Robot AI guide series. Article #2 covers what an LLM is and cannot do on its own. Article #1 covers ChatGPT vs. Claude vs. Gemini vs. Grok with a vendor-by-vendor breakdown of agent capabilities.
Ross Taylor is the owner of Alameda Internet Marketing, an AI-native SEO and marketing agency. He uses Claude Code and agent-browser in production workflows weekly.