What an MCP Server Does in an AI Workflow
MCP servers let you write one integration that any AI agent can call. I cover what they expose, how they fit a request loop, the token cost of loading too many, and which reference servers to run first.
An MCP server is a small program that exposes your tools, data, and actions to an AI model through one standard protocol, so any compatible agent can call them without custom glue code. MCP stands for Model Context Protocol. Write the integration once and every MCP client can use it.
Why does an AI agent need an MCP server at all?
Before MCP I maintained four separate scripts. One shelled out to the GitHub CLI. One held a Postgres connection. One read the local filesystem. One posted to Slack. Each spoke a different dialect, each broke in a different way, and when I swapped the underlying model none of them carried over cleanly. That is the N times M problem. Every model client multiplied by every data source is a connector somebody has to build and keep alive.
MCP collapses that. The protocol launched in November 2024 and the growth was not subtle. Server downloads went from around 100,000 that month to more than 8 million by April 2025. By late 2025 there were over 10,000 active public servers. You write one MCP server for your database and Claude, Cursor, VS Code, and ChatGPT can all call it.
What is an MCP server, exactly?
An MCP server is a process that speaks JSON-RPC 2.0 and advertises three kinds of things to a client:
- Tools: actions the model can invoke, like run_query or create_issue.
- Resources: data the model can read, like a file or a table schema.
- Prompts: reusable templates the server offers to the client.
The client is the app holding the model, like Claude Desktop or Claude Code. It connects to the server, asks what it has, and gets back a list of typed tools with JSON schemas. The model sees those tools and decides which to call. The server runs the actual code and returns a result.
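To make the discovery step concrete, here is a minimal sketch of the messages involved, assuming the protocol's `tools/list` method and its `inputSchema` field. The `run_query` tool, its description, and its `sql` parameter are invented for illustration and do not come from any particular server.

```python
import json

# Request a client sends to ask a server which tools it offers.
list_request = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}

# Illustrative response. The run_query tool and its schema are made up
# for this sketch; a real server returns its own tool definitions.
list_response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "tools": [
            {
                "name": "run_query",
                "description": "Run a read-only SQL query",
                "inputSchema": {
                    "type": "object",
                    "properties": {"sql": {"type": "string"}},
                    "required": ["sql"],
                },
            }
        ]
    },
}


def tool_names(response: dict) -> list[str]:
    """Return the tool names advertised in a tools/list response."""
    return [tool["name"] for tool in response["result"]["tools"]]


if __name__ == "__main__":
    print(json.dumps(list_request))
    print(tool_names(list_response))
```

The `inputSchema` object is ordinary JSON Schema, which is what lets the model see parameter names and types before it decides to call anything.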
Two transports matter. stdio runs the server as a local subprocess and pipes messages over standard input and output. It is a good fit for local files and anything else on your machine. Streamable HTTP runs the server as a remote service you reach over the network, which is how hosted servers from Stripe, other SaaS vendors, or your own deployment work.
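As a rough illustration of the stdio transport, the sketch below frames each JSON-RPC message as one newline-terminated line, which is how I understand the stdio framing to work. It uses an in-memory `io.StringIO` as a stand-in for the real pipe, and the helper names `write_message` and `read_messages` are hypothetical.

```python
import io
import json


def write_message(stream, message: dict) -> None:
    """Serialize one JSON-RPC message as a single newline-terminated line."""
    # json.dumps escapes newlines inside strings, so each message stays on one line.
    stream.write(json.dumps(message) + "\n")
    stream.flush()


def read_messages(stream):
    """Yield one parsed JSON-RPC message per non-empty line."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


# In-memory stand-in for the pipe between a client and a server subprocess.
pipe = io.StringIO()
write_message(pipe, {"jsonrpc": "2.0", "id": 1, "method": "tools/list"})
write_message(pipe, {"jsonrpc": "2.0", "id": 2, "method": "ping"})
pipe.seek(0)
received = list(read_messages(pipe))
```

In a real setup the client would hold the subprocess's stdin and stdout instead of a `StringIO`, and the server must keep anything that is not a protocol message (logs, debug prints) off stdout.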
How does an MCP server fit into an AI workflow?
Here is the loop for a single request. The client starts the servers and pulls their tool lists into the model context. The user asks something. The model picks a tool and emits a structured call. The client routes that call to the right server. The server executes, hits your database or API, and returns JSON. The model reads the result and either answers or calls another tool.
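The routing step in that loop can be sketched in a few lines. This is a toy, in-process simulation under stated assumptions: the two "servers" are plain dictionaries of handler functions, the tool names `read_file` and `run_query` are hypothetical, and the result shape (a `content` list plus an `isError` flag) mirrors how tool results are usually returned.

```python
from typing import Callable

# Hypothetical in-process stand-ins for two servers: tool name -> handler.
ToolHandler = Callable[[dict], str]

SERVERS: dict[str, dict[str, ToolHandler]] = {
    "filesystem": {"read_file": lambda args: f"contents of {args['path']}"},
    "postgres": {"run_query": lambda args: f"rows for: {args['sql']}"},
}


def build_routes(servers: dict) -> dict[str, str]:
    """Map every advertised tool name to the server that owns it."""
    routes = {}
    for server_name, tools in servers.items():
        for tool_name in tools:
            # If two servers share a tool name, the later one wins in this sketch.
            routes[tool_name] = server_name
    return routes


def call_tool(routes: dict, servers: dict, name: str, arguments: dict) -> dict:
    """Route one structured tool call and wrap the result as text content."""
    if name not in routes:
        text = f"unknown tool: {name}"
        return {"isError": True, "content": [{"type": "text", "text": text}]}
    handler = servers[routes[name]][name]
    return {"isError": False, "content": [{"type": "text", "text": handler(arguments)}]}


routes = build_routes(SERVERS)
result = call_tool(routes, SERVERS, "run_query", {"sql": "SELECT 1"})
```

A real client does the same lookup but sends the call over the server's transport rather than invoking a Python function directly.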
Configuration is a JSON file. This is a real Claude Desktop config running two stdio servers:
{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/Users/jordan/projects"]
    },
    "postgres": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-postgres", "postgresql://localhost/app"]
    }
  }
}
Restart the client and the model can read files under that directory and run read-only queries against that database. No SDK calls, no per-model rewrite.
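If you want to check what a config like this will launch before restarting the client, a small script can read it and print each server's command line. This is a sketch only: `launch_commands` is a hypothetical helper, and it assumes every entry is a stdio server with a `command` and an optional `args` list, as in the config above.

```python
import json
import shlex

CONFIG_TEXT = """
{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/Users/jordan/projects"]
    },
    "postgres": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-postgres", "postgresql://localhost/app"]
    }
  }
}
"""


def launch_commands(config_text: str) -> dict[str, list[str]]:
    """Return the argv each configured stdio server would be started with."""
    config = json.loads(config_text)
    commands = {}
    for name, entry in config.get("mcpServers", {}).items():
        if "command" not in entry:
            raise ValueError(f"server {name!r} has no 'command'")
        commands[name] = [entry["command"], *entry.get("args", [])]
    return commands


commands = launch_commands(CONFIG_TEXT)
for name, argv in commands.items():
    # shlex.join quotes arguments so the line can be pasted into a shell.
    print(f"{name}: {shlex.join(argv)}")
```

Running the printed command by hand in a terminal is a quick way to see whether a server starts at all before blaming the client.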
Which MCP servers are worth running first?
Start with the reference servers maintained alongside the spec. They are boring and they work.
| Server | What it exposes | Transport | Good first use |
|---|---|---|---|
| filesystem | read/write inside allowed dirs | stdio | local code and docs |
| github | repos, issues, PRs, code search | stdio / HTTP | code review agents |
| postgres | schema plus read-only queries | stdio | data questions |
| slack | channels, messages, search | HTTP | status and notifications |
A note on order: turn servers on one at a time. I have watched an agent get measurably worse when handed a pile of tools it had no reason to touch.
When does an MCP server slow your workflow down?
Every tool a server exposes ships its full JSON schema into the model context before the user types anything. The GitHub server alone exposes more than 30 tools. In my logs a single schema runs roughly 200 to 700 tokens depending on how many parameters it carries. Load three or four chatty servers and you can spend 6,000 to 10,000 tokens describing capabilities the model may never use. That is context you paid for and latency you wait on, every single turn.
The fastest way to make an agent worse is to give it forty tools it does not need. Context spent listing capabilities is context not spent on the task.
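A back-of-the-envelope budget makes the cost visible before you enable anything. The sketch below multiplies tool counts by the 200 to 700 tokens-per-tool range quoted above. The server names and tool counts are hypothetical placeholders, so substitute the counts your own servers report.

```python
# Per-tool range taken from the figures quoted above.
TOKENS_PER_TOOL_LOW = 200
TOKENS_PER_TOOL_HIGH = 700


def schema_budget(tool_counts: dict[str, int]) -> tuple[int, int]:
    """Return (low, high) token estimates for loading every listed tool."""
    total_tools = sum(tool_counts.values())
    return total_tools * TOKENS_PER_TOOL_LOW, total_tools * TOKENS_PER_TOOL_HIGH


# Hypothetical tool counts for three enabled servers.
active = {"server_a": 5, "server_b": 4, "server_c": 6}
low, high = schema_budget(active)
print(f"{sum(active.values())} tools: roughly {low} to {high} tokens per turn")
```

With 15 tools the estimate lands between 3,000 and 10,500 tokens, all of it spent before the first user message.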
Remote servers add network cost on top. A stdio server answers in single-digit milliseconds. A Streamable HTTP server in another region can add 100 to 400 ms per call, and an agent that chains six tool calls feels that six times. Auth is the other tax. Remote servers need OAuth or tokens, and the 2025-11-25 spec tightened that, including requiring servers to reject invalid Origin headers with HTTP 403. The fix is selection. Run the two or three servers a workflow actually needs, scope their permissions tight, and kill the rest.
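The Origin rule is easy to picture as a small check in front of the request handler. The sketch below is an illustration, not the spec's reference behavior: the allowlist entries are hypothetical, and letting requests with no Origin header through to the normal auth checks is an assumption on my part.

```python
from typing import Optional

# Hypothetical allowlist; a real deployment would list its own origins.
ALLOWED_ORIGINS = {"http://localhost:3000", "https://app.example.com"}


def origin_status(origin: Optional[str], allowed: set[str] = ALLOWED_ORIGINS) -> int:
    """Return the HTTP status an Origin check would produce for a request."""
    if origin is None:
        # Assumption: non-browser clients that send no Origin header
        # continue on to the usual authentication checks.
        return 200
    # A present but unrecognized Origin is rejected with 403 Forbidden.
    return 200 if origin in allowed else 403


print(origin_status("https://evil.example"))  # rejected origin
```

The point of the check is to stop a web page in the user's browser from quietly talking to a server that was only meant for a trusted client.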
What changed with MCP in 2025 and 2026?
It stopped being an Anthropic-only thing. OpenAI adopted MCP across its Agents SDK, Responses API, and ChatGPT desktop in March 2025. Google confirmed Gemini support in April 2025. In December 2025 Anthropic donated the protocol to the new Agentic AI Foundation under the Linux Foundation, with OpenAI and Block as co-founders. The current stable spec is the 2025-11-25 revision, which added async tasks and cleaner OAuth.
For you that means a server you write today is a long-term asset. It is not tied to one vendor's model lineup.
Is an MCP server the same as an API?
No. An API is the underlying interface to a service. An MCP server is a standardized wrapper that presents one or more APIs, files, or a database to an AI model in a shape the model can discover and call on its own. Many MCP servers are thin adapters sitting in front of an existing REST API.
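Here is a minimal sketch of what "thin adapter" means in practice. Everything in it is hypothetical: `fetch_invoice` stands in for an existing REST call and returns canned data, and `get_invoice` is an invented tool name. The shape to notice is a schema the model can discover, plus a handler that validates arguments and forwards them to the API you already have.

```python
import json


def fetch_invoice(invoice_id: str) -> dict:
    """Stand-in for an existing REST call; hypothetical, returns canned data."""
    return {"id": invoice_id, "status": "paid", "amount_cents": 4200}


# Tool definition the server would advertise so a model can discover it.
INVOICE_TOOL = {
    "name": "get_invoice",
    "description": "Look up one invoice by id",
    "inputSchema": {
        "type": "object",
        "properties": {"invoice_id": {"type": "string"}},
        "required": ["invoice_id"],
    },
}


def handle_get_invoice(arguments: dict) -> dict:
    """Validate arguments, call the underlying API, and wrap the result as text."""
    required = INVOICE_TOOL["inputSchema"]["required"]
    missing = [key for key in required if key not in arguments]
    if missing:
        text = f"missing arguments: {', '.join(missing)}"
        return {"isError": True, "content": [{"type": "text", "text": text}]}
    invoice = fetch_invoice(arguments["invoice_id"])
    return {"isError": False, "content": [{"type": "text", "text": json.dumps(invoice)}]}


response = handle_get_invoice({"invoice_id": "inv_1"})
```

The API itself does not change. The adapter only adds the description and schema that make it callable by a model without hand-written glue.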
Do I need to build my own MCP server?
Not to start. For files, GitHub, Postgres, Slack, Stripe, and a few hundred other targets a server already exists. Build your own when you have a proprietary internal system or want to expose a narrow, safe slice of one. With the TypeScript or Python SDK, a basic server is a few dozen lines of code.
How much context does an MCP server consume?
It depends on tool count and schema size. Budget roughly 200 to 700 tokens per tool definition. A single focused server costs a few hundred to a couple thousand tokens. Stacking many broad servers can run past 10,000 tokens before any work happens, which is the main reason to keep your active set small.
Pick one workflow you run by hand today. Map it to the two reference servers that cover it, drop them into your client config, and scope their access to exactly what that task needs. Measure the token cost of the tool list before and after. That number tells you whether your set is lean or bloated.