The Model Context Protocol adds tokens to every request. This is not a flaw. It is the cost of the abstraction. A protocol that makes tools discoverable, composable, and observable by necessity adds structure. That structure consumes tokens. The question is whether the tokens are buying something.
On a homelab or AWS Free Tier budget, the answer matters more than it does for well-funded production systems. Lambda charges for execution duration. Token counts drive inference costs. A session that consumes 1,500 extra tokens on every invocation is not a concern at 100 sessions/month. It is a line item worth examining at 100,000 sessions/month.
Where the tokens go
An MCP session has three token overhead sources: the server description, the tool schemas, and the per-call invocation format. They are additive and they all hit the input token count.
Server description is the natural language description of what the MCP server does, included in the context when the server is initialised. A well-written server description is 200-400 tokens. A verbose one with examples and caveats can reach 600. This is a one-time cost per session, not per call.
Tool schemas are the JSON Schema definitions of each tool: its name, description, parameter names, types, and descriptions. A minimal tool schema (name, description, 2 parameters) is around 50 tokens. A well-documented tool with 5 parameters and descriptive field descriptions is 100-150 tokens. Multiply by the number of registered tools.
Invocation format is the structured representation of each tool call and its result. The call itself (tool name, parameters as JSON) adds 20-40 tokens. The result wrapper adds another 10-20. On a session with 10 tool calls, that's 300-600 tokens of invocation overhead.
```python
# Token overhead breakdown for a 10-tool MCP server, 10-call session
server_description_tokens = 300   # one-time, per session
tool_schemas_tokens = 100         # average per tool
tools_registered = 10
tool_schema_total = tool_schemas_tokens * tools_registered  # 1,000

invocation_per_call_tokens = 30   # average per tool call
calls_in_session = 10
invocation_total = invocation_per_call_tokens * calls_in_session  # 300

session_overhead_total = (
    server_description_tokens  # 300
    + tool_schema_total        # 1,000
    + invocation_total         # 300
)
# session_overhead_total = 1,600 tokens

# At $0.80 / 1M input tokens (small model tier):
cost_per_session_overhead = (session_overhead_total / 1_000_000) * 0.80
# = $0.00128 per session in overhead alone

# At 1,000 sessions / month:
monthly_overhead_cost = cost_per_session_overhead * 1_000
# = $1.28 / month
```
$1.28/month sounds trivial. It is trivial for most use cases. The point is not that MCP is expensive. The point is that the overhead is predictable and measurable, and that it scales linearly with tool count and session volume. Understanding it lets you make deliberate decisions about tool count, schema verbosity, and whether the abstraction is earning its cost for a given workload.
The break-even calculation
MCP overhead is not purely a cost to minimise. The schema overhead buys discoverability: the model knows what tools exist, what they accept, and how to call them correctly without bespoke prompt engineering. That's worth tokens if the tools are used repeatedly.
The break-even point is where the structure overhead is justified by the reduction in error rate and retries. A direct API call approach that requires 3 attempts to get the correct structured output may cost more tokens in total than an MCP call that gets it right on the first attempt because the schema is explicit.
The relevant calculation is not "MCP overhead vs zero" but "MCP overhead vs the tokens spent on error correction and retry in a direct approach."
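That comparison can be sketched directly. The schema and invocation numbers below reuse the breakdown above; the direct-approach figures (tokens per attempt, average attempts) are illustrative assumptions, not measurements.

```python
# Break-even sketch: MCP schema overhead vs retry cost in a direct approach.
# schema_overhead = 300 server description + 1,000 tool schemas (from above);
# the direct-approach per-attempt tokens and retry rate are assumptions.

def session_tokens_mcp(schema_overhead: int, calls: int, per_call: int) -> int:
    """Schema paid once per session, plus structured invocation per call."""
    return schema_overhead + calls * per_call

def session_tokens_direct(calls: int, per_attempt: int, avg_attempts: float) -> int:
    """No schema overhead, but malformed output forces re-prompting."""
    return int(calls * per_attempt * avg_attempts)

mcp = session_tokens_mcp(schema_overhead=1_300, calls=10, per_call=30)       # 1,600
direct = session_tokens_direct(calls=10, per_attempt=120, avg_attempts=1.6)  # 1,920
# Under these assumptions, the explicit schema pays for itself.
```

Change the assumed retry rate to 1.0 and the direct approach wins; the break-even sits wherever your measured error rate puts it.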
When MCP overhead is justified
Three conditions make MCP overhead worthwhile on a constrained budget:
Tools are reused across sessions. The schema overhead is paid once per session, not once per tool call. If a session makes 10 calls to the same tool set, the schema overhead per call is 100 tokens (a tenth of the 1,000-token schema cost). If a session makes 1 call and ends, the overhead per call is 1,000 tokens (the full schema cost). The amortisation only works if tools are exercised.
Tool observability is a requirement. MCP tool calls are structured, logged, and inspectable. If you need an audit trail of what tools were called, with what parameters, and what they returned, that observability is built into the protocol. Building equivalent observability into direct API calls requires additional instrumentation. The token overhead may be cheaper than the engineering overhead.
The tool schema provides genuine structure the model uses. A well-written tool schema reduces ambiguity. The model knows the exact parameter names, types, and constraints. On tasks where structured output is required (tool calls that produce machine-readable results consumed by downstream steps), schema precision reduces error rates. The reduction in retries can offset the schema overhead.
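The amortisation in the first condition is easy to make concrete, using the 1,000-token schema figure from the breakdown above:

```python
# Schema overhead amortised across calls in a session
# (1,000-token schema for the 10-tool server from the breakdown above).
schema_tokens = 1_000
per_call_overhead = {calls: schema_tokens // calls for calls in (1, 5, 10, 20)}
# {1: 1000, 5: 200, 10: 100, 20: 50}
# A one-call session carries the full schema cost; ten calls bring it to 100.
```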
When direct API calls are preferable
Three conditions favour direct API calls:
Single-purpose scripts. A script that does one thing, called once, does not benefit from tool discoverability. The MCP overhead adds tokens without providing structure that the model needs to navigate. Write the function directly, call it directly, pay only for the tokens you actually need.
One-shot tasks. If a session makes exactly one tool call and terminates, the amortisation argument fails. The schema overhead for 10 tools is present regardless of which tool is called. For one-shot tasks with a known tool, a direct API call is cheaper and simpler.
You own the entire pipeline. MCP's discoverability value is highest when the client (the model) and the server (the tool provider) are developed independently and need a contract. If you control both sides of the interface, the contract can be simpler. A direct function call with typed parameters achieves the same result with less overhead.
```python
# Direct approach: controlled pipeline, typed interface
async def run_scan_task(repo_url: str, scan_depth: int = 3) -> ScanResult:
    """Call GitHub API and analysis functions directly."""
    repo = await github_client.get_repo(repo_url)
    files = await github_client.list_files(repo, depth=scan_depth)
    analysis = await analyser.run(files)
    return ScanResult(repo=repo, findings=analysis.findings)

# MCP approach: tool exposed for model-driven invocation
# Adds: server description tokens + schema tokens + invocation format tokens
# Justified when: model decides which tools to call, observability required,
#                 tool reused across many sessions
```
Reducing overhead without removing MCP
If MCP is the right choice for a workload but overhead needs to be managed, three levers are available:
Reduce tool count per server. Register only the tools a session actually needs. A session that only writes files doesn't need the read tools in its schema. Selective tool registration reduces schema overhead proportionally: at the 100-token average used above, a 5-tool server adds ~500 tokens instead of ~1,000 for a 10-tool server.
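A sketch of the idea, where the tool names and token estimates are hypothetical (this is not a real MCP SDK API, just the shape of the decision):

```python
# Selective tool registration: expose fewer tools, pay less schema overhead.
# Tool names and token estimates are illustrative.
ALL_TOOLS = {            # tool name -> estimated schema tokens
    "read_file": 90,
    "write_file": 110,
    "list_dir": 80,
    "search_code": 130,
    "delete_file": 90,
}

def tools_for_session(needed: set[str]) -> dict[str, int]:
    """Expose only the tools this session declares it needs."""
    return {name: cost for name, cost in ALL_TOOLS.items() if name in needed}

write_session = tools_for_session({"write_file"})
# sum(write_session.values()) == 110 tokens, vs 500 for the full set
```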
Tighten tool descriptions. Tool descriptions are the most verbose part of a schema. A description that is 3 sentences instead of 6 saves 50 tokens per tool. Across 10 tools, that's 500 tokens per session. Write descriptions that are precise, not thorough.
Cache schema context. For a hosted API that supports prompt caching (Anthropic's cache_control feature), the tool schemas can be marked as cacheable. The first request pays full token cost. Subsequent requests within the cache window pay a reduced rate or zero for the cached portion. Input cache hits typically cost 10% of the standard input token rate.
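In the Anthropic Messages API, marking the schemas cacheable looks roughly like this; the tool definition and model name below are illustrative, and `cache_control` on the last tool caches everything up to and including it as a prefix:

```python
# Sketch of a Messages API request body with tool schemas marked cacheable.
# The tool definition and model name are illustrative assumptions.
tools = [
    {
        "name": "scan_repo",
        "description": "Scan a repository and return findings.",
        "input_schema": {
            "type": "object",
            "properties": {"repo_url": {"type": "string"}},
            "required": ["repo_url"],
        },
    },
]
# cache_control on the last tool caches the whole tools array as a prefix;
# later requests in the cache window read it at the reduced cached rate.
tools[-1]["cache_control"] = {"type": "ephemeral"}

request_body = {
    "model": "claude-sonnet-4-5",  # model name is an assumption
    "max_tokens": 1024,
    "tools": tools,
    "messages": [{"role": "user", "content": "Scan https://example.com/repo"}],
}
```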