Why your AI is getting dumber — and what the industry is doing about it

Last week I was working with our client New Amsterdam, helping troubleshoot an issue with Copilot and the Azure CLI, when an interesting topic came up that I couldn’t stop thinking about.

It’s something that affects anyone who builds with AI, and increasingly, anyone who just uses it. Some of the connectors and integrations you’re relying on every day are quietly eating into your AI’s working memory, and the more of them you add, the worse your results get. Your AI is getting dumber the more you stuff into its context window.

The context window: the one concept you need for this to make sense

Every time you use an AI tool, it doesn’t remember previous conversations the way a colleague would. Instead, it reads the entire current conversation from scratch before it responds: every message, every instruction, every piece of background. That reading happens inside what’s called a context window: the total amount of text the model can hold in view at once.

Think of it as a desk. Everything the AI needs to work with has to fit on that desk. Once it’s full, something has to come off before anything new goes on.

Modern models have large desks, capable of holding hundreds of thousands of words. That feels enormous until you understand what’s competing for the space. When an AI tool connects to other software (your CRM, your codebase, a database), it needs to be told what those tools can do before it can use them. Every integration comes with a description: here’s what I do, here are my inputs, here’s what I return. Those descriptions are text. They go on the desk.

Less desk space means shallower reasoning, more errors on complex tasks, and higher costs.

This isn’t just happening in my client work

On March 11th at Perplexity’s inaugural developer conference, their CTO Denis Yarats announced the company was moving away from MCP (Model Context Protocol, the standard that lets AI tools connect to external software) in favour of traditional APIs and CLIs. Notable because Perplexity had shipped their own MCP server just months earlier. Garry Tan, president of Y Combinator, followed up the same day: “MCP sucks honestly.” The context window overhead, the authentication friction, the hidden costs. All of it.

The frustration has a name: the schema injection problem. When an AI agent connects to an MCP server, it receives the full JSON schema for every tool that server exposes: every name, description, parameter, type, and response format, all loaded into the context window upfront, on every turn, whether those tools get used or not. It’s a feature, not a bug. It’s how MCP was designed to work. The model needs this information to know what’s available. But the cost is significant.
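To make that overhead concrete, here’s a rough sketch in Python. The tool definition below is hypothetical — it mimics the general shape of an MCP tool schema, not any real server’s — and the ~4-characters-per-token rule is a crude heuristic, not a real tokenizer:

```python
import json

# Hypothetical MCP-style tool schema. The shape is illustrative,
# not copied from any real server.
tools = [
    {
        "name": "list_issues",
        "description": "List issues in a repository, filtered by state and label.",
        "inputSchema": {
            "type": "object",
            "properties": {
                "repo": {"type": "string", "description": "owner/name"},
                "state": {"type": "string", "enum": ["open", "closed", "all"]},
                "labels": {"type": "array", "items": {"type": "string"}},
            },
            "required": ["repo"],
        },
    },
    # ...now imagine 50-100 more entries like this on every turn...
]

def rough_tokens(obj) -> int:
    # Crude heuristic: roughly 4 characters per token for English/JSON.
    return len(json.dumps(obj)) // 4

overhead = sum(rough_tokens(t) for t in tools)
print(f"~{overhead} tokens of schema before any tool is called")
```

One tool costs on the order of a hundred tokens; a server exposing a hundred tools multiplies that before the conversation even starts.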

Some concrete numbers from production deployments:

  • GitHub’s MCP server: ~50,000 tokens just to initialize
  • A database MCP server with 106 tools: 54,600 tokens before a single query runs
  • Cloudflare’s analysis of complex agent setups: up to 81% of available context consumed by tool descriptions alone

A typical CLI interaction costs around 200 tokens. The same operation routed through MCP can cost 200 times that before the work even begins.
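Back-of-the-envelope, using the figures above and a typical 200,000-token context window (these are order-of-magnitude numbers, not a benchmark):

```python
# Desk-space arithmetic using the article's figures.
context_window = 200_000          # tokens; a typical large model window
schema_overhead = {
    "github_mcp": 50_000,         # GitHub's MCP server at initialization
    "database_mcp_106_tools": 54_600,
}

overhead = sum(schema_overhead.values())
remaining = context_window - overhead
print(f"{overhead:,} tokens of schemas; {remaining:,} left for actual work")
print(f"that's {overhead / context_window:.0%} of the desk gone up front")
```

Connect just those two servers and roughly half the desk is occupied before you type a word.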

Why MCP exists anyway

The properties that create the overhead are the same properties that make it useful in certain contexts.

MCP’s explicit schemas mean any compliant client can discover what any compliant server can do, without prior knowledge. That’s genuinely useful for building systems where the agent needs to figure out its own capabilities at runtime, multi-tenant platforms, enterprise deployments with dynamic toolsets, environments where tool availability changes based on user permissions.
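The discovery pattern is easy to sketch. The `FakeServer` below is a hypothetical stand-in, not the official MCP SDK, but it shows the idea: ask the server what it can do at runtime, then only load the tools that matter for the task at hand:

```python
class FakeServer:
    """Hypothetical stand-in for an MCP server (not the real SDK)."""
    def list_tools(self) -> list[dict]:
        # A real server would return full JSON schemas; names and
        # descriptions are enough to show the pattern.
        return [
            {"name": "query_db", "description": "run a sql query against the database"},
            {"name": "send_email", "description": "send an email to a contact"},
        ]

def discover_tools(server) -> list[dict]:
    # The MCP pattern: discover capabilities at runtime,
    # with no prior knowledge of the server.
    return server.list_tools()

def pick_relevant(tools: list[dict], task: str) -> list[dict]:
    # Lazy loading: only put schemas on the desk whose descriptions
    # overlap with the current task. (Naive keyword match; real
    # clients use smarter retrieval.)
    words = set(task.lower().split())
    return [t for t in tools if words & set(t["description"].lower().split())]

tools = discover_tools(FakeServer())
relevant = pick_relevant(tools, "query the database for open invoices")
print([t["name"] for t in relevant])  # ['query_db']
```

A CLI skips the discovery step entirely because you already know the tool exists, which is exactly why it’s cheaper and exactly why it can’t adapt at runtime.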

The structured authentication model means every tool call is traceable. You get audit trails, access controls, and a consistent security posture across integrations. That matters in regulated industries or anywhere you need to know exactly what your AI touched and when.

A CLI works when you know exactly what tools you need. MCP works when the agent needs to figure that out at runtime. For production pipelines with a known, stable tool set, the choice is becoming obvious.

So what does this mean for you?

If you’re using AI tools day to day (Copilot, Claude, a custom agent setup), you probably won’t see a settings panel that says “context window: 40% consumed by tool descriptions.” The overhead is invisible. The effect isn’t.

If your AI assistant feels less sharp in longer sessions, or starts making mistakes on tasks it handled fine earlier in the conversation, a crowded desk is a likely contributor. The more integrations you’ve connected, the worse this gets.

A few practical things worth acting on now:

Audit what you’ve connected. Most AI tools let you see which integrations or MCP servers are active. If you’re using Slipstream’s Claude.ai, go to Connectors > Tool Access and make sure “Load tools when needed” is enabled.

Treat integrations like tabs in a browser. The ones you don’t close cost you something. Be deliberate about what’s active in your working environment, especially for focused, high-stakes tasks.

If you’re building, the architecture decision is real. Direct API and CLI integrations are significantly cheaper in token terms for known, stable tool sets. Save MCP for the scenarios it’s designed for — dynamic tool discovery, multi-tenant systems, anywhere runtime governance matters more than efficiency.

Watch this space. Right now the gap between MCP’s promise and its production reality is wide enough that the CTO of Perplexity said so on stage, and the president of Y Combinator agreed publicly the same afternoon.

What started as a client question about the Azure CLI last week turns out to be a leading indicator of where the whole industry is headed. The tools that win in the agent era won’t just be the most capable ones. They’ll be the ones that earn their place on the desk.

— Matt