There’s a sentence buried in Cloudflare’s April announcement that, if you take it seriously, changes how every team should think about exposing data to AI agents:
“LLMs are better at writing code to call MCP, than at calling MCP directly.”
That’s not a marketing line. It’s an architectural claim — and Cloudflare just shipped the infrastructure to back it up. The “Code Mode” pattern they introduced for Model Context Protocol takes the standard agent loop (“call tool, read result, decide next call”) and replaces it with: generate a TypeScript program that chains calls together, run it in a V8 isolate, return only the final output to the model.
It sounds like a small refactor. It isn’t. If this pattern catches on — and there’s good reason to think it will — the part of your stack that becomes the agent’s bottleneck stops being the MCP tool descriptions and starts being the underlying API contract. Which means a lot of teams who shipped a quick MCP server in Q1 are about to discover that the database layer underneath it was the actual problem the whole time.
The old MCP loop, in one paragraph
The standard MCP execution model looks like this. You expose a tool — `list_users(filter, limit)`. The model decides to call it. The MCP host serializes the call, sends it to the server, gets back a JSON blob, deserializes it, and stuffs the entire response into the model’s context. The model reads it, decides what to do next, calls another tool. Each round-trip burns context-window tokens proportional to the size of the response. If a workflow needs five tool calls, you’ve paid the round-trip tax five times — and the model has to “see” every intermediate result to decide what to do next, even if 99% of those bytes were never going to influence its final answer.
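To make the tax concrete, here’s a minimal sketch of that loop. The function shapes are illustrative, not any real SDK — the point is where the two costs land:

```typescript
// Illustrative sketch of the standard MCP agent loop (not a real SDK).
// Each iteration pays twice: one model inference, plus the entire
// serialized tool result appended to the context window.

type ModelStep =
  | { kind: "call"; tool: string; args: unknown }
  | { kind: "answer"; text: string };

async function agentLoop(
  task: string,
  callModel: (context: string) => Promise<ModelStep>,
  callTool: (tool: string, args: unknown) => Promise<unknown>,
): Promise<string> {
  let context = task;
  for (;;) {
    const step = await callModel(context);              // inference latency, every step
    if (step.kind === "answer") return step.text;
    const result = await callTool(step.tool, step.args); // one MCP round-trip
    context += "\n" + JSON.stringify(result);            // full blob enters the context
  }
}
```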
For trivial workflows — “what’s the weather in Berlin” — this is fine. For anything resembling actual data work — joining three tables, filtering on a derived field, aggregating, then formatting the result — it falls apart. The context window fills up with intermediate JSON. The model gets distracted by irrelevant rows. Latency stacks because every step waits on a model inference. By the third or fourth tool call, you’re paying real money to shuffle data through a neural network that doesn’t need to see it.
This is the problem the March-era discussion of MCP’s “context window tax” was already pointing at. Code Mode is the first credible production-scale answer.
What Code Mode actually does
Cloudflare’s pattern, distilled:
- Take the MCP server’s tool schemas and auto-generate a TypeScript API from them — typed function signatures, JSDoc comments, the works. The LLM sees this as an SDK, not as a list of tools.
- Ask the model to write a TypeScript program that solves the user’s task by calling those functions.
- Run the program in a V8 isolate. Cloudflare’s pitch is that isolates start in milliseconds and use a few megabytes of memory each, so spinning one up per agent invocation is cheap. The sandbox has no internet access — the only way it can reach the outside world is through RPC bindings back to the MCP servers the supervisor configured.
- Capture stdout and the final return value, and send those back to the model. The intermediate API responses never enter the model’s context unless the program explicitly logs them.
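Concretely, the generated surface might look something like this — a sketch, with the `rpc` binding and the types standing in for whatever Cloudflare’s generator actually emits:

```typescript
// Sketch of an auto-generated SDK function; names and the `rpc` binding
// are illustrative stand-ins, not Cloudflare's actual generated output.

interface User { id: string; name: string; email: string }

// The only bridge out of the sandbox: an RPC binding configured by the supervisor.
declare const rpc: { call(tool: string, input: unknown): Promise<unknown> };

/** List users matching a filter. (JSDoc carried over from the tool schema.) */
export async function listUsers(input: { filter?: string; limit?: number }): Promise<User[]> {
  // No fetch(), no sockets — the isolate has no network. The binding forwards
  // the call to the supervisor, which relays it to the real MCP server.
  return (await rpc.call("list_users", input)) as User[];
}
```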
The credential story matters too. API keys for the underlying MCP servers stay on the supervisor side. The sandbox calls a binding; the supervisor adds the auth header. A model that hallucinates a `process.env.DATABASE_PASSWORD` reference gets nothing back, because there’s no env to leak in the first place.
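Here’s a sketch of what that looks like on the supervisor side, assuming a simple HTTP transport to the MCP server (the transport details are an assumption):

```typescript
// Supervisor-side binding sketch. The sandbox only ever sees call(tool, input);
// the API key lives out here and is attached per request.
function makeBinding(serverUrl: string, apiKey: string) {
  return {
    async call(tool: string, input: unknown): Promise<unknown> {
      const res = await fetch(serverUrl, {
        method: "POST",
        headers: {
          "Content-Type": "application/json",
          Authorization: `Bearer ${apiKey}`, // never crosses into the isolate
        },
        body: JSON.stringify({ tool, input }),
      });
      if (!res.ok) throw new Error(`MCP call failed: ${res.status}`);
      return res.json();
    },
  };
}
```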
The InfoQ writeup of Cloudflare’s reference architecture cites token-usage reductions as high as 99.9% in some workflows. That number deserves an asterisk — it’s a best-case for tool-heavy multi-step jobs, and the original Cloudflare post doesn’t publish a benchmark suite — but the order of magnitude is plausible. If your workflow used to round-trip 50KB of intermediate JSON through the model six times to produce a 200-byte answer, and now the model only sees the 200 bytes, you’re well into “two orders of magnitude” territory.
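The back-of-envelope version of that claim, using a crude four-bytes-per-token conversion:

```typescript
// Rough math behind the order-of-magnitude claim (≈4 bytes/token is a heuristic).
const oldBytes = 50_000 * 6; // six 50KB intermediate responses through the context
const newBytes = 200;        // only the final answer reaches the model
console.log(1 - newBytes / oldBytes); // ≈ 0.99933 — the "99.9%" ballpark
```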
Why this changes what “good MCP server” means
Here’s the part most teams haven’t worked through yet.
In the old model, an MCP server’s job was to be legible to an LLM. You wrote tool descriptions in friendly prose. You picked names that the model would understand. You returned compact JSON because the model was going to read every byte. The MCP server’s surface area was a UX problem — design for the LLM as the primary consumer.
In the Code Mode world, an MCP server’s job is to be a good SDK target. The model isn’t reading your tool descriptions in a system prompt and pattern-matching them — it’s reading auto-generated TypeScript signatures and writing real code against them. Suddenly the things that matter are the things that have always mattered for SDKs:
- Types are real. A function that returns `any` or `Record<string, unknown>` forces the model to guess at the shape downstream. A function with a precise return type lets the model chain confidently (a contrast is sketched after this list).
- Errors are typed. If the MCP server returns inconsistent error shapes, the generated TypeScript can’t model them, and the model’s program will crash mid-execution with no graceful path.
- Operations compose. You don’t want one mega-tool that takes 12 parameters. You want small, orthogonal functions that the model can chain — list, filter, aggregate, format.
- Performance is measurable. A V8 isolate that does five sequential RPCs is bottlenecked by the slowest RPC. Latency budgets that were invisible behind a model-thinking pause are suddenly user-visible.
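Here’s the contrast the first bullet is pointing at, as a sketch with made-up function names:

```typescript
// Hypothetical signatures, for contrast only.
// Untyped: the generated program has to guess column names and cast.
declare function listOrdersLoose(): Promise<Record<string, unknown>[]>;

// Typed: a wrong field name fails at generation time, before the isolate ever runs.
interface Order { id: string; customerId: string; total: number }
declare function listOrders(): Promise<Order[]>;

async function bigSpenders(): Promise<string[]> {
  const orders = await listOrders();
  return orders.filter(o => o.total > 5000).map(o => o.customerId); // checked end to end
}
```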
This is precisely the transition that happened to public APIs around 2015 — when the audience shifted from humans reading docs to JavaScript apps consuming JSON, the bar for type consistency, error semantics, and pagination shape went way up. Code Mode is doing the same thing for MCP.
Where databases get caught flat-footed
Most production MCP servers sitting in front of databases right now were built quickly. The shape is usually one of:
- A `query_database(sql)` tool that takes raw SQL. Easy to implement, terrible for Code Mode — the model has to construct SQL strings inside its TypeScript program, which is a worse experience than typed function calls and a worse security story.
- A handful of hand-rolled tools (`get_user_by_id`, `list_orders_for_customer`) that cover the demo path and break the moment someone asks for something not pre-baked.
- A “generic” tool layer wrapping an ORM, returning untyped row dictionaries.

None of these are good Code Mode targets. The first one collapses to “the model is writing SQL inside TypeScript inside a sandbox” — three layers of indirection for a one-layer problem. The second one means the LLM-written program runs out of road on any query the API designer didn’t anticipate. The third one returns `Record<string, unknown>[]` and forces the model to guess columns.
What you actually want — and this is the bit that’s mechanically obvious once you see Code Mode in action — is:
- A typed function per resource, generated from the database schema, with the column types preserved end to end.
- A predictable filter grammar — operators, AND/OR, sorting, pagination — that the model can express in TypeScript without dropping to SQL (one possible shape is sketched below).
- Auth scoped to the binding, not embedded in tool arguments, so the sandbox can’t fabricate credentials.
- Stable error shapes so try/catch in the generated program does the right thing.
Read that list back. It’s a description of a well-designed REST API auto-generated from a database schema, with RBAC at the boundary. It’s been the right answer to “how do I get data out of Postgres into an application” for ten years. Code Mode is just making it the right answer for AI agents, too.
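The filter-grammar item, in particular, has a natural TypeScript shape. One plausible version, expressed as types — illustrative, not any particular tool’s actual grammar:

```typescript
// Illustrative filter grammar: per-column operators plus boolean composition.
type Op<V> = { eq?: V; neq?: V; in?: V[]; gte?: V; lt?: V };

type Filter<Row> = { [K in keyof Row]?: Op<Row[K]> } & {
  and?: Filter<Row>[];
  or?: Filter<Row>[];
};

interface ListParams<Row> {
  filter?: Filter<Row>;
  select?: (keyof Row)[];
  sort?: { field: keyof Row; dir: "asc" | "desc" }[];
  limit?: number;
  cursor?: string; // opaque pagination token, returned by the previous page
}
```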
A concrete example
Suppose the model needs to answer: “Which customers in the EU spent more than €5K last month, broken down by product category?”
Old MCP world, with a `query_database(sql)` tool:

```
Model:  call query_database("SELECT * FROM customers WHERE region='EU'")
Server: returns 8,000 rows (200KB)
Model:  oh no, let me try again
Model:  call query_database("SELECT id FROM customers WHERE region='EU'")
Server: returns 8,000 IDs (60KB)
Model:  call query_database("SELECT customer_id, SUM(total) FROM orders WHERE customer_id IN (...giant list...) AND created_at > ...")
Server: error, query too long
Model:  ...
```
Three round-trips, two failures, model context full of customer rows it didn’t need.
Code Mode world, with a typed Faucet-style API generated from the schema:

```typescript
const customers = await api.customers.list({
  filter: { region: { eq: 'EU' } },
  select: ['id'],
});
const totals = await api.orders.aggregate({
  filter: {
    customer_id: { in: customers.map(c => c.id) },
    created_at: { gte: '2026-03-01', lt: '2026-04-01' },
  },
  groupBy: ['customer_id', 'product_category'],
  aggregate: { total: 'sum' },
});
return totals.filter(t => t.total > 5000);
```
One program, three RPCs, and the model only ever sees the final filtered list. The 8,000-row customer dump never enters the context window. The aggregation runs server-side where it belongs. The types flow through — `customers` is `Customer[]`, `c.id` is `string`, the filter argument is checked at generation time.
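For that program to type-check, the generated SDK only needs declarations along these lines — hypothetical, simplified to match the example above:

```typescript
// Hypothetical generated declarations backing the example above.
interface Customer { id: string; region: string }
interface OrderTotalsRow { customer_id: string; product_category: string; total: number }

declare const api: {
  customers: {
    list(p: {
      filter?: { region?: { eq?: string } };
      select?: ("id" | "region")[];
    }): Promise<Pick<Customer, "id">[]>;
  };
  orders: {
    aggregate(p: {
      filter?: {
        customer_id?: { in?: string[] };
        created_at?: { gte?: string; lt?: string };
      };
      groupBy: ("customer_id" | "product_category")[];
      aggregate: { total: "sum" };
    }): Promise<OrderTotalsRow[]>;
  };
};
```

(A real generator would derive the return type from the `select` argument; the `Pick` here is a simplification.)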
This isn’t a Faucet ad. It’s just what Code Mode forces. Whatever tool you use to expose your database, it has to look something like this for the pattern to deliver on its promise.
The single-binary angle
There’s a deployment wrinkle worth flagging. Cloudflare’s Code Mode runs the model-written program in a V8 isolate that has zero outbound network access — every RPC goes through a configured binding. That’s a strong security posture, but it means the MCP server on the other end has to be reachable, fast, and operationally simple, because you’re going to run a lot of these.
Heavy MCP servers — the kind that need a sidecar, a queue, a cache, and an external auth service to start — are a poor fit for this world. You want something a sandbox can hit cheaply, that an enterprise team can run in front of every database without standing up a separate platform engineering effort. A single binary that you point at a connection string and that exposes both REST and MCP in the same process is, frankly, what this pattern was waiting for. The fact that the binary embeds RBAC means you don’t have to glue together three services just to ensure the sandbox calls run with the right user’s permissions.
This is exactly the bet Faucet made: one Go binary, multi-database, REST + MCP from the same schema introspection, RBAC at the boundary. None of that was designed with Code Mode in mind — Code Mode wasn’t public yet — but the architecture lands in the right place by accident, because both decisions were following the same logic about where the agent industry is going.
What to do this week
If you have an MCP server in production right now, three things are worth checking:
- Are your tool return types specific? If your tools return generic JSON or untyped rows, Code Mode generation will produce TypeScript that the model can’t reason about. Add Zod schemas, JSON Schema, or whatever your stack supports — but make the types real (see the sketch after this list).
- Is your auth on the binding, or in the tool args? If the model has to pass an API key as a tool argument, you’re one prompt injection away from a leak. Move credentials to the supervisor / binding layer where the sandbox can’t see them.
- Are your tools chainable? If your server exposes one giant `do_the_thing(payload)` tool, no amount of Code Mode will save you. Break it into composable pieces. The model is good at writing code; give it primitives to compose.
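For the first item, a minimal sketch of what “make the types real” looks like with Zod — the schema names are hypothetical:

```typescript
import { z } from "zod";

// The schema is both runtime validation for the tool's output and the
// source of truth for the TypeScript signature Code Mode generates against.
const UserRow = z.object({
  id: z.string(),
  email: z.string(),
  created_at: z.string(), // ISO timestamp
});

export const ListUsersResult = z.array(UserRow);
export type ListUsersRow = z.infer<typeof UserRow>; // { id: string; email: string; created_at: string }
```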
The teams that did the boring work of building good database APIs over the last decade are about to look prescient. The teams that shipped a quick MCP wrapper to ride the agent hype are about to find out that the wrapper was hiding a worse problem underneath.
Getting Started
Faucet auto-generates a typed REST API and an MCP server from your database schema in one binary, with RBAC at the boundary — exactly the shape Code Mode wants on the other end of an RPC binding.
Install:
```sh
curl -fsSL https://get.faucet.dev | sh
```
Point it at a database:
```sh
faucet serve --db postgres://user:pass@localhost/mydb
```
You get REST endpoints and an MCP server on the same port, both backed by the same schema introspection and the same RBAC policy. When Cloudflare Code Mode generates a TypeScript SDK from the MCP side, it’s reading the same types your REST clients see. One source of truth, one binary to deploy, one auth boundary to reason about.
Docs: faucet.dev · Source: github.com/faucetdb/faucet