
#115 — Code Mode: Your MCP Doesn’t Need 30 Tools

September 27, 2025 · 6 min read


Why this matters: Most AI agents are limping—they interact with your APIs via clunky wrappers and brittle tool-call mechanics, slowing your product’s ability to automate real work. As your product or ops scale, this becomes slow, error-prone, and hard to trust. You need an agent that thinks and iterates like your dev team—able to string together logic, scale, and handle complexity.

The Unlock: Switch to Code Mode, where your LLM writes TypeScript code that calls APIs directly, using the Model Context Protocol (MCP) behind the scenes.

The Modern Approach: Code-First, MCP-Backed

Don’t just string together tool calls. Give your agent the ability to write, review, and run TypeScript or Python code, leveraging MCP to expose APIs in a uniform, secure, documented way.

  • Agents are better at coding than at “tool calling”

    • Present agents with dozens or hundreds of APIs—complex, powerful, real. You don’t have to neuter or simplify for the sake of agent tool-call limitations.
    • LLMs are trained on real code, not ad-hoc, synthetic tool-call syntax. Give them the SDK surface—they perform like seasoned devs.
  • Cut “context rot”—compose and automate with code

    • Traditional approach: every tool call burns context, needs inference.
    • Code Mode: agent generates code, executes logic, branches, and loops in one shot—saving context, accelerating results.
  • Uniformity = speed

    • MCP means every API is described—and discoverable—in the same standardized way, making onboarding new services, agents, or internal teams seamless and fast.
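To make the contrast concrete, here is a minimal sketch of what "code instead of tool calls" looks like. The `crm` and `mailer` surfaces are hypothetical mocks, not part of any real MCP binding: one script filters, branches, and loops locally.

```typescript
// Hypothetical typed surface of the kind an MCP binding might generate.
// crm.listContacts and mailer.send are illustrative mocks only.
interface Contact {
  id: string;
  email: string;
  score: number;
}

const crm = {
  async listContacts(): Promise<Contact[]> {
    return [
      { id: "1", email: "a@example.com", score: 92 },
      { id: "2", email: "b@example.com", score: 41 },
    ];
  },
};

const mailer = {
  async send(to: string, body: string): Promise<string> {
    return `sent:${to}`; // stand-in for a real delivery call
  },
};

// Code Mode: the agent emits ONE script that filters, branches, and
// loops in the sandbox, with no per-step round trip through the model.
export async function notifyHotLeads(threshold: number): Promise<string[]> {
  const contacts = await crm.listContacts();
  const hot = contacts.filter((c) => c.score >= threshold);
  const results: string[] = [];
  for (const c of hot) {
    results.push(await mailer.send(c.email, "Let's talk"));
  }
  return results;
}
```

A tool-calling agent would need an inference pass per contact; here the whole loop runs inside the sandbox in one shot.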

What’s New & Why It Matters

  • Composable, repeatable, trustable automation:

    • Code generation lets you review, modify, and reuse agent workflows. Your team can audit, diff, and upgrade processes safely.
    • Scripts written by agents can be checkpointed and replayed without always punting decisions back to the neural net.
  • Stateful and robust session management:

    • Allow agents to use session state, enabling multi-step reasoning, debugging, and cross-step learning. Instead of exposing dozens of tools, sometimes it’s better to expose a language interpreter—agents get more done, session state persists.
  • CLI tools and shell pitfalls:

    • Beware brittle, platform-specific CLI wrappers. Prefer language interpreters inside MCP for true cross-platform automation.
  • Scale: One API, Many Runs

    • Let your agent write and rerun code; repeat automation hundreds of times reliably.
  • Security: Sandboxing, key safety, and syscall control

    • All execution happens inside isolated Workers; agent never sees API keys.
    • Layer policy and syscall interception for true safety—sandbox interpreters tightly.
  • Bulk tool filtering and context management

    • The more APIs you expose, the more context risk. Prefer code-first surfaces: agent already knows how to introspect with help(), dir(), or SDK docs, and doesn’t need to rediscover every possible tool.
  • Dynamic and self-exploring agents

    • Agents discover available APIs through code introspection and protocol-level docs, allowing you to build smarter, more adaptive automation.
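A sketch of what discovery could look like from the agent's side, assuming a generated surface where each method carries an attached doc string. The `api` object and its docs are illustrative, not a real MCP binding:

```typescript
// Illustrative generated surface: namespaced methods, each with a doc
// string attached the way protocol-level documentation might be.
const api = {
  weather: {
    getForecast: Object.assign(async (city: string) => `${city}: sunny`, {
      doc: "Forecast by city",
    }),
  },
  hotels: {
    search: Object.assign(async (city: string) => [`${city} Inn`], {
      doc: "Hotels by city",
    }),
  },
};

// Enumerate namespaces, methods, and docs: the kind of introspection an
// agent can run itself before writing its script.
export function describeSurface(
  surface: Record<string, Record<string, { doc?: string }>>
): string[] {
  const lines: string[] = [];
  for (const [ns, methods] of Object.entries(surface)) {
    for (const [name, fn] of Object.entries(methods)) {
      lines.push(`${ns}.${name}: ${fn.doc ?? "undocumented"}`);
    }
  }
  return lines;
}
```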

Old Way vs. Code Mode Comparison

| Feature | Traditional Tool Calling | Code Mode (Recommended) |
| --- | --- | --- |
| Agent fluency | Low (synthetic tokens) | High (TypeScript, native code) |
| Scale to many APIs | Brittle, combinatorial | Unlocked, no simplification needed |
| Multi-step workflows | Token-intensive, slow | Single code file, fast, efficient |
| API surface complexity | Heavily simplified | Full developer surface |
| State, sessions | Stateless | Persistent scripting state |
| Audit/debug/reuse | Difficult | Easy script review/diff/replay |
| Dynamic discovery | None/manual | Introspection, protocol docs |
| Security | Manual, may leak keys | Isolated sandbox, no key leaks |
| Documentation access | Often external/manual | In-protocol, attached, uniform |

Step-by-Step Implementation Guide

  1. Run or connect an MCP server. MCP wraps any REST/RPC API and exposes its tools in a uniform specification, with doc comments for LLMs.

  2. Choose your Agents SDK. Use Cloudflare’s Agents SDK (ai-sdk) or another MCP-compatible framework; you can add codemode as a helper to your agent’s code.

  3. Convert MCP tools to a TypeScript API. The SDK fetches the MCP spec/schema and auto-generates documented TypeScript interfaces for each exposed tool.

  4. Configure sandboxed agent execution. Your agent’s generated TypeScript code runs in a secure Worker sandbox:

    • No general internet access.
    • Access limited to service bindings you specify (no API keys or wider network).
    • Millisecond spin-up/teardown, virtually zero memory overhead.
  5. Orchestration and workflow. The LLM is prompted to solve problems by writing code against only the API surface you’ve exposed (no arbitrary fetches or unsafe calls). Code can:

    • Call multiple tools in logical order
    • Parse and act on responses in real time
    • Manage errors, retries, branches, and parallel execution—all inside one context
  6. Output and logging. Results are returned via console.log or designated output handlers in the Worker; logs are streamed back to the calling agent for further processing or user display.

  7. Security best practices. By default, no raw credentials reach the agent; bindings provide pre-authorized access scoped tightly to what’s needed. This defeats the most common LLM risk: accidental credential or data leakage.

  8. No more containers. Execution happens in V8 isolates (Cloudflare Workers), not heavyweight containers: each job is ephemeral, isolated by design, and far cheaper than server-based alternatives.
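Steps 3 and 6 above can be sketched as follows. The `ToolSpec` shape and the log-capturing harness are simplified stand-ins for what a real SDK and Worker runtime provide:

```typescript
// Step 3 (sketch): turn an MCP-style tool listing into documented
// TypeScript declarations. The schema shape here is illustrative.
interface ToolSpec {
  name: string;
  description: string;
}

export function generateStubSource(tools: ToolSpec[]): string {
  return tools
    .map(
      (t) =>
        `/** ${t.description} */\ndeclare function ${t.name}(input: unknown): Promise<unknown>;`
    )
    .join("\n");
}

// Step 6 (sketch): run a job and capture its output lines, standing in
// for logs streamed back from the Worker to the calling agent.
export async function runCaptured(
  job: (log: (msg: string) => void) => Promise<void>
): Promise<string[]> {
  const logs: string[] = [];
  await job((msg) => logs.push(msg));
  return logs;
}
```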


Additional Nuances and Tactical Tips

  • Audit scripts regularly; checkpoint agent flows.
  • Use session-based interpreters for complex tasks.
  • Prompt LLMs to discover API structure via code (introspection).
  • Invest early in sandboxing and syscall controls.
  • Use logic interpreters for bulk, repeatable operations.
  • Favor agent code output over raw tool calls for trust, reuse, and review.

Bottom Line

Code Mode with MCP and Cloudflare Workers gives you an agent that acts like a SaaS engineer: orchestrating, scaling, and iterating with real APIs at a fraction of the cost and risk. Ship faster, automate more, and sleep better knowing your keys and business logic are safe.

If you want example scripts, deeper CLI/SDK patterns, or security checklists—just ask!

P.S. This pattern is similar to the one advocated by Hugging Face.

Frequently asked questions

What is Code Mode and why does it outperform traditional MCP tool calls for AI agents?

Code Mode lets your AI agent write and execute real TypeScript or Python code directly, instead of making brittle, context-hogging tool calls. LLMs are trained on actual code, so their workflows are more robust, auditable, and scalable. In real startup use, founders have automated SaaS onboarding and lead routing 10x faster by letting agents generate actionable code instead of juggling 20+ API wrappers.

How do I migrate from dozens of MCP tools to a code-first interpreter model?

Consolidate your API surface by exposing a language interpreter—like Python or TypeScript—as a single, well-documented tool in your MCP server. This lets agents compose, repeat, and debug logic in scripts, instead of flipping between siloed wrappers. Example: A MarTech startup replaced 35 CRM/sales API endpoints with one Python interpreter, reducing maintenance by 80% and increasing automation quality.
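A minimal stand-in for the single-interpreter idea: rather than a real Python or TypeScript interpreter, this sketch exposes one `runScript` entry point that dispatches a tiny command script against an allowlisted registry. All names here are illustrative:

```typescript
// The registry stands in for your consolidated API surface; in a real
// setup these would be MCP-backed bindings, not string helpers.
type Registry = Record<string, (arg: string) => string>;

const registry: Registry = {
  upper: (s) => s.toUpperCase(),
  reverse: (s) => [...s].reverse().join(""),
};

// The ONE tool an MCP server would expose: run a script line by line,
// rejecting anything outside the allowlist.
export function runScript(script: string): string[] {
  return script
    .trim()
    .split("\n")
    .map((line) => {
      const [cmd, ...rest] = line.trim().split(" ");
      const fn = registry[cmd];
      if (!fn) throw new Error(`command not allowed: ${cmd}`);
      return fn(rest.join(" "));
    });
}
```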

Is Code Mode secure? What prevents AI agents from leaking secrets or misusing APIs?

Yes, security is built-in. All agent-generated code runs in isolated, disposable sandboxes (Cloudflare Worker V8 isolates). API keys are never exposed—agents only interact with pre-authorized bindings. A fintech firm used this architecture to automate KYC checks and data enrichment safely, keeping all keys away from agent context and passing three external security audits.

Can Code Mode handle multi-step business processes better than regular tool calls?

Absolutely. With Code Mode, agents can plan, orchestrate, and execute complex workflows (loops, branches, error handling) in a single script. For example, a travel startup enables its bot to fetch weather data, compare prices, book hotels, and notify users—all inside one session, rather than burning tokens and latency across dozens of discrete calls.
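One way to sketch that in-script robustness, with a mocked flaky API standing in for a real price service:

```typescript
// Retry helper an agent-written script can use: the loop, branch, and
// error handling all live in code, not in repeated model inferences.
export async function withRetry<T>(
  fn: () => Promise<T>,
  attempts: number
): Promise<T> {
  let lastErr: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastErr = err; // transient failure: try again
    }
  }
  throw lastErr;
}

// Mocked flaky service: fails once, then succeeds. Illustrative only.
let calls = 0;
export async function fetchPrice(): Promise<number> {
  calls++;
  if (calls < 2) throw new Error("transient failure");
  return 120;
}
```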

How do session management and stateful flows work with Code Mode?

Code Mode agents maintain session state, so they can solve multi-step tasks, debug processes, and learn over iterations. For instance, an e-commerce brand uses session state to track promotion eligibility, update inventory, and sync with logistics—eliminating the need to re-invoke siloed tools for each workflow step.

What problems can arise from relying on CLI tool wrappers, and why is code generation better?

CLI tools can have platform dependencies, version drift, and fragile encoding—leading to silent failures and poor reproducibility. By contrast, code-first agent flows allow consistent scripting and easier error tracking. One SaaS company migrated from CLI shell orchestration to agent-generated Python scripts, seeing a 60% reduction in customer support tickets for automation errors.

Will exposing a code interpreter in MCP introduce new risks?

Every interpreter must be tightly sandboxed, with syscall interception and policy restrictions. Unfiltered agent code can attempt risky operations if not contained. However, startups using Cloudflare Workers or similar serverless isolates have safely automated everything from image processing to workflow scheduling—with no privilege escalation or breach events reported.

Can Code Mode agents learn new APIs and adapt dynamically?

Yes. With interpreter access, agents can self-explore using docs, introspection, and live API feedback—auto-discovering new endpoints and adapting logic in real time. For example, an HR SaaS platform deployed agents that learned to onboard integrations for new payroll providers overnight with zero human intervention.

Is Code Mode repeatable and scalable for bulk operations?

Definitely. Agents write reusable scripts and can replay them hundreds or thousands of times, optimizing for speed and reliability. Case in point: a lead gen agency leveraged code-mode automation to parse, score, and route millions of leads every month—saving six figures in manual salaries.

Should early-stage startups implement Code Mode or stick to tool-call architectures?

Code Mode is ideal for startups seeking speed, scale, and robust automation. It slashes engineering overhead and delivers founder-level agility. Early adopters routinely ship features faster—one Seed-stage SaaS went from prototype to market-ready onboarding bot in under a week using agentic code flows and MCP.

What are the biggest misconceptions about MCP and Code Mode for AI automation?

Many founders think MCP must be used strictly for tool calling, but Code Mode transforms MCP into a developer-friendly, programmable API surface. Real automation comes from exposing language interpreters to agents, not relying on dozens of brittle tool wrappers. This unlocks repeatability, auditability, and more reliable scale for startups.

Does switching to Code Mode reduce the cost of deploying agentic workflows?

Yes. Cloudflare Workers and MCP's code-first approach are 5-10x more cost-efficient than container-based sandboxes. Startup QA teams have saved thousands per month by replacing repeated LLM inferences with reusable Python scripts executed in V8 isolates—avoiding context rot and expensive tool orchestration overhead.

How do I audit agent-generated scripts to ensure reliability and compliance?

Auditing is straightforward: when agents generate code (rather than opaque tool calls), you can run static and dynamic analysis, diff changes, and spot-check outputs. One e-commerce company used this approach to validate every inventory sync, catching edge cases before deployment and massively reducing incident rates.

Is Code Mode compatible with all major LLMs, or do I need specialized models?

It works with all LLMs that are strong at code generation (GPT-4, Claude, Gemini, etc). No proprietary fine-tuning needed. Many SaaS teams report agents seamlessly moving from OpenAI models to open source like Qwen or Llama, with code-based flows needing only minor prompt tweaks.

What kind of business processes are best suited to code-first agent automation?

Bulk, repetitive operations where reliability and auditability matter—think lead enrichment, document conversion, onboarding flows, and real-time reporting. For example, a fintech startup automated 85% of nightly compliance checks via code-mode agents, freeing up three full-time analysts.

How does sandboxing work for agentic code, and can it prevent privilege escalation?

Code Mode typically runs scripts in isolated, short-lived V8 sandboxes (Cloudflare Workers). These isolates block internet access, restrict system calls, and only allow predefined API bindings. Security audits in financial and healthcare sectors have validated this model for safe automation—no successful privilege escalation reported when configured correctly.

What real-world use cases exist for agentic code-mode workflows?

Dozens! SaaS onboarding bots, lead parsing agents, image processing pipelines, real-time Slack responders—all are running in startups and large organizations today. Case study: An HR tech platform used a Python interpreter via MCP to automate onboarding flows, cutting manual time by 90% and reducing errors to near zero.

How does Code Mode help fight context rot and tool selection overload in agents?

Presenting a code interpreter dramatically reduces context bloat. Agents no longer need to infer from 30+ tool permutations—instead, they use the programming language's introspection to discover available functions and SDK docs as needed, cutting confusion and error rates.

Are there agentic code vulnerabilities I should know about?

Yes: code-mode is powerful, but prompt injection, unsandboxed eval, or unfiltered bindings still pose risks. Always use isolated Workers, sanity-check API bindings, and intercept system calls or sensitive commands. Security-conscious startups rotate sandboxes per task and only expose minimal privileges.
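A minimal illustration of "sanity-check API bindings": grant the sandbox only allowlisted bindings, never the full set an agent asks for (binding names here are hypothetical):

```typescript
// Filter requested bindings against an allowlist before the sandbox
// spins up; denied requests are logged rather than granted.
export function grantBindings(
  requested: string[],
  allowed: Set<string>
): string[] {
  const granted = requested.filter((b) => allowed.has(b));
  const denied = requested.filter((b) => !allowed.has(b));
  if (denied.length > 0) {
    console.warn(`denied bindings: ${denied.join(", ")}`);
  }
  return granted;
}
```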

What is the future of MCP and agentic automation for startups?

Expect smarter agents that dynamically learn new APIs, leverage session state, and explain their automated code flows in plain English to users. Startups will move from one-off tool wrapping to full code orchestration—shipping features, integrations, and bots radically faster and more safely than ever before.
