#119 — Context engineering
October 6, 2025 • 4 min read

Get actionable go-to-market insights delivered weekly. No fluff, no spam, just essentials.
🚩 Why Bother?
LLMs are mature enough to power decision engines, automate workflows, and interact with users at scale—but only if you engineer their context deliberately.
1. Fundamentals: The Context Window
- Context Window: Every LLM has a fixed “memory” (the context window) of tokens it can process per request, typically on the order of 100K–200K tokens for current models. This is the entire universe the model “sees” during each inference.
- Old prompt engineering relied on trial, error, and clever wording to nudge outputs. Today, every token counts. Your context must be intentional, dynamic, and structured.
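To make every token count, start by counting them. A minimal sketch using the tiktoken library (the cl100k_base encoding and the 100K budget are assumptions; match both to your actual model):

```python
# pip install tiktoken
import tiktoken

# Assumption: cl100k_base approximates your model's tokenizer; check your provider's docs.
enc = tiktoken.get_encoding("cl100k_base")

def fits_budget(text: str, budget: int = 100_000) -> bool:
    """Return True if the text fits within the model's token budget."""
    return len(enc.encode(text)) <= budget

prompt = "You are a market analyst. Summarize the attached report in three bullets."
print(len(enc.encode(prompt)), "tokens; fits:", fits_budget(prompt))
```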
2. Evolution: From Prompts to Context
- Early LLMs excelled at completion: you’d start a sentence, the model would finish it using its training data.
- Chat framing introduced speaker tokens—system instructions, chat history, and roles—enabling complex conversations (telling the LLM it’s a “film critic,” for example).
- Prompt engineering is giving way to Context Engineering: managing every element of what you send to the model (instructions, examples, docs, user signals, external tools).
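In practice, chat framing is just a list of role-tagged messages. A sketch of the now-common shape (the field names follow the widely used messages convention, not any single vendor's full API):

```python
# The model sees system instructions, prior turns, and the new user message as one context.
messages = [
    {"role": "system", "content": "You are a film critic. Answer in under 100 words."},
    {"role": "user", "content": "Is the new sci-fi release worth seeing?"},
    {"role": "assistant", "content": "Ambitious but uneven; see it for the visuals."},
    {"role": "user", "content": "What about for kids?"},
]
```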
3. Context Engineering Tools & Content
An example scenario: build a context window that fully briefs your LLM “analyst.” Several kinds of information belong in it:
- Domain documents: Instructions, how-to guides, specs, user manuals
- Recent data: Live stats, ratings, market figures (via retrieval)
- User preferences and history: Recency, familiarity, exclusions, personalization
- External tools/functions: Calculators, databases, APIs
- Examples & memories: Past answers, historical patterns, conversation history
- Multimodal input: Images, audio, video—converted to tokens when needed
Avoid overstuffing; a bloated context reduces clarity and increases hallucinations. Relevance and selectivity matter.
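One way to enforce that selectivity is to score candidate snippets and pack the window greedily until a token budget runs out. A minimal sketch (the relevance scores, budget, and token estimate are illustrative assumptions; use a real tokenizer and retriever in production):

```python
from dataclasses import dataclass

@dataclass
class Snippet:
    text: str
    relevance: float  # assumed to come from your retriever or a scoring model

def build_context(snippets: list[Snippet], budget_tokens: int) -> str:
    """Greedily pack the most relevant snippets until the budget is exhausted."""
    picked, used = [], 0
    for s in sorted(snippets, key=lambda s: s.relevance, reverse=True):
        cost = len(s.text) // 4  # rough token estimate; swap in a real tokenizer
        if used + cost > budget_tokens:
            continue
        picked.append(s.text)
        used += cost
    return "\n\n".join(picked)

ctx = build_context(
    [Snippet("Q3 revenue grew 12%...", 0.9), Snippet("Office dog policy...", 0.1)],
    budget_tokens=4_000,
)
```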
4. Key Design Patterns for Context Engineering
RAG (Retrieval-Augmented Generation)
- Fetch up-to-date data, reviews, ratings—inject into context for decisions that need fresh input.
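A minimal sketch of the retrieve-then-inject step, assuming you already have embeddings for your documents and query (producing them is left to your embedding model of choice):

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve(query_vec: np.ndarray, corpus: list[tuple[str, np.ndarray]], k: int = 3) -> list[str]:
    """Return the k documents whose embeddings are closest to the query."""
    ranked = sorted(corpus, key=lambda doc: cosine(query_vec, doc[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

def rag_prompt(question: str, query_vec: np.ndarray, corpus: list[tuple[str, np.ndarray]]) -> str:
    """Inject the retrieved documents ahead of the question."""
    context = "\n---\n".join(retrieve(query_vec, corpus))
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"
```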
Tool Calling
- Let the LLM request calculations or data via explicit tokens or API calls; pass results back to the model for synthesis.
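A provider-agnostic sketch of the dispatch loop: the model emits a JSON tool request, your code executes it, and the result is injected back into context. The request format and the get_rating tool are illustrative assumptions, not any vendor's API:

```python
import json

def get_rating(product_id: str) -> float:
    """Hypothetical tool: look up a live product rating."""
    return 4.6

TOOLS = {"get_rating": get_rating}

def handle_model_turn(model_output: str) -> str | None:
    """If the model requested a tool, run it and return the result for re-injection."""
    try:
        request = json.loads(model_output)
    except json.JSONDecodeError:
        return None  # plain-text answer, no tool call
    result = TOOLS[request["tool"]](**request["args"])
    return json.dumps({"tool": request["tool"], "result": result})

# e.g. the model replies: {"tool": "get_rating", "args": {"product_id": "sku-42"}}
print(handle_model_turn('{"tool": "get_rating", "args": {"product_id": "sku-42"}}'))
```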
Structured Output
- Lock outputs to strict formats (JSON, XML, custom schema) for reliable downstream use.
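A stdlib-only validation sketch: parse the reply against the fields you expect and reject anything malformed before it reaches downstream systems (the schema is an illustrative assumption):

```python
import json

REQUIRED = {"verdict": str, "confidence": float, "reasons": list}

def parse_reply(raw: str) -> dict:
    """Parse and validate a model reply; fail loudly rather than pass bad data on."""
    data = json.loads(raw)  # raises on malformed JSON
    for name, expected_type in REQUIRED.items():
        if not isinstance(data.get(name), expected_type):
            raise ValueError(f"bad or missing field: {name}")
    return data

ok = parse_reply('{"verdict": "approve", "confidence": 0.92, "reasons": ["in stock"]}')
```

Libraries such as pydantic or jsonschema do the same job with richer schemas; the point is that nothing unvalidated crosses the boundary.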
Chain of Thought / ReAct
- Ask the LLM to “show its work” step by step—reducing hallucination, improving traceability, and clarifying logic before the final output.
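The simplest version is a prompt scaffold that forces intermediate steps, plus a parser that keeps the trace for auditing but hands only the final line downstream (the template wording is illustrative):

```python
COT_TEMPLATE = """You are a careful analyst.
Question: {question}

Reason step by step. For each step write:
Thought: <your reasoning>
Finish with exactly one line:
Answer: <final answer>"""

def extract_answer(model_output: str) -> str:
    """Log the full trace elsewhere; return only the final answer."""
    for line in reversed(model_output.splitlines()):
        if line.startswith("Answer:"):
            return line.removeprefix("Answer:").strip()
    raise ValueError("no final answer found")

prompt = COT_TEMPLATE.format(question="Which of these three vendors fits a 10K budget?")
```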
Memory + Context Compression
- Store salient facts and summarize long histories to maintain continuity and context purity (not every past message matters).
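A common pattern keeps the newest turns verbatim and folds everything older into a rolling summary. A sketch with a stubbed summarizer (in practice, summarize would be a cheap LLM call):

```python
def summarize(messages: list[str]) -> str:
    """Stub: in practice, a cheap LLM call that compresses old turns."""
    return f"[summary of {len(messages)} earlier messages]"

def compress_history(history: list[str], keep_recent: int = 4) -> list[str]:
    """Replace all but the newest turns with a single summary message."""
    if len(history) <= keep_recent:
        return history
    old, recent = history[:-keep_recent], history[-keep_recent:]
    return [summarize(old)] + recent

history = [f"turn {i}" for i in range(10)]
print(compress_history(history))  # ['[summary of 6 earlier messages]', 'turn 6', ...]
```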
Agent Orchestration
- Split tasks among agents: retriever, safety, critic, preference, interaction. Pass outputs as context/API handoffs between agents.
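A sketch of the handoff idea: each agent is a function over a shared, typed payload, and the orchestrator threads it through the pipeline (the agent roster and fields are illustrative):

```python
from dataclasses import dataclass, field

@dataclass
class Handoff:
    """The 'API contract' passed between agents."""
    query: str
    documents: list[str] = field(default_factory=list)
    draft: str = ""
    approved: bool = False

def retriever(h: Handoff) -> Handoff:
    h.documents = ["doc about " + h.query]  # stand-in for a real retrieval call
    return h

def synthesizer(h: Handoff) -> Handoff:
    h.draft = f"Based on {len(h.documents)} docs: ..."
    return h

def safety(h: Handoff) -> Handoff:
    h.approved = "forbidden" not in h.draft
    return h

result = Handoff(query="vendor pricing")
for agent in (retriever, synthesizer, safety):
    result = agent(result)
```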
5. From Oracle to Analyst: Mindset Shift
Stop treating the LLM like a magic 8-ball. Start briefing it like a junior analyst:
- Supply all key facts, instructions, and tools
- Frame tasks precisely: what, why, how, and expected outputs
- Don’t trust stale training data—inject live results and explicit tools
- Document how intermediate results, agent outputs, and user inputs flow downstream
6. Compositional, API-like Context
- Build complex systems as composed agents: interaction, retrieval, safety, reasoning, synthesis.
- Handoff between agents = API contract. Define what each agent needs, what it returns, and the context sequence to use.
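When agents run as separate services, the contract can live as a JSON Schema that both sides validate against. A sketch using the jsonschema library (the retriever output schema itself is an illustrative assumption):

```python
# pip install jsonschema
from jsonschema import validate

RETRIEVER_OUTPUT = {
    "type": "object",
    "required": ["query", "documents"],
    "properties": {
        "query": {"type": "string"},
        "documents": {"type": "array", "items": {"type": "string"}},
    },
}

payload = {"query": "vendor pricing", "documents": ["doc A", "doc B"]}
validate(payload, RETRIEVER_OUTPUT)  # raises ValidationError if the contract is broken
```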
7. Workflow Checklist
- Define the problem: What does your LLM need to solve?
- Design context flow: Collect all facts, tools, memories, and format the context.
- Pick design patterns: RAG, tool calling, structured output, memory, chain of thought—choose what fits.
- Engineer agent interfaces: Document token handoffs/API contracts between agents.
- Test context rigorously: Debug, iterate, measure outputs; watch for context overflow and ambiguity.
- Update and compress memory: Ensure continuity without diluting relevance.
- Document everything: Treat context flows as core software—not throwaway code.
💡 Final Takeaway: Great context engineering is the difference between unreliable LLM experiments and systems that drive business value at scale. Treat context flow, agent contracts, and token selection with engineering discipline—not prompt magic. Own the entire process as you would any API or module. The founders who master this will deploy LLMs that deliver usable, auditable, extensible results—not just demo chatbot answers.
Frequently asked questions
What is context engineering and why does it matter for LLM-powered startups?
Context engineering is the discipline of architecting every bit of information fed to a large language model (LLM) during a session: documents, data, instructions, recent events. Where prompt engineering merely tweaks wording, context engineering ensures models generate reliable, auditable, business-ready results. Smart startups use it to treat LLMs like analysts, not oracles, significantly improving accuracy, traceability, and user trust. For example, Instacart's product search uses context engineering to blend live shopping data, product specs, and user preferences for personalized results.
How can founders prevent LLMs from making costly mistakes or hallucinations?
LLMs are prone to hallucinations if given ambiguous or incomplete contexts. Rigorous context engineering—injecting up-to-date retrievals, limiting irrelevant noise, enforcing chain-of-thought reasoning, and specifying output formats—dramatically reduces error rates. In legal tech, DoNotPay built an 'AI lawyer' that cross-references case law with user queries, filtering context tightly to avoid risky mistakes during real-world document generation.
What are the best practices for integrating Retrieval-Augmented Generation (RAG) into an LLM workflow?
Retrieval-Augmented Generation (RAG) pipelines fetch live external data—docs, stats, ratings—to inject as context before LLM inference, ensuring responses always reflect the most recent truth. For instance, Hugging Face’s documentation QA bots dynamically pull fresh docs for every query, giving developers up-to-date answers beyond the model’s training window. Founders should index all key documentation, set up scalable retrievers, and validate injection mechanisms for consistency and relevance.
How do I design context handoffs and APIs between multiple LLM agents?
In multi-agent systems, each agent (retriever, reasoner, critic, safety, output) expects and returns data in precise formats—just like APIs. Define strict contracts for every handoff (JSON schemas, input/output shapes), document protocols, and test with real scenarios. Modular architectures power products like Modular’s AI assistant, where retrievers fetch data, critics assess quality, and synthesizers generate user-facing answers in seamless, composable flows.
Why is structured output critical for LLM integrations in production systems?
Unstructured responses break integrations and risk unpredictable results. Specifying structured output—like strict JSON formats—lets founders parse results, trigger downstream automations, and safely process user requests at scale. Klarna’s customer service agent outputs only valid XML to interface with their backend CRM, cutting incident rates and eliminating costly surprises.
How can founders leverage context engineering to personalize products without extensive user tracking?
Context engineering enables precise personalization using relevant, non-invasive signals: session history, location, recent interactions, and public preferences, sliced and injected as needed. For example, Shopify’s product recommendation bots personalize shopping journeys using only session recency and cart data—no persistent tracking or personal info—proving context can drive personalization ethically and effectively.
What problems can arise from overstuffed context windows, and how do I avoid them?
Overstuffed context windows can dilute relevance, confuse the model, and trigger token limits—leading to failures and performance drops. Use memory compression, relevance scoring, and selective injection to keep context lean and focused. Case study: A financial planning startup slashed context payloads by 68%, resulting in 2.5x faster response times and improved auditability for compliance reviews.
Are there tools or platforms that automate parts of context engineering for startups?
Yes, frameworks like LangChain, Semantic Kernel, and DSPy offer primitives for RAG, memory management, output formatting, and tool calling. Founders can scaffold context-driven agents, define workflows, and run production-grade experiments with minimal boilerplate. For example, SmartAgent.ai built a pipeline with LangChain to automate 30% of customer queries—entirely through automated context construction and agent orchestration.
How do successful startups use chain-of-thought reasoning to boost LLM decision reliability?
Chain-of-thought prompts ask LLMs to explicitly reason through every step before giving answers, exposing intermediate logic and boosting overall reliability. In health tech, Ada Health’s symptom checker requires LLMs to show every reasoning path, allowing doctors to audit AI decisions and catch errors midstream—improving patient safety and regulatory compliance.
What are the operational steps to transition from prompt engineering to true context engineering?
First, map out every info source, API, and tool needed for your target task. Next, architect the context window—deciding which docs, data, tools, and memories to inject. Define and document agent handoff contracts, enforce structured output, and iterate with live user sessions, measuring for accuracy and traceability. Real-world example: A SaaS founder replaced ad-hoc prompts with a modular context builder, cutting deploy-time errors by 50% and unlocking scalable customer support from day one.
How can context engineering improve the ROI of LLM-driven products for startups?
Context engineering lets founders cut wasted API calls, reduce error rates, and increase product reliability by focusing model attention on only what matters. Companies like Jasper.ai improved their content automation ROI by hyper-optimizing context payloads—resulting in lower costs and better output quality for end users.
What are the top context engineering mistakes founders should avoid?
Common pitfalls include overstuffing context windows, skipping relevance scoring, ignoring structured outputs, and failing to document agent interfaces. For instance, an edtech startup saw dropout rates spike when context windows were packed with irrelevant course summaries. Fixes included relevance weighting and lean memory injection—restoring product usability and search ranking.
Is context engineering compatible with low-latency and real-time user experiences?
Yes. By compressing memory and optimizing document retrieval, you maintain fast response times. Case: Zapier’s internal bots use context engineering to route only pertinent event histories—enabling sub-second replies while keeping outputs personalized and reliable.
How do context engineering and retrieval-augmented generation affect SEO for AI-driven SaaS?
Great context engineering surfaces up-to-date responses, accurate product data, and precise reasoning—all boosting trust and click-through rates on landing pages. SaaS companies using RAG see lower bounce rates; search engines reward clear, accurate, and relevant output—leading to better rankings and user retention.
How to balance privacy and context engineering in compliance-heavy industries?
Limit injected data to session-level or anonymized events, removing PII before context construction. Fintech and healthcare AI providers use context engineering to comply with GDPR and HIPAA by designing privacy-preserving contexts—unlocking advanced personalization without exposing users to compliance risks.
What KPIs can founders track to measure context engineering impact on their LLM workflows?
Key metrics include inference accuracy, token efficiency (tokens per useful fact), completion latency, error/incident rates, and downstream automation success. Example: A customer support startup tracked 'context relevance' and saw NPS scores double after context engineering reduced hallucinations and improved solution rates.
Can context engineering help LLMs handle complex multi-turn, multi-party conversations?
Absolutely. By slicing context into agent roles (retriever, critic, operator), you handle negotiation, escalation, and multi-user support. For example, Intercom’s AI support stack allocates context slices to route users, resolve disputes, and escalate issues—all within strict context windows for optimal accuracy.
Are there automated ways to optimize context for different user segments or products?
Yes. Dynamic context builders use rules and relevance models to assemble context per session, tailoring complexity to high-value users or critical tasks. B2B SaaS platforms like Salesforce automate context assembly for tiers of users—executive buyers get business summaries, while admins receive granular setup guides.
How does context engineering enable safer deployment of LLMs in regulated markets?
It provides an audit trail for every inference—logging injected documents, tool outputs, user signals, and agent handoffs. Regulated startups (insurance, legal, healthcare) use this for traceability, issue review, and compliance reporting, dramatically reducing legal risk and improving regulator trust.
What open source tools support real-world context engineering best practices?
Top picks: LangChain for workflow orchestration, DSPy for prompt/model optimization, Semantic Kernel for agent design, and LlamaIndex for efficient document retrieval. In production, Notion automated 50% of its AI search results using LangChain-powered dynamic context retrieval and modular agents.
Keep reading

#120 — Choosing distance metrics for vectors in AI
Searching for an exact match in a database is relatively easy, but finding a similar match (or even defining "similar") is much harder.

#121 — Kimi K2 Thinking: Model card for founders
Kimi K2 Thinking is a breakthrough open-source agent designed for step-by-step tool-using problem solving at scale.

#122 — How Railway scaled high-touch support to 1000s of developers
If your best customers don’t want to talk to sales, make support the bridge to revenue.