#8 — A Survey of Techniques for Maximizing LLM Performance
November 30, 2023 • 2 min read

Why it matters: Optimizing large language models (LLMs) is critical for startups building AI products, but the path isn't linear. Understanding when to use different techniques can save significant time and resources.
The big picture: There are two key dimensions to optimize: context (what the model needs to know) and behavior (how the model needs to act).
The Optimization Framework
Start with prompt engineering:
- Begin with clear instructions, breaking complex tasks into simpler subtasks
- Give the model "time to think" through frameworks like ReAct
- Add few-shot examples to show rather than tell the model what you want (a minimal sketch follows this list)
- Create consistent evaluation metrics before making changes
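To make the few-shot step concrete, here is a minimal sketch in Python. The classification task, labels, and example tickets are hypothetical placeholders; the point is the structure: instructions first, worked examples next, the new input last.

```python
# Minimal few-shot prompt sketch. The task, labels, and examples are
# hypothetical placeholders; swap in your own domain.
FEW_SHOT_EXAMPLES = [
    ("Refund not processed after 10 days", "billing"),
    ("App crashes when I open settings", "bug"),
    ("How do I export my data?", "how-to"),
]

def build_prompt(ticket: str) -> str:
    """Assemble instructions, few-shot examples, and the new input."""
    lines = [
        "Classify the support ticket as one of: billing, bug, how-to.",
        "Reason step by step, then answer with the label only.",
        "",
    ]
    for text, label in FEW_SHOT_EXAMPLES:
        lines += [f"Ticket: {text}", f"Label: {label}", ""]
    lines += [f"Ticket: {ticket}", "Label:"]
    return "\n".join(lines)

print(build_prompt("I was charged twice this month"))
```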
When to use RAG (Retrieval-Augmented Generation):
- When you need to introduce new information or domain-specific content
- To reduce hallucinations by controlling what content the model can access
- When you need to update the model's knowledge without retraining (a retrieval sketch follows this list)
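Here is a minimal sketch of the retrieval step, assuming you have an embedding model and a pre-chunked corpus. The `embed` function below is a random placeholder standing in for a real model, and the chunks are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def embed(text: str) -> np.ndarray:
    # Placeholder: returns random vectors. Use a real embedding model here.
    return rng.standard_normal(128)

CHUNKS = ["Chunk about pricing...", "Chunk about refunds...", "Chunk about SLAs..."]
CHUNK_VECS = np.stack([embed(c) for c in CHUNKS])

def retrieve(question: str, k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question (cosine similarity)."""
    q = embed(question)
    sims = CHUNK_VECS @ q / (np.linalg.norm(CHUNK_VECS, axis=1) * np.linalg.norm(q))
    return [CHUNKS[i] for i in np.argsort(sims)[::-1][:k]]

def build_rag_prompt(question: str) -> str:
    """Stuff retrieved context into the prompt and constrain the model to it."""
    context = "\n".join(retrieve(question))
    return (
        "Answer using ONLY the context below. If the answer is not there, "
        f"say you don't know.\n\nContext:\n{context}\n\nQuestion: {question}"
    )
```

Constraining the model to the retrieved context is what drives the hallucination reduction: the prompt gives it explicit permission to say "I don't know."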
When to use fine-tuning:
- When you need consistent instruction following or specific output formats
- To emphasize knowledge already in the base model
- To modify the structure, tone, or style of outputs
- When you need to teach complex instructions beyond what fits in a prompt (an example data format follows this list)
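As a concrete illustration, fine-tuning data for OpenAI-style chat models is a JSONL file of example conversations; the SQL example below is hypothetical:

```python
import json

# Each line is one training example in chat format. Quality matters far
# more than quantity: start with dozens of carefully checked examples.
examples = [
    {
        "messages": [
            {"role": "system", "content": "You translate questions into valid SQL."},
            {"role": "user", "content": "Monthly revenue for 2023 by region"},
            {"role": "assistant", "content": (
                "SELECT region, DATE_TRUNC('month', created_at) AS month, "
                "SUM(amount) FROM orders WHERE created_at >= '2023-01-01' "
                "AND created_at < '2024-01-01' GROUP BY 1, 2;"
            )},
        ]
    },
    # ...more high-quality examples
]

with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```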
Success metrics
For RAG evaluation, consider:
- Faithfulness: Are facts in the answer supported by retrieved content?
- Answer relevancy: Does the answer address the original question?
- Context precision: How much of the retrieved content was actually useful?
- Context recall: Did retrieval surface all the information needed to answer? (toy versions of both context metrics follow this list)
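As a toy illustration of the two context metrics, assuming you have ground-truth labels for which chunks are relevant (production frameworks such as Ragas typically use an LLM judge instead of exact labels):

```python
def context_precision(retrieved: list[str], relevant: set[str]) -> float:
    """Share of retrieved chunks that were actually useful."""
    if not retrieved:
        return 0.0
    return sum(c in relevant for c in retrieved) / len(retrieved)

def context_recall(retrieved: list[str], relevant: set[str]) -> float:
    """Share of the needed chunks that retrieval actually surfaced."""
    if not relevant:
        return 1.0
    return sum(c in relevant for c in set(retrieved)) / len(relevant)

retrieved = ["pricing", "refunds", "sla"]
relevant = {"refunds", "cancellation"}
print(context_precision(retrieved, relevant))  # 0.33: one of three chunks useful
print(context_recall(retrieved, relevant))     # 0.50: half the needed info found
```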
For fine-tuning success:
- Start small with high-quality data rather than large quantities
- Establish baselines with prompt engineering before fine-tuning
- Evaluate outputs against expert human judgment, or use a more powerful model as a judge (sketched below)
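One way to use a more powerful model as an evaluator is a simple grading prompt. The rubric, scale, and `call_model` hook below are illustrative assumptions, not a standard API:

```python
# Sketch of model-as-judge evaluation. `call_model` is a placeholder for
# whatever client you use to reach the stronger model; the 1-5 rubric is
# an illustrative choice, not a standard.
JUDGE_TEMPLATE = """You are grading a model answer against an expert reference.
Question: {question}
Reference answer: {reference}
Model answer: {candidate}
Score factual agreement with the reference from 1 to 5, then justify briefly.
Respond exactly as:
SCORE: <n>
REASON: <one sentence>"""

def judge(question: str, reference: str, candidate: str, call_model) -> str:
    """Ask a stronger model to grade `candidate` against `reference`."""
    return call_model(JUDGE_TEMPLATE.format(
        question=question, reference=reference, candidate=candidate))
```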
By the numbers
In a real-world SQL generation benchmark, with gains measured against the prompt-engineering baseline:
- Basic prompt engineering: 69% accuracy
- Adding few-shot examples: +2%
- Simple RAG with question-based retrieval: +3%
- Advanced RAG with hypothetical answer retrieval: +5%
- Fine-tuning with simple prompt engineering: +13%
- Fine-tuning + RAG: +14.5% (roughly 83.5% accuracy overall)
Bottom line
The most effective approach often combines techniques. Start with prompt engineering, analyze the error types, then add RAG for knowledge problems or fine-tuning for instruction and format problems. Remember that high-quality data trumps quantity, and the process is rarely linear; expect to iterate between techniques.