
#8 — A Survey of Techniques for Maximizing LLM Performance

November 30, 2023 · 2 min read


Why it matters: Optimizing large language models (LLMs) is critical for startups building AI products, but the path isn't linear. Understanding when to use different techniques can save significant time and resources.

The big picture: There are two key dimensions to optimize: context (what the model needs to know) and behavior (how the model needs to act).

The Optimization Framework

Start with prompt engineering:

  • Begin with clear instructions, breaking complex tasks into simpler subtasks
  • Give the model "time to think" through frameworks like ReAct
  • Add few-shot examples to show rather than tell the model what you want (a prompt sketch follows this list)
  • Create consistent evaluation metrics before making changes
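
To make the few-shot point above concrete, here is a minimal sketch assuming the OpenAI chat completions client; the model name, task, and example pairs are placeholders rather than recommendations.

```python
# Minimal few-shot prompt sketch using the OpenAI chat completions API.
# Model name, task, and example pairs are placeholders.
from openai import OpenAI

client = OpenAI()

few_shot_examples = [
    # Each pair shows the model the desired input -> output mapping.
    {"role": "user", "content": "Classify sentiment: 'The onboarding flow was painless.'"},
    {"role": "assistant", "content": "positive"},
    {"role": "user", "content": "Classify sentiment: 'Support never replied to my ticket.'"},
    {"role": "assistant", "content": "negative"},
]

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model
    messages=[
        {"role": "system", "content": "You are a sentiment classifier. Reply with one word."},
        *few_shot_examples,
        {"role": "user", "content": "Classify sentiment: 'The pricing page was confusing.'"},
    ],
)
print(response.choices[0].message.content)
```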

When to use RAG (Retrieval-Augmented Generation):

  • When you need to introduce new information or domain-specific content
  • To reduce hallucinations by controlling what content the model can access
  • When you need to update the model's knowledge without retraining (a retrieval sketch follows this list)
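
A minimal sketch of the RAG loop described above, assuming the OpenAI embeddings and chat APIs plus NumPy for similarity scoring; the documents, model names, and question are placeholders, and a production system would typically swap the in-memory search for a vector store.

```python
# Minimal RAG sketch: retrieve the most relevant snippet, then ground the answer in it.
import numpy as np
from openai import OpenAI

client = OpenAI()

documents = [
    "Our enterprise plan includes SSO and a 99.9% uptime SLA.",
    "Refunds are processed within 14 days of a cancellation request.",
]

def embed(texts):
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

doc_vectors = embed(documents)

def retrieve(question, k=1):
    q = embed([question])[0]
    # Cosine similarity between the question and each document.
    scores = doc_vectors @ q / (np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q))
    return [documents[i] for i in np.argsort(scores)[::-1][:k]]

question = "How long do refunds take?"
context = "\n".join(retrieve(question))

answer = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model
    messages=[
        {"role": "system", "content": "Answer only from the provided context. If unsure, say so."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ],
)
print(answer.choices[0].message.content)
```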

When to use fine-tuning:

  • When you need consistent instruction following or specific output formats
  • To emphasize knowledge already in the base model
  • To modify the structure, tone, or style of outputs
  • When you need to teach complex instructions beyond what fits in a prompt (a data-prep sketch follows this list)
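
A sketch of what this looks like in practice, assuming the OpenAI fine-tuning API: write a small set of carefully reviewed examples to JSONL, upload the file, and start a job. The examples, file name, and base model below are placeholders.

```python
# Sketch of preparing a small, high-quality fine-tuning set and starting a job.
import json
from openai import OpenAI

client = OpenAI()

# A handful of carefully reviewed examples that demonstrate the exact
# output format you want, rather than a large unvetted dump.
examples = [
    {
        "messages": [
            {"role": "system", "content": "Return the answer as compact JSON."},
            {"role": "user", "content": "Extract the company and amount from: 'Acme paid $12,000.'"},
            {"role": "assistant", "content": '{"company": "Acme", "amount": 12000}'},
        ]
    },
    # ... more examples in the same shape
]

with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

training_file = client.files.create(file=open("train.jsonl", "rb"), purpose="fine-tune")
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-mini-2024-07-18",  # placeholder base model
)
print(job.id)
```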

Success metrics

For RAG evaluation, consider:

  • Faithfulness: Are facts in the answer supported by retrieved content? (scored in the sketch after this list)
  • Answer relevancy: Does the answer address the original question?
  • Context precision: How much of the retrieved content was actually useful?
  • Context recall: Does the retrieved context contain all the information needed to answer the question?
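
One simple way to score a metric like faithfulness is LLM-as-a-judge: ask a grading model whether the answer is supported by the retrieved context. The sketch below is a hypothetical illustration; the prompt wording, model name, and 0-to-1 scoring scheme are assumptions, and frameworks such as Ragas package these metrics more rigorously.

```python
# Hypothetical LLM-as-judge sketch for the faithfulness metric.
from openai import OpenAI

client = OpenAI()

def faithfulness_score(question, context, answer):
    grading_prompt = (
        "You are grading a RAG system.\n"
        f"Question: {question}\n"
        f"Retrieved context: {context}\n"
        f"Answer: {answer}\n"
        "Reply with only a number between 0 and 1: the fraction of claims in the "
        "answer that are directly supported by the retrieved context."
    )
    resp = client.chat.completions.create(
        model="gpt-4o",  # placeholder grading model
        messages=[{"role": "user", "content": grading_prompt}],
    )
    # A production version would parse the reply more defensively.
    return float(resp.choices[0].message.content.strip())

print(faithfulness_score(
    question="How long do refunds take?",
    context="Refunds are processed within 14 days of a cancellation request.",
    answer="Refunds take about two weeks.",
))
```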

For fine-tuning success:

  • Start small with high-quality data rather than large quantities
  • Establish baselines with prompt engineering before fine-tuning
  • Evaluate outputs against expert human judgment, or use a more powerful model as a judge

By the numbers

In a real-world SQL generation benchmark:

  • Basic prompt engineering: 69% accuracy
  • Adding few-shot examples: +2%
  • Simple RAG with question-based retrieval: +3%
  • Advanced RAG with hypothetical answer retrieval: +5%
  • Fine-tuning with simple prompt engineering: +13%
  • Fine-tuning + RAG: +14.5%

Bottom line

The most effective approach often combines techniques. Start with prompt engineering, analyze error types, then add RAG for knowledge problems or fine-tuning for instruction/format problems. Remember that high-quality data trumps quantity, and the process is rarely linear—expect to iterate between techniques.
