#6 — Tuning and Optimizing Workflows
November 15, 2023 • 2 min read

Breaking it down: Multi-turn workflows deliver massive wins
Founders take note: Complex AI tasks aren't solved with single prompts. The real magic happens when you break them down into strategic steps.
AlphaCodium's approach proves this: by restructuring code generation as a methodical workflow (problem analysis → test case reasoning → solution generation → solution ranking → synthetic testing → iterative refinement), the flow boosted GPT-4's accuracy (pass@5) on CodeContests from 19% to 44%.
Why this matters: Your engineering team can achieve similar breakthroughs by adopting structured approaches that interface cleanly with your existing systems.
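To make the shape of such a workflow concrete, here is a minimal Python sketch. It mirrors AlphaCodium's published stages, but the prompts, the `call_llm` client, and the `run_tests` harness are illustrative stand-ins, not their actual implementation.

```python
def call_llm(prompt: str) -> str:
    """Stand-in for a real model call; wire up your API client here."""
    raise NotImplementedError

def run_tests(code: str, tests: list[str]) -> list[str]:
    """Stand-in for a sandboxed test runner; returns failing-test reports."""
    raise NotImplementedError

def solve(problem: str, public_tests: list[str]) -> str:
    # 1. Problem analysis: restate goals and constraints.
    analysis = call_llm(f"Analyze this problem; list goals and constraints:\n{problem}")

    # 2. Test case reasoning: spell out what each public test implies.
    test_notes = call_llm(f"Analysis:\n{analysis}\n\nExplain what each test checks:\n{public_tests}")

    # 3. Solution generation: draft several candidates.
    candidates = [
        call_llm(f"Write a solution.\nAnalysis:\n{analysis}\nTest notes:\n{test_notes}")
        for _ in range(3)
    ]

    # 4. Solution ranking: pick the most promising candidate.
    best = call_llm("Return the best of these candidates:\n" + "\n---\n".join(candidates))

    # 5. Synthetic testing: generate extra edge-case tests.
    extra_tests = call_llm(f"Write additional edge-case tests for:\n{problem}").splitlines()

    # 6. Iterative refinement: feed failures back until the tests pass (bounded).
    for _ in range(3):
        failures = run_tests(best, public_tests + extra_tests)
        if not failures:
            break
        best = call_llm(f"Fix this solution:\n{best}\nFailing tests:\n{failures}")
    return best
```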
THE RELIABILITY IMPERATIVE
Deterministic > Non-deterministic (for now)
Smart founders are prioritizing deterministic workflows. Each non-deterministic agent action introduces failure risk, and that risk compounds across steps: if every step is 95% reliable, a ten-step chain succeeds only about 60% of the time (0.95^10 ≈ 0.60).
Instead, generate and execute plans deterministically. This approach:
- Creates reusable few-shot examples
- Simplifies testing and debugging
- Makes failure analysis straightforward
- Produces DAGs that are easier to comprehend
Pro tip: Treat AI agents like junior engineers: give them clear objectives and concrete execution plans, as in the sketch below.
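Here is a minimal sketch of that pattern: the plan is a plain list of step names (a degenerate DAG), and execution is ordinary Python, so the same plan on the same inputs yields the same trace. The step names and registry are illustrative, not any particular framework's API.

```python
def fetch_context(state: dict) -> dict:
    state["context"] = "retrieved documents..."  # e.g., a retrieval call
    return state

def draft_answer(state: dict) -> dict:
    state["draft"] = f"answer grounded in {state['context']}"  # e.g., an LLM call
    return state

def check_citations(state: dict) -> dict:
    state["verified"] = True  # e.g., a deterministic validator
    return state

STEPS = {f.__name__: f for f in (fetch_context, draft_answer, check_citations)}
PLAN = ["fetch_context", "draft_answer", "check_citations"]

def execute(plan: list[str], state: dict) -> dict:
    # Deterministic execution: same plan + same inputs -> same trace, which is
    # what makes testing, debugging, and failure analysis straightforward.
    for step_name in plan:
        state = STEPS[step_name](state)
    return state

print(execute(PLAN, {"question": "..."}))
```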
BEYOND TEMPERATURE: STRATEGIC DIVERSITY
Temperature adjustments alone won't deliver the output variety your product needs. Savvy founders are implementing:
- Strategic prompt element shuffling
- Output tracking to prevent redundancy
- Prompt phrasing variations
For example, a recommendation engine can shuffle historical user data and vary prompt construction to dramatically increase suggestion diversity.
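Here is a minimal sketch of those levers applied to the recommendation example; the prompt templates and `call_llm` client are illustrative assumptions.

```python
import random
from collections import deque

def call_llm(prompt: str) -> str:
    raise NotImplementedError  # wire up your model client

# Hypothetical templates: varying the phrasing is itself a diversity lever.
TEMPLATES = [
    "Given these past purchases, suggest one product:\n{history}",
    "A customer bought the following. Recommend something new:\n{history}",
]

recent_outputs: deque = deque(maxlen=50)  # track what we've already shown

def recommend(user_history: list[str]) -> str:
    suggestion = ""
    for _ in range(5):  # retry a few times if we hit a recent repeat
        history = user_history[:]
        random.shuffle(history)                   # shuffle prompt elements
        prompt = random.choice(TEMPLATES).format(history="\n".join(history))
        suggestion = call_llm(prompt)
        if suggestion not in recent_outputs:      # filter out redundancy
            break
    recent_outputs.append(suggestion)
    return suggestion
```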
THE CACHING ADVANTAGE
Underutilized opportunity alert: Caching delivers multiple competitive advantages:
- Immediate cost reduction
- Zero generation latency on cache hits
- Risk mitigation through pre-vetted responses
Implementation strategy (see the sketch after this list):
- Utilize unique IDs for processed items
- Normalize user inputs with autocomplete and spelling correction to maximize cache hits
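A minimal sketch of that strategy, with `normalize()` standing in for whatever canonicalization you apply (lowercasing here; autocomplete and spelling correction in practice) so near-duplicate queries share one cache key.

```python
import hashlib

def call_llm(prompt: str) -> str:
    raise NotImplementedError  # wire up your model client

cache: dict[str, str] = {}

def normalize(query: str) -> str:
    # Collapse whitespace and case; add spell-correction in practice.
    return " ".join(query.lower().split())

def cached_generate(query: str) -> str:
    key = hashlib.sha256(normalize(query).encode()).hexdigest()
    if key in cache:
        return cache[key]        # cache hit: no generation cost or latency
    response = call_llm(query)
    cache[key] = response        # optionally store only human-vetted responses
    return response
```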
WHEN TO MAKE THE FINETUNING LEAP
Even brilliantly engineered prompts sometimes fall short. Successful founders like those behind Honeycomb's NLQ Assistant and Rechat's Lucy made strategic decisions to finetune when standard prompting couldn't deliver reliable, high-quality outputs.
Cost considerations: Finetuning requires significant investment in data annotation, model training, evaluation, and potentially self-hosting. Mitigate this by generating synthetic training data or bootstrapping with open-source datasets.
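One hedged sketch of the synthetic-data route: expand a few seed requests into paraphrase/target pairs using the model itself. The seeds, prompts, and `call_llm` client are illustrative, and in practice you would hand-review a sample before training on it.

```python
import json

def call_llm(prompt: str) -> str:
    raise NotImplementedError  # wire up your model client

SEEDS = ["show errors in the checkout service over the last hour"]  # hypothetical

def make_training_pairs(seeds: list[str], variants_per_seed: int = 5) -> list[dict]:
    pairs = []
    for seed in seeds:
        for _ in range(variants_per_seed):
            user_input = call_llm(f"Paraphrase this user request:\n{seed}")
            target = call_llm(f"Write the ideal structured output for:\n{user_input}")
            pairs.append({"input": user_input, "output": target})
    return pairs

def dump_jsonl(pairs: list[dict], path: str = "train.jsonl") -> None:
    # JSONL is a common input format for finetuning pipelines.
    with open(path, "w") as f:
        for pair in pairs:
            f.write(json.dumps(pair) + "\n")
```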
The founders who master these implementation strategies will build more capable, reliable AI products while controlling costs—creating sustainable competitive advantage in today's AI-driven landscape.