“We need to fine-tune the model on our data” is one of the most common — and most often wrong — instincts when teams start building AI features. Most of the time, what they actually want is RAG (Retrieval-Augmented Generation), which is cheaper, faster to ship, and easier to keep current.
Here’s how to tell which one you actually need.

RAG: give the model the right documents at query time
RAG keeps the model frozen and changes what you put in front of it. When a question comes in, you search a vector database for the most relevant chunks of your data, stuff them into the prompt, and ask the model to answer using them.
The model never changes. Your knowledge lives in a database you control. Update a document, and the next query reflects it instantly — no retraining.
- Fresh data: add/update/delete documents anytime.
- Citations: you know which document the answer came from.
- Cheap to update: re-embed the changed doc; done.
- Access control: filter retrieval by the user’s permissions.
Fine-tuning: change the model’s weights
Fine-tuning retrains the model on thousands of examples of your data, baking new behavior into its weights. The output is a new model that behaves differently.
- Good for style/format: “always respond in this tone, this JSON shape.”
- Good for narrow tasks: classification, extraction with consistent rules.
- Stale fast: the moment your data changes, the model is out of date.
- Expensive: data prep + training runs + eval + re-deploy.
The decision rule
Ask: is the goal to teach the model FACTS, or to teach it BEHAVIOR?
- Facts(“answer questions about our product docs”, “cite our policies”, “know our latest pricing”) → RAG. Facts change; you don’t want them baked into weights.
- Behavior(“always extract these 8 fields into this exact format”, “respond in our brand voice”, “classify into these 12 categories”) → consider fine-tuning, but try prompting first.
Most business use cases are about facts, which is why most teams want RAG even when they ask for fine-tuning.
Try this order
- Prompting first.A good system prompt + few-shot examples solves a shocking number of “we need to fine-tune” cases. Free to iterate.
- RAG second. If the model needs to know YOUR facts, add retrieval. This covers ~80% of business AI features.
- Fine-tuning third.Only when prompting + RAG can’t hit your quality bar on a consistent behavior/format task, AND you have the examples, AND the behavior is stable.
The hybrid that’s often best
For mature products: fine-tune for behavior (consistent format, brand voice) AND use RAG for facts (current data). The fine-tuned model handles the “how to respond,” RAG handles the “what to say.” But this is an optimization for products with real volume — not where you start.
Common RAG mistakes
- Chunking badly. Too-big chunks dilute relevance; too-small chunks lose context. Start at ~500 tokens with overlap; tune from there.
- Retrieving too many chunks. Top-20 chunks costs tokens and confuses the model. Top-3 to top-5 is usually right.
- No re-ranking. Vector similarity isn’t relevance. Add a re-ranker for a big quality jump.
- Ignoring access control. Filter retrieval by what the user is allowed to see, or you’ll leak data across tenants.
How we approach this
For AI features we build via AI Software Development, we default to prompting, then RAG. We’ve fine-tuned for a handful of clients with stable, high-volume format tasks — but the vast majority of business value comes from a well-built RAG pipeline with proper chunking, re-ranking and access control.
Takeaways
- Facts → RAG. Behavior → maybe fine-tune (but try prompting first).
- RAG: fresh data, citations, cheap updates, access control.
- Fine-tuning: style/format, narrow tasks, stale fast, expensive.
- Order: prompting → RAG → fine-tuning. Most teams stop at RAG.







