
RAG in practice: when to use it, when not, and how to measure quality

RAG is the default answer to "I want a chatbot that knows my documents," and often it's the right one. But it's also where most projects get stuck: the system is built by guesswork, never measured, and nobody knows why it sometimes answers badly. This is the practical, no-hype guide to when to use RAG, how to build it, and how to know whether it works. It's the natural follow-up to LLMs in production.

1. What RAG is (and isn't)

RAG (retrieval-augmented generation) means retrieve first, generate second. For each question, you search your documents for the relevant chunks and pass them to the model as context, so it answers from your information instead of from what it "remembers." It's not magic and it's not retraining the model: it's a search engine in front of a generator. If retrieval fails, the answer fails, no matter how good the model is.
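The retrieve-then-generate flow described above can be sketched in a few lines. This is a toy illustration, not a production design: it assumes word-count cosine similarity as a stand-in for real embeddings, and the names `retrieve`, `build_prompt` and the sample chunks are hypothetical. The model call itself is left out.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Step 1: return the k chunks most similar to the question."""
    q = Counter(tokenize(question))
    ranked = sorted(chunks, key=lambda c: cosine(q, Counter(tokenize(c))), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Step 2: put the retrieved chunks in front of the question."""
    context = "\n".join(f"[{i + 1}] {c}" for i, c in enumerate(context_chunks))
    return (
        "Answer using only the context below and cite the chunk number.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


# Hypothetical sample data, for illustration only.
CHUNKS = [
    "The refund window is 30 days from purchase.",
    "Support is available Monday to Friday.",
    "Shipping to the EU takes 3 to 5 business days.",
]

question = "What is the refund window?"
top = retrieve(question, CHUNKS, k=1)
prompt = build_prompt(question, top)
# The prompt would now be sent to whichever model you use; that call is omitted here.
```

If `retrieve` returns the wrong chunk, the prompt carries the wrong context, which is the "if retrieval fails, the answer fails" point in code form.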

2. When YES and when NO

When YES
  • Large or changing knowledge base (docs, tickets, manuals).
  • Answers that must cite the source.
  • Information that changes over time, and you don't want to retrain anything.
  • Support, internal search, assistants over documentation.
When NO
  • The knowledge fits entirely in the prompt (few stable docs).
  • The task needs no external data (classify, translate, rewrite).
  • You need an exact value: a database query is better.
  • Precise keyword search: sometimes full-text search is enough.

Short rule: RAG adds infrastructure (indexing, embeddings, retrieval). If you don't need it, it's gratuitous complexity.
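To make the exact-value case from the list above concrete: when the answer is a single stored value, a parameterized query returns it deterministically, with no retrieval step to get wrong. A minimal sketch using Python's built-in `sqlite3`; the `invoices` table, its columns and the sample rows are hypothetical.

```python
import sqlite3
from typing import Optional

# In-memory database with hypothetical sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (id TEXT PRIMARY KEY, total_cents INTEGER)")
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?)",
    [("INV-001", 12990), ("INV-002", 4500)],
)


def invoice_total_cents(db: sqlite3.Connection, invoice_id: str) -> Optional[int]:
    """Return the exact stored total, or None if the invoice does not exist."""
    row = db.execute(
        "SELECT total_cents FROM invoices WHERE id = ?", (invoice_id,)
    ).fetchone()
    return row[0] if row else None


total = invoice_total_cents(conn, "INV-002")
```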

3. How it's built, piece by piece
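Chunk size is one of the knobs this post names as worth evaluating. As a minimal sketch of the indexing side, assuming simple word-based splitting with overlap: the function name `chunk_words` and the default sizes are illustrative choices, not values from this post.

```python
def chunk_words(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into chunks of up to `size` words.

    Consecutive chunks share `overlap` words, so a sentence that straddles
    a boundary still appears whole in at least one chunk.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


# Tiny example: 10 words, chunks of 4 words overlapping by 1.
doc = " ".join(f"w{i}" for i in range(10))
chunks = chunk_words(doc, size=4, overlap=1)
```

Each chunk would then be embedded and stored in the index. Whether 200 words with 40 of overlap suits your documents is exactly the kind of question an evaluation set should answer.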

4. How to measure quality (what almost nobody does)

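Without measurement, RAG is faith. Always separate two layers:
  • Retrieval: did the right chunks come back? Measure it with recall@k and precision@k.
  • Generation: given those chunks, is the answer faithful to them and relevant to the question?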

To measure, you need an evaluation set: 30 to 100 real questions, each with its expected answer and source. With it, you run the evaluation before and after each change (a different embedding model, a different chunk size, adding reranking) and you know whether you improved or regressed. Without that set, every tweak is blind.
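The before-and-after comparison can be sketched as a small harness for the retrieval layer. This assumes each evaluation item pairs a question with the set of source IDs that should be retrieved. The two "retrievers" here are hypothetical lookup tables standing in for a real pipeline before and after a change, and all document IDs are made up.

```python
from typing import Callable

EvalItem = tuple[str, set[str]]             # (question, expected source IDs)
Retriever = Callable[[str, int], list[str]]  # (question, k) -> ranked source IDs


def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the expected sources that appear in the top k."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def precision_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top k results that are expected sources."""
    if k <= 0:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / k


def evaluate(retriever: Retriever, eval_set: list[EvalItem], k: int) -> dict[str, float]:
    """Average recall@k and precision@k over the whole evaluation set."""
    if not eval_set:
        raise ValueError("evaluation set is empty")
    recalls, precisions = [], []
    for question, relevant in eval_set:
        retrieved = retriever(question, k)
        recalls.append(recall_at_k(retrieved, relevant, k))
        precisions.append(precision_at_k(retrieved, relevant, k))
    n = len(eval_set)
    return {"recall@k": sum(recalls) / n, "precision@k": sum(precisions) / n}


def make_retriever(ranked: dict[str, list[str]]) -> Retriever:
    """Wrap a fixed ranking table as a retriever (stand-in for a real one)."""
    return lambda question, k: ranked[question][:k]


# Hypothetical evaluation set and rankings, for illustration only.
EVAL_SET: list[EvalItem] = [
    ("refund window?", {"doc-refunds"}),
    ("support hours?", {"doc-support"}),
]
BEFORE = {
    "refund window?": ["doc-shipping", "doc-refunds"],
    "support hours?": ["doc-shipping", "doc-faq"],
}
AFTER = {
    "refund window?": ["doc-refunds", "doc-shipping"],
    "support hours?": ["doc-support", "doc-faq"],
}

before = evaluate(make_retriever(BEFORE), EVAL_SET, k=2)
after = evaluate(make_retriever(AFTER), EVAL_SET, k=2)
```

Running the same set against both configurations turns "it feels better" into two comparable numbers. The generation layer needs its own checks on top of this.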

5. Common mistakes we see

FAQ

What is RAG in one sentence?

Giving the model, for each question, the relevant chunks of your documents so it answers with your information. Retrieve first, generate second.

When is it NOT worth it?

When the knowledge fits in the prompt, the task needs no external data, or you need an exact database value. RAG adds infrastructure that those cases don't need.

How do you measure it?

Separately: retrieval (recall@k, precision@k) and generation (faithfulness, relevance), with a set of real questions and their expected source.

Do you need a dedicated vector DB?

Not always. pgvector on Postgres is enough for small/medium volumes; dedicated ones make sense with millions of vectors or complex filters.


Published: July 16, 2026 · Written by the RoviDev studio.
