Production RAG Assistant Over a 12-Year Knowledge Base
“The difference was that it said 'I don't know' when it didn't — and that's why people trusted it.”
- Client: B2B SaaS client (under NDA)
- Industry: SaaS
- Duration: 4 months
- Outcome: Deflected 42% of tier-1 support tickets with cited, accurate answers.
The challenge
The client ran a mature B2B SaaS platform with twelve years of accumulated knowledge: help articles, release notes, deprecated docs, thousands of resolved support tickets, and a sprawling internal wiki. New customers and new support agents drowned in it. The same questions were answered repeatedly, often inconsistently, because the right answer was buried three doc versions deep.
They had tried a generic chatbot. It hallucinated confidently, cited nothing, and was quietly switched off after it told a customer about a feature that had been removed two years earlier. The bar for a second attempt was high: correctness, citations, and the humility to say "I don't know."
Our approach
We treated this as a retrieval problem first and a language-model problem second. A RAG system is only as good as what it retrieves, and twelve years of content included a great deal that was actively wrong by now.
The first month was unglamorous: building a content pipeline that classified every source by freshness and authority, excluded superseded documentation, and tagged content by product area and version. We built an evaluation set of real questions with known-correct answers before we wrote the answering layer — so we could measure quality instead of guessing at it.
What we built
- A curated ingestion pipeline. Sources were normalised, chunked semantically rather than by fixed length, embedded, and stored in pgvector alongside metadata for filtering by product, version and recency (a simplified ingestion sketch follows this list).
- A retrieval layer with guardrails. Every query retrieved candidates, re-ranked them, and discarded anything below a confidence threshold. If nothing cleared the bar, the system said so rather than improvising (see the retrieval sketch below).
- A cited answering layer. The Claude API generated answers constrained to retrieved context, with inline citations linking back to the exact source. Agents could verify in one click (see the answering sketch below).
- An evaluation harness. Every change ran against the question set, so regressions were caught before deploy, not by customers.
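To make the ingestion step concrete, here is a simplified sketch of its storage stage, not the client's actual code. The `chunks` schema, the character budget standing in for true semantic chunking, and the `embed()` placeholder are all assumptions for illustration.

```python
import numpy as np
import psycopg
from pgvector.psycopg import register_vector

MAX_CHUNK_CHARS = 1200  # illustrative budget; the real pipeline chunked semantically


def embed(text: str) -> list[float]:
    # Placeholder for whatever embedding model the pipeline uses.
    raise NotImplementedError


def chunk(text: str) -> list[str]:
    """Pack whole paragraphs into chunks, never splitting mid-paragraph."""
    chunks, current = [], ""
    for para in (p.strip() for p in text.split("\n\n") if p.strip()):
        if current and len(current) + len(para) > MAX_CHUNK_CHARS:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    return chunks + [current] if current else chunks


def connect(dsn: str) -> psycopg.Connection:
    conn = psycopg.connect(dsn)
    register_vector(conn)  # teach psycopg to send/receive pgvector values
    return conn


def ingest(conn: psycopg.Connection, text: str, source_url: str,
           product: str, version: str) -> None:
    """Embed each chunk and store it with the metadata used for filtering."""
    for piece in chunk(text):
        vec = np.asarray(embed(piece), dtype=np.float32)
        conn.execute(
            "INSERT INTO chunks (content, source_url, product, version, embedding) "
            "VALUES (%s, %s, %s, %s, %s)",
            (piece, source_url, product, version, vec),
        )
    conn.commit()
```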
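The retrieval guardrail reduces to a hard floor on confidence. This sketch reuses the connection setup and `embed()` stub from the ingestion sketch above; the threshold value and the similarity sort standing in for the real re-ranker are assumptions.

```python
from dataclasses import dataclass

import numpy as np
import psycopg

MIN_SIMILARITY = 0.75  # illustrative bar, tuned against the evaluation set


@dataclass
class Candidate:
    content: str
    source_url: str
    similarity: float


def retrieve(conn: psycopg.Connection, question: str, product: str) -> list[Candidate]:
    """Fetch the nearest chunks for a question, filtered by product."""
    vec = np.asarray(embed(question), dtype=np.float32)
    rows = conn.execute(
        "SELECT content, source_url, 1 - (embedding <=> %s) AS similarity "
        "FROM chunks WHERE product = %s "
        "ORDER BY embedding <=> %s LIMIT 20",
        (vec, product, vec),
    ).fetchall()
    return [Candidate(*row) for row in rows]


def guarded_candidates(conn, question: str, product: str) -> list[Candidate]:
    """Re-rank and drop weak matches; an empty list signals "I don't know"."""
    ranked = sorted(retrieve(conn, question, product),
                    key=lambda c: c.similarity, reverse=True)  # re-ranker stand-in
    return [c for c in ranked if c.similarity >= MIN_SIMILARITY]
```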
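And a sketch of the answering call, where the model is constrained to the retrieved sources and told to refuse otherwise. The system-prompt wording, citation format, and model name are illustrative rather than the client's; `Candidate` comes from the retrieval sketch.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

SYSTEM_PROMPT = (
    "Answer using ONLY the numbered sources provided, citing each claim as [n]. "
    "If the sources do not answer the question, reply exactly: I don't know."
)


def answer(question: str, candidates: list[Candidate]) -> str:
    if not candidates:
        return "I don't know."  # nothing cleared the retrieval bar
    sources = "\n\n".join(
        f"[{i}] ({c.source_url})\n{c.content}"
        for i, c in enumerate(candidates, start=1)
    )
    message = client.messages.create(
        model="claude-3-5-sonnet-latest",  # illustrative model choice
        max_tokens=1024,
        system=SYSTEM_PROMPT,
        messages=[{
            "role": "user",
            "content": f"Sources:\n\n{sources}\n\nQuestion: {question}",
        }],
    )
    # Inline [n] citations map back to source_url for one-click verification.
    return message.content[0].text
```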
Technologies & stack
A Python ingestion and retrieval service, pgvector for embeddings and metadata filtering, the Claude API for answer synthesis with strict grounding, and a Next.js interface embedded in both the customer help centre and the internal agent console. LangChain handled orchestration; the evaluation harness was custom (a sketch follows).
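As an illustration only, since the real harness belongs to the client, a regression check over a JSONL question set might look like this. The file format and the citation-presence grading rule are assumptions.

```python
import json


def run_eval(ask, eval_path: str = "eval_set.jsonl") -> float:
    """ask(question) -> answer text. Returns the pass rate over the question set."""
    passed = total = 0
    with open(eval_path) as f:
        for line in f:
            case = json.loads(line)  # {"question": ..., "expected_citation": ... | null}
            total += 1
            reply = ask(case["question"])
            if case["expected_citation"] is None:
                ok = "i don't know" in reply.lower()  # must refuse, not improvise
            else:
                ok = case["expected_citation"] in reply
            passed += ok
    return passed / total


if __name__ == "__main__":
    # CI gate: fail the deploy if the pass rate drops below the last release's.
    rate = run_eval(ask=lambda q: "I don't know.")  # swap in the real pipeline
    print(f"pass rate: {rate:.1%}")
```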
Outcomes
- 42% of tier-1 support tickets were deflected with answers customers accepted — measured over the first quarter post-launch.
- Average new-agent ramp time fell by about a third, because the assistant became their fastest path to a verified answer.
- Trust held because the system cited everything and refused to guess. The "I don't know" path, counter-intuitively, was the feature that made it credible.
Related case studies
The same retrieval discipline informed our healthcare oversight work, where grounding AI strictly in source-of-truth records was a regulatory necessity, not just a quality choice.
Have a project in mind?
Tell us about it. We respond within one business day.