When I set out to create a conversational bot that truly “knows” the Indian Constitution, I thought I’d be wiring up a few API calls. Instead, I discovered that turning a dense legal PDF into a production-grade RAG system is a journey of regex tweaks, hybrid retrievers, chain orchestration, and prompt finesse.
From Numbered Headings to “Article N.” Clarity
The original PDF had headings like:
1. Fundamental Rights…
2. Citizenship…
19. Freedom of Speech…
A query for “Article 19” returned mixed results because raw numerals don’t map neatly in vector space. To make legal references explicit, I applied a lightweight regex pass.
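A minimal sketch of that pass, assuming the headings appear as “<number>. <Title>” at the start of a line (the helper name and exact pattern here are illustrative):

```python
import re

def normalize_article_headings(text: str) -> str:
    """Rewrite bare numbered headings like '19. Freedom of Speech'
    into 'Article 19. Freedom of Speech' so retrieval can anchor on them."""
    # Match a line starting with 1-3 digits, a period, and whitespace
    return re.sub(r"(?m)^(\d{1,3})\.\s+", r"Article \1. ", text)
```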

This simple transformation helped the retriever anchor on “Article 19” more reliably, reducing ambiguous hits.
Embeddings Meet Keyword Search: A Hybrid Retriever
I leveraged Legal-BERT embeddings stored in ChromaDB for semantic recall, but pure vector search sometimes surfaced neighboring articles instead of the exact one. To balance precision and recall, I combined it with a classic BM25 retriever.
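A minimal sketch of the ensemble, assuming `docs` holds the normalized chunks, the Chroma persistence path is illustrative, and Legal-BERT refers to the nlpaueb/legal-bert-base-uncased checkpoint:

```python
from langchain.retrievers import EnsembleRetriever
from langchain_community.retrievers import BM25Retriever
from langchain_community.vectorstores import Chroma
from langchain_huggingface import HuggingFaceEmbeddings

# Semantic recall: Legal-BERT embeddings persisted in ChromaDB
embeddings = HuggingFaceEmbeddings(model_name="nlpaueb/legal-bert-base-uncased")
vectorstore = Chroma(persist_directory="./chroma_db", embedding_function=embeddings)
semantic_retriever = vectorstore.as_retriever(search_kwargs={"k": 5})

# Exact keyword recall over the same normalized chunks
bm25_retriever = BM25Retriever.from_documents(docs)
bm25_retriever.k = 5

# Blend the two; the 60/40 split is illustrative, not tuned
hybrid_retriever = EnsembleRetriever(
    retrievers=[semantic_retriever, bm25_retriever],
    weights=[0.6, 0.4],
)
```

The BM25 side is what guarantees that a chunk explicitly mentioning “Article 19” scores highly even when its embedding neighborhood is crowded with adjacent articles.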

This ensemble secured both conceptual matches and exact keyword hits—essential for handling legal jargon and references.
Cutting Through Noise with Contextual Compression
More results can mean more noise. To streamline the output, I wrapped the ensemble in a ContextualCompressionRetriever.
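A minimal sketch of that wrapper, assuming `llm` is the chat model and `hybrid_retriever` is the ensemble from the previous section:

```python
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor

# The LLM extracts only the passages in each candidate that bear on the query
compressor = LLMChainExtractor.from_llm(llm)
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=hybrid_retriever,
)
```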

This layer asks the LLM to distill the top candidates into the most salient passages: no more 10-paragraph dumps, just concise, relevant excerpts.
Chain Magic: Wrapping It All Up
Rather than juggling multiple steps in my Flask app, I organized the workflow into reusable chains:
1. Runnable Retrieval Chain: orchestrates document load → regex normalization → ensemble retrieval → compression. A single .run() call yields the final context.
2. History-Aware Retriever Chain: by persisting chat history via FileChatMessageHistory and feeding it into create_history_aware_retriever, the bot handles follow-ups naturally:
User: What does Article 21 guarantee?
User: And which landmark cases cite it?
The second query retains context, so users don’t have to repeat “Article 21” each time (see the chain sketch after this list).
3. QA Generation Chain: with context in hand, this chain prompts the LLM to produce answers using:
### headings
bold key phrases
bullet lists
The result is a study-guide-style response, not just a paragraph of legalese.
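A condensed sketch of how chains 2 and 3 fit together; the prompt wordings, the history/ file path, and the session handling are illustrative, while `llm` and `compression_retriever` are the pieces sketched in the earlier sections:

```python
from langchain.chains import create_history_aware_retriever, create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_community.chat_message_histories import FileChatMessageHistory
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables.history import RunnableWithMessageHistory

# Rewrite follow-ups ("And which landmark cases cite it?") into standalone queries
contextualize_prompt = ChatPromptTemplate.from_messages([
    ("system", "Rewrite the user's question as a standalone question about the "
               "Indian Constitution, filling in references from the chat history."),
    MessagesPlaceholder("chat_history"),
    ("human", "{input}"),
])
history_aware_retriever = create_history_aware_retriever(
    llm, compression_retriever, contextualize_prompt
)

# Answer from the compressed context in a study-guide format
qa_prompt = ChatPromptTemplate.from_messages([
    ("system", "Answer using only the context below. Structure the answer with "
               "### headings, **bold** key phrases, and bullet lists.\n\n{context}"),
    MessagesPlaceholder("chat_history"),
    ("human", "{input}"),
])
qa_chain = create_stuff_documents_chain(llm, qa_prompt)
rag_chain = create_retrieval_chain(history_aware_retriever, qa_chain)

# Persist each session's messages to disk so follow-ups keep their context
conversational_rag = RunnableWithMessageHistory(
    rag_chain,
    lambda session_id: FileChatMessageHistory(f"history/{session_id}.json"),
    input_messages_key="input",
    history_messages_key="chat_history",
    output_messages_key="answer",
)

result = conversational_rag.invoke(
    {"input": "What does Article 21 guarantee?"},
    config={"configurable": {"session_id": "demo"}},
)
print(result["answer"])
```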
Considering Metadata Enrichment
I explored extracting the PDF’s table of contents into a metadata index, injecting article summaries directly into queries for exact numeral-to-article mapping. My initial attempts showed the idea depends on a consistently structured TOC, so I’ve parked it as a promising future enhancement.
Future Optimizations on the Horizon
MultiQueryRetriever: Generate paraphrased queries (“rights under Article 19,” “speech freedoms”) and merge results to boost recall (sketched after this list).
Dynamic Retriever Weighting: Adapt semantic vs. keyword weights based on query length or confidence.
Legal Synonym Injection: Enrich queries with terms like “expression” or “press freedom.”
Client-Side Caching: Store frequent queries in localStorage for sub-second responses.
TOC (Table of Contents)-Driven Metadata Enrichment: Once I secure a clean index, reintroduce metadata injection for exact matches.
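None of these are built yet, but the first is nearly a drop-in with LangChain’s MultiQueryRetriever; a minimal sketch, reusing the assumed `llm` and `hybrid_retriever` from the earlier sketches:

```python
from langchain.retrievers.multi_query import MultiQueryRetriever

# The LLM generates paraphrases of the question, and the unique documents
# retrieved for each variant are merged to boost recall
multi_query_retriever = MultiQueryRetriever.from_llm(
    retriever=hybrid_retriever,
    llm=llm,
)
docs = multi_query_retriever.invoke("What freedoms does Article 19 protect?")
```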
Wrapping Up
Building this RAG bot taught me that real-world RAG ≠ plug-and-play. It’s an iterative dance of preprocessing, hybrid retrieval strategies, compression, chain orchestration, and targeted prompting. By sharing these “how”s and “why”s—from my regex rescue to the history-aware chains—I hope to demonstrate not only technical skill, but also the engineering judgment required to turn prototypes into robust, user-friendly systems.