RAG System Design
TL;DR
Building a RAG system in banking: how to choose your chunk strategy, embedding model, and retrieval pipeline. Lessons from real production experience.
The Reality of RAG
Retrieval-Augmented Generation (RAG) sounds simple in a tutorial: embed your PDFs, throw them in a vector database, and perform cosine similarity search. In production—especially in banking—this naive approach fails immediately.
Financial documents are dense. They contain tables, footnotes, hierarchical clauses, and cross-references. A simple vector search will retrieve the wrong clause 50% of the time.
3 Key Architectural Decisions
- Semantic Chunking over Fixed Chunking: Never chunk by “every 500 tokens.” Financial documents must be chunked by semantic boundaries (e.g., stopping at the end of a regulatory clause). If you split a paragraph describing interest rate conditions down the middle, the embedding loses all meaning.
- Hybrid Retrieval (Vector + Keyword): Vector search is terrible at finding specific alphanumeric codes (like SEC filing numbers or specific policy IDs). You must implement a hybrid pipeline combining BM25 (keyword search) with dense vector embeddings.
- Re-ranking is Mandatory: Retrieving top 20 candidate chunks introduces too much noise. You need a dedicated Cross-Encoder re-ranking model to score the relevance of the retrieved chunks against the original user query before passing the top 5 to the LLM.
If your RAG system is hallucinating, 9 times out of 10, the problem is in the retrieval pipeline, not the language model.
💬 Read more: 2025 Year in Review (English)
Related Articles
Harness Engineering: Building the Execution Layer for Your AI Agent
Harness Engineering is the execution layer in AI Agent architecture. This post introduces the core design of a Harness: execution control, observability, hooks, tool sandboxing, and state management.
5 Product Design Traps When Building AI Agents
AI Agents sound cool, but building Agent products in enterprise is full of pitfalls. Here are five design traps I've experienced firsthand.