RAG System Design
The Reality of RAG
Retrieval-Augmented Generation (RAG) sounds simple in a tutorial: embed your PDFs, load them into a vector database, and run a cosine similarity search. In production, especially in banking, this naive approach breaks down quickly.
Financial documents are dense: they contain tables, footnotes, hierarchical clauses, and cross-references. Against that structure, a simple vector search can retrieve the wrong clause as often as half the time.
3 Key Architectural Decisions
- Semantic Chunking over Fixed Chunking: Never chunk by “every 500 tokens.” Chunk financial documents at semantic boundaries (e.g., the end of a regulatory clause). If you split a paragraph describing interest rate conditions down the middle, neither half's embedding captures the full condition, and retrieval can return a fragment that misstates it.
- Hybrid Retrieval (Vector + Keyword): Dense vector search is weak at exact matches on specific alphanumeric codes (like SEC filing numbers or policy IDs), because embeddings capture meaning rather than exact strings. Combine BM25 (keyword search) with dense vector embeddings in a hybrid pipeline.
- Re-ranking is Mandatory: Passing all of the top 20 retrieved chunks to the LLM introduces too much noise. Use a dedicated cross-encoder re-ranking model to score each retrieved chunk against the original user query, then pass only the top 5 to the LLM.
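The three decisions above can be sketched end to end in plain Python. This is a minimal illustration under stated assumptions, not a production pipeline: the clause format (numbered headings such as "4.2"), the sample text, and the policy ID are made up; `toy_embed` is a character-trigram placeholder standing in for a real embedding model; `overlap_score` stands in for a cross-encoder; and reciprocal rank fusion (`rrf`) is one common way to merge the keyword and vector rankings, chosen here for simplicity.

```python
import math
import re
from collections import Counter

def chunk_by_clause(text):
    """Split before numbered clause headings such as '4.2 ' (assumed format)."""
    parts = re.split(r"(?m)^(?=\d+(?:\.\d+)*\s)", text)
    return [p.strip() for p in parts if p.strip()]

def tokenize(text):
    # Keep hyphenated codes like 'pol-7731' as a single token.
    return re.findall(r"[a-z0-9][a-z0-9-]*", text.lower())

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every doc against the query with Okapi BM25 (non-negative IDF variant)."""
    toks = [tokenize(d) for d in docs]
    avgdl = sum(len(t) for t in toks) / len(toks)
    df = Counter(term for t in toks for term in set(t))
    scores = []
    for t in toks:
        tf = Counter(t)
        score = 0.0
        for term in set(tokenize(query)):
            if tf[term] == 0:
                continue
            idf = math.log(1 + (len(docs) - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(t) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

def toy_embed(text):
    """Placeholder for a real embedding model: character-trigram counts."""
    s = re.sub(r"\s+", " ", text.lower())
    return Counter(s[i:i + 3] for i in range(len(s) - 2))

def cosine(a, b):
    dot = sum(v * b[k] for k, v in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def rrf(rankings, k=60):
    """Reciprocal rank fusion: sum 1 / (k + rank) across the ranked lists."""
    fused = Counter()
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return [doc_id for doc_id, _ in fused.most_common()]

def hybrid_search(query, chunks, top_k=20):
    """Rank chunks by BM25 and by vector similarity, then fuse the two rankings."""
    kw = bm25_scores(query, chunks)
    q_vec = toy_embed(query)
    dense = [cosine(q_vec, toy_embed(c)) for c in chunks]
    ids = range(len(chunks))
    by_kw = sorted(ids, key=lambda i: kw[i], reverse=True)
    by_dense = sorted(ids, key=lambda i: dense[i], reverse=True)
    return rrf([by_kw, by_dense])[:top_k]

def rerank(query, chunks, candidate_ids, score_fn, top_n=5):
    """Re-score candidates with score_fn(query, chunk) and keep the best top_n."""
    ranked = sorted(candidate_ids, key=lambda i: score_fn(query, chunks[i]), reverse=True)
    return ranked[:top_n]

def overlap_score(query, chunk):
    # Stand-in scorer; a real system would call a cross-encoder model here.
    return len(set(tokenize(query)) & set(tokenize(chunk)))

# Made-up sample text for illustration only.
document = """4.1 Interest accrues daily at the base rate plus 2%.
The base rate is reviewed quarterly.
4.2 Early repayment under policy POL-7731 incurs a fee of 1% of principal.
4.3 Fees are waived for accounts opened before 2020."""

chunks = chunk_by_clause(document)
query = "early repayment fee POL-7731"
candidates = hybrid_search(query, chunks, top_k=20)
best = rerank(query, chunks, candidates, overlap_score, top_n=1)
print(chunks[best[0]])
```

Clause 4.1 stays in one chunk even though it spans two lines, and the hyphenated policy ID survives tokenization, so the keyword side can match it exactly. Keeping `score_fn` as a parameter means the placeholder scorer can be swapped for a real cross-encoder without touching the rest of the flow.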
If your RAG system is hallucinating, 9 times out of 10, the problem is in the retrieval pipeline, not the language model.
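One way to test that claim on your own system, before blaming the model, is to measure retrieval in isolation. The sketch below computes a hit rate at k over a small labeled set. The chunk IDs, the queries, and `fake_retrieve` are hypothetical placeholders; in practice `retrieve` would be your actual retrieval function returning ranked chunk IDs.

```python
def hit_rate_at_k(labeled_queries, retrieve, k=5):
    """Fraction of queries whose known-relevant chunk id appears in the top k."""
    hits = sum(1 for query, relevant_id in labeled_queries
               if relevant_id in retrieve(query)[:k])
    return hits / len(labeled_queries)

# Hypothetical labeled set: (query, id of the chunk that answers it).
labeled = [
    ("early repayment fee", "clause-4.2"),
    ("fee waiver eligibility", "clause-4.3"),
]

def fake_retrieve(query):
    # Stand-in for the real retriever; always returns the same ranking.
    return ["clause-4.2", "clause-4.1", "clause-4.3"]

print(hit_rate_at_k(labeled, fake_retrieve, k=1))  # 0.5
print(hit_rate_at_k(labeled, fake_retrieve, k=3))  # 1.0
```

If the hit rate at the k you actually pass to the LLM is low, the right chunk never reaches the model, and no amount of prompt tuning will fix the answers.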