In my previous blog, I introduced Retrieval Augmented Generation (RAG) and explained why it is such an important concept for building AI systems that rely on private, domain‑specific, or rapidly changing information.
What I did not cover was that there are different types of retrievers you can use in a RAG system, and each one makes a different trade-off between simplicity, intelligence, and complexity.
In this blog, I share how I built my first RAG retriever using a simple keyword‑based approach. It is intentionally basic and not semantic, but it works, and it taught me the foundations of retrieval, which is exactly what I needed at this stage. If you are following a similar path, this post should help you start small, stay focused, and progress in a safe but meaningful way.
If you want to try the assistant as you read, you can experiment with the live ADR RAG application here: try the ADR assistant.
Why talk about retrieval types
When people hear “RAG”, the discussion often jumps directly to vector databases and embeddings. That is one level of retrieval, but it is not the only one, and it is rarely the first one you need as an architect who is just learning RAG. As I started building my own RAG system, I realised there are different levels of retrieval, each bringing more intelligence but also greater complexity.
Here is a simplified view of that spectrum:
Level 1 – Keyword or syntax‑based retrieval
This is the most basic form: match the user’s words to words in your documents. It is fast, predictable, and easy to implement, very similar to classic search engines or Lucene‑style search.
Level 2 – Semantic retrieval (embeddings)
Instead of matching words literally, you match the meaning of the text. This captures synonyms, relationships, and contextual similarity, and usually relies on vector databases and embedding models.
Level 3 – Hybrid retrieval
A combination of both: exact keyword matches remain valuable, while semantic similarity adds depth and accuracy. Hybrid retrieval often becomes the default for production‑grade RAG systems.
More advanced setups add re‑ranking (often using an LLM), metadata‑based weighting, and feedback loops, but those can come later in the journey. For my first real RAG implementation, I intentionally started with the simplest form, keyword retrieval, so that I could build something functional and understand how retrieval interacts with generation before adding the complexity of embeddings and vector databases.
My use case: an architecture decision assistant
The product I am exploring is an ADR (Architecture Decision Record) assistant. The idea is straightforward: the user describes their context, problem, and intended technology stack, and the system retrieves the most relevant architecture patterns and sends them to an LLM. The LLM then generates architecture options grounded in those patterns, making the LLM’s answer more focused.
In this application:
- Architecture patterns serve as the knowledge base.
- The retriever decides which patterns are most relevant to the current problem.
- The LLM generates structured architectural recommendations based on those retrieved patterns.
How my keyword retriever works
The keyword retriever follows a simple process that helps me understand retrieval in practice without jumping straight into semantic search. It takes the user’s information as input, combines it into search terms, and then scores patterns based on how well they match.
Step 1: Collect user input
The system receives three main inputs from the user:
- A free text description of the context.
- A statement of the primary problem.
- A list of technologies the user plans to use.
These inputs provide the signals needed to retrieve relevant patterns and mimic how architects actually describe their problems in real life.
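To make this concrete, here is roughly how those three inputs could be modelled in Python. The class and field names are my own illustration for this post, not the actual code of the assistant:

```python
from dataclasses import dataclass, field

@dataclass
class UserInput:
    context: str        # free-text description of the situation
    problem: str        # the primary problem to solve
    technologies: list[str] = field(default_factory=list)  # planned tech stack
```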
Step 2: Build search terms
The retriever combines the context, problem, and selected technologies into a single lowercase string. This combined string serves as the basis for pattern matching and remains deliberately simple, without advanced NLP or semantic understanding at this stage.
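In code, this step can be a one‑liner. A minimal sketch, reusing the hypothetical UserInput class from above:

```python
def build_search_terms(user_input: UserInput) -> str:
    # Concatenate context, problem, and technologies, then lowercase.
    # Deliberately naive: no stemming, no stop-word removal, no NLP.
    parts = [user_input.context, user_input.problem, *user_input.technologies]
    return " ".join(parts).lower()
```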
Step 3: Score patterns based on matches
For each pattern in the knowledge base, the retriever examines three fields:
- The pattern name.
- Keywords associated with the pattern.
- The pattern description.
Each pattern is scored based on matches found in these fields. Name matches are weighted highest, followed by keyword matches, with description matches contributing the least.
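To illustrate the weighting, here is a minimal scoring sketch. The concrete weights (10, 5, 1) and the dictionary field names are my assumptions for this post, not the exact values in the application; what matters is the ordering of the signals:

```python
def score_pattern(pattern: dict, search_terms: str) -> int:
    """Count weighted matches of a pattern's fields against the search terms."""
    score = 0
    term_words = set(search_terms.split())

    # Name match: the strongest signal.
    if pattern["name"].lower() in search_terms:
        score += 10

    # Keyword matches: a medium signal.
    for keyword in pattern.get("keywords", []):
        if keyword.lower() in search_terms:
            score += 5

    # Description matches: the weakest signal, checked word by word.
    for word in pattern.get("description", "").lower().split():
        if word in term_words:
            score += 1

    return score
```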
Step 4: Rank and return the top patterns
Once all patterns are scored, the retriever sorts them by relevance and returns the top 5 patterns as the retrieved context for the LLM. These patterns enrich the prompt, enabling the LLM to generate options grounded in actual architectural knowledge rather than starting from a blank page.
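With scoring in place, ranking is just a sort. A sketch that reuses score_pattern from above:

```python
def retrieve_top_patterns(
    patterns: list[dict], search_terms: str, top_k: int = 5
) -> list[dict]:
    # Sort by descending score and keep the best top_k as context for the LLM.
    ranked = sorted(patterns, key=lambda p: score_pattern(p, search_terms), reverse=True)
    return ranked[:top_k]
```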
Even at this simple level, the system works surprisingly well and provides meaningful architectural suggestions. More importantly, it gave me a clear understanding of how retrieval shapes the quality of the generated answers and how small changes in scoring logic can change what the LLM produces.
A concrete example in action
To illustrate how this works in practice, imagine a user describing the following situation:
- They are building a SaaS collaboration platform.
- They need to support thousands of concurrent users with real-time features.
- They are deploying on Azure with React and Python.
The system combines all of this input (context, problem, and tech stack) into a single set of search terms. These terms are then matched against each pattern’s name, keywords, and description fields with the weights described earlier: name matches score highest, followed by keyword matches, then description matches.
For example, if a pattern called “Event-Driven Architecture” includes keywords such as “real-time”, “async”, and “scalability”, it would score well because those terms appear in the user’s input. A pattern with “microservices” in its name would only score if the user explicitly mentioned that term.
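Putting the sketches together, that scenario might play out as follows. The two patterns here are toy data for illustration, not entries from my actual knowledge base:

```python
patterns = [
    {
        "name": "Event-Driven Architecture",
        "keywords": ["real-time", "async", "scalability"],
        "description": "Decouple components with events for real-time workloads.",
    },
    {
        "name": "Microservices",
        "keywords": ["independent deployment", "bounded context"],
        "description": "Split the system into independently deployable services.",
    },
]

terms = build_search_terms(UserInput(
    context="We are building a SaaS collaboration platform",
    problem="Support thousands of concurrent users with real-time features",
    technologies=["Azure", "React", "Python"],
))

for pattern in retrieve_top_patterns(patterns, terms):
    print(pattern["name"], score_pattern(pattern, terms))
# Event-Driven Architecture ranks first: its "real-time" keyword appears in
# the user's input. Microservices scores nothing, because the user never
# mentioned that word.
```

Notice that even the stop word “with” picks up a point from the description field; that is a small preview of the noise problem I describe below.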
This approach is simple but practical; it surfaces patterns that share vocabulary with the user’s requirements and provides the LLM with a relevant starting point for exploring architectural options.
What worked and what did not
Keyword retrieval has real advantages if you want to get a prototype running:
- It is easy to reason about: you can almost “debug” it by looking at which words matched which patterns.
- It is pretty easy to implement and does not require special infrastructure such as vector databases or embedding models.
- It helped me see, concretely, how the retriever influences the generator and why retrieval is as important as prompt quality.
However, during testing, I also saw clear limitations:
- It does not handle synonyms or related concepts (for example, “streaming” and “event‑driven” may never meet).
- It cannot infer needs from numbers or context, such as “ten thousand users” implying horizontal scalability.
- It matches on unimportant words that add noise when users write long descriptions.
- It struggles when users phrase things in unexpected ways, making it harder to systematically evaluate the system and build robust regression tests.
These limitations are not surprising. They are exactly what semantic search and more advanced retrieval techniques are designed to fix. But for my learning journey, keyword retrieval was the right first step to understand the basics.
Conclusion
Building my first RAG retriever using keyword search was an important step in my learning journey. It gave me hands‑on experience with retrieval mechanics, prompt construction, and the interaction between retrieved context and LLM generation, without demanding deep data science skills.
Most importantly, it allowed me to build something concrete without waiting to master every concept. If you are on a similar path, consider doing the same: start simple, build something that works, and let the limitations of that simple system guide your next learning step. Retrieval is a deep topic, and the best way to learn it is by iterating, experimenting, and growing with your system.
If you want inspiration, you can start by trying the ADR assistant yourself here: start using the ADR assistant.
In my next blog, I will take the next big step and explore how to move from pure keyword retrieval to semantic and hybrid search. This is where RAG becomes truly powerful, and the architecture decisions become even more interesting.

