Step by Step: Building a RAG Chatbot with Minor Hallucinations
In the rapidly evolving landscape of artificial intelligence, Retrieval Augmented Generation (RAG) has emerged as a groundbreaking technique that enhances...
In this article, I want to share a method to improve your LLM’s reliability, making LLM apps produce consistent results for particular inputs, by creating something I call “Islands of Confidence”.
An island of confidence is basically a set of inputs where we choose NOT to run an LLM. Instead, we run normal deterministic code.
This approach can be particularly effective when paired with Open Source Large Language Models, as it allows for greater flexibility and customization in handling specific inputs.
We’ll start with a very simple example and build it from there, step by step.
Let’s say we have a customer support chatbot, where users frequently ask: “How do I create a new account?”.
Since this question is so frequent, there’s no reason to run the LLM. Instead, we can add an ‘if’ statement before the model that checks whether the user input is equal to the question above. If it is, we can return a cached, verified answer.
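A minimal sketch of that check might look like the following. The names (VERIFIED_ANSWERS, answer, run_llm) and the cached answer text are placeholders made up for illustration, not part of any real product or library.

```python
from typing import Callable

# Hypothetical cache of verified answers, keyed by the exact question text.
# The answer string is a placeholder, not a real support answer.
VERIFIED_ANSWERS = {
    "How do I create a new account?": "<verified answer about account creation>",
}


def answer(user_input: str, run_llm: Callable[[str], str]) -> str:
    """Return the cached, verified answer on an exact match; otherwise call the LLM."""
    cached = VERIFIED_ANSWERS.get(user_input)
    if cached is not None:
        return cached  # island of confidence: deterministic path, no model call
    return run_llm(user_input)  # everything else still goes to the model
```

Here run_llm stands for whatever function already calls your model, so the island sits in front of it without changing it.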
But exact matching on its own is nearly useless, because another user might phrase the same question a little bit differently, for example: “yo how to register for new account?”. Let’s fix that.
In this case, we still want to detect it and run the same logic. Fortunately, there’s a simple solution: we can fine-tune a small binary text classifier to detect paraphrases of our question.
One method is to use sentence transformers and a model such as paraphrase-mpnet-base-v2. From what I’ve seen, you only need around 10-20 examples for good results. Check out the SetFit library by Hugging Face.
Now, our island isn’t just a single string—it’s any paraphrase of the question “How do I create a new account?”.
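One way to wire this up, sketched under assumptions: each island carries a detector function and a verified answer, and a router tries the islands before falling back to the LLM. The Island and route names are hypothetical. The toy_detector below is only a crude keyword stand-in so the example runs without downloading a model; in practice the detector would wrap the fine-tuned paraphrase classifier.

```python
import re
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Island:
    is_match: Callable[[str], bool]  # paraphrase detector for one question
    answer: str                      # cached, verified answer for that question


def route(user_input: str, islands: Sequence[Island], run_llm: Callable[[str], str]) -> str:
    """Return an island's verified answer if its detector fires; otherwise call the LLM."""
    for island in islands:
        if island.is_match(user_input):
            return island.answer
    return run_llm(user_input)


def toy_detector(user_input: str) -> bool:
    """Crude keyword stand-in for a fine-tuned classifier (assumption, not the real model)."""
    words = set(re.findall(r"[a-z]+", user_input.lower()))
    return "account" in words and bool(words & {"create", "register"})


# Placeholder answer text, for illustration only.
new_account_island = Island(is_match=toy_detector, answer="<verified answer about account creation>")
```

Keeping the detector behind a plain callable means the keyword stand-in can be swapped for a real classifier without touching the routing code.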
But we ignored one important detail: our chatbot is probably a RAG, and to answer the question, it usually needs to retrieve context from the knowledge base.
By creating an island that simply returns a string, we basically ignore the retrieval part. This creates a problem: what if a new version of the web app is deployed and the process of creating a new account changes?
Even though the KB would probably be updated, our cached answer is now stale.
To solve this, instead of just returning a string, we can check if the context that was originally used to generate the answer is still relevant. If not, we can invalidate the island.
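A minimal sketch of that invalidation check, assuming “still relevant” is approximated as “the retrieved context is byte-for-byte unchanged”. That is a simplification; a looser similarity check is equally possible. RagIsland, fingerprint, lookup and the retrieve callable are hypothetical names for illustration.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


def fingerprint(chunks: Sequence[str]) -> str:
    """Hash the retrieved chunks, in order, into one stable fingerprint."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk.encode("utf-8"))
        digest.update(b"\x00")  # separator so chunk boundaries affect the hash
    return digest.hexdigest()


@dataclass
class RagIsland:
    question: str             # canonical question the island answers
    answer: str               # verified answer generated from the original context
    context_fingerprint: str  # fingerprint of the context used to generate it


def lookup(island: RagIsland, retrieve: Callable[[str], Sequence[str]]) -> Optional[str]:
    """Return the cached answer only if retrieval still yields the original context."""
    current = fingerprint(retrieve(island.question))
    if current == island.context_fingerprint:
        return island.answer
    return None  # context changed: the island is invalid, fall back to the full RAG flow
```

Note that retrieval still runs on every request in this sketch; only the generation step is skipped while the island is valid.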
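Talk-to-your-data use cases are really useful. This is how they typically work: the user asks a question in natural language, an LLM translates it into a SQL query, and the query runs against the database to produce the answer.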
Unfortunately, a hallucination here can lead to incorrect SQL statements, which could lead to completely incorrect data. The user might not know how to interpret it correctly.
Fortunately, our technique just works out of the box here! The island of confidence can return the verified SQL query, as long as the database schema doesn’t change, very similarly to the way we handled RAGs.
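As a rough sketch of the schema check, the example below uses SQLite only because it ships with Python. The function names are hypothetical, and another database would need its own way of reading the schema (for example, its information schema).

```python
import hashlib
import sqlite3
from typing import Optional


def schema_fingerprint(conn: sqlite3.Connection) -> str:
    """Hash the CREATE statements SQLite stores for every table, index and view."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master WHERE sql IS NOT NULL ORDER BY type, name"
    ).fetchall()
    joined = "\n".join(row[0] for row in rows)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def verified_sql(conn: sqlite3.Connection, cached_sql: str, cached_fingerprint: str) -> Optional[str]:
    """Return the verified query only while the schema matches the one it was verified against."""
    if schema_fingerprint(conn) == cached_fingerprint:
        return cached_sql
    return None  # schema changed: invalidate the island and regenerate the query
```

Inserting or updating rows does not change the fingerprint; only schema changes such as adding a column invalidate the island.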
In a future post, I’ll discuss how islands of confidence can work with more complex variations of a question (not just paraphrases), as well as tools. Let me know what you think!
Alon is the Chief Technology Officer and Co-Founder of Coralogix. Since building his first neuroevolution-based Super Mario bot in 2012 (which barely scratched the first level—too many 'hallucinations'...), he’s been fascinated by AI agents.