RAG Applications: Query Retriever

The core part of a RAG pipeline is the retrieval of information from its data corpus. This is where we try to match the user intent in the query to the semantic meaning in the data. The goal is to extract enough related and relevant information so that the downstream LLM can provide a meaningful response to the user’s query. 

In an Enterprise RAG system, however, the RAG designer must take into account more than just the user intent:

  • Enterprise data must be served only to users who have access to that data.
  • A GenAI application might want to expose only certain versions, variants, facets or sub-systems of the data to its users, based on factors such as their location, usage pattern (which product they are asking the question in), entitlements (which products and services they have paid for) or profile (such as interests).

 

The core of the Query Retrieval process is semantic search over the embeddings stored and indexed in a Vector DB. However, several supporting components are needed to ensure that retrieval meets its goals. Query Retrieval can be seen as a sequence of steps:

  • Retrieval
  • Merging
  • Re-ranking

There is a lot of ongoing research in this area, and it is constantly evolving as we learn new techniques and capabilities. Dataworkz RAG Builder provides a powerful Query Retriever that caters to these requirements, enabling the RAG Designer to build sophisticated pipelines.

Retrieval

The primary retrieval in a RAG pipeline is Semantic Search – the retrieval of data chunks from the Vector DB. However, there is a lot more to this besides just the semantic search.

Semantic Search

Enterprise data that is to be served in a GenAI application is split into chunks, taking into account its structure and semantic content as well as physical limitations such as size. The chunks are encoded as embeddings and stored in a queryable database called a Vector Database. (See: Embeddings and Encoder and Chunking for more on this). When a user asks a question, it is also encoded into a vector embedding. This question embedding is compared for similarity with the chunks stored in the Vector DB, and the most similar chunks are returned. In an ideal world, these chunks contain the answer to the user’s query and the downstream LLM can generate a good answer from them. This process is called Semantic Search.
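As a rough illustration of the comparison step, the sketch below ranks chunks by cosine similarity to the question embedding. The three-dimensional vectors, the chunk ids and the choice of cosine similarity are assumptions made for the example; a real Vector DB uses encoder-produced embeddings with many more dimensions and an index rather than a full scan.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def semantic_search(query_embedding, chunk_embeddings, top_k=3):
    """Return the top_k (chunk_id, score) pairs, most similar first."""
    scored = [
        (chunk_id, cosine_similarity(query_embedding, embedding))
        for chunk_id, embedding in chunk_embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Toy embeddings (hypothetical values, not produced by a real encoder).
chunk_embeddings = {
    "outlook-scheduling": [0.9, 0.1, 0.0],
    "google-scheduling": [0.7, 0.3, 0.1],
    "expense-reports": [0.0, 0.2, 0.9],
}
query_embedding = [1.0, 0.0, 0.0]

results = semantic_search(query_embedding, chunk_embeddings, top_k=2)
for chunk_id, score in results:
    print(f"{chunk_id}: {score:.3f}")
```

The two scheduling chunks score close to each other, which is the kind of ambiguity the following sections address.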

 

However, Enterprise RAG designers have to contend with various complexities in retrieval. In the subsequent sections, we cover various complexities and solutions that are part of the Retrieval process and how Dataworkz RAG Builder helps you configure and take advantage of its powerful Retrieval engine.

Lexical Search

One problem is that several chunks in the data corpus may have similar semantic content or differ from each other only slightly. For instance, a question on “scheduling an event in an Outlook Calendar” can return chunks that talk about “scheduling events in iCalendar” or “scheduling events in Google Calendar” or many similar variations of the concept. Lexical Search (i.e. Full Text Search) uses keywords in the question to narrow down the search universe for Semantic Search. Reducing the scope to chunks that match “Outlook Calendar” makes it much easier for Semantic Search to return relevant chunks.

 

Dataworkz RAG Builder allows you to specify the critical keywords in your data corpus in its configuration. It can also discover the most frequent keywords used in the data corpus. Lexical Search is optional, but when used, it can be employed in two ways:

Pre-filtering with Lexical Search Results

As in the example above, the search universe for Semantic Search is limited to the results of the Lexical Search, allowing it to provide better results for the required keywords. This works well for a large data corpus when users are likely to ask questions with specific keywords that match well with the data corpus. For instance, systems with users in legal or financial services, who are likely to be familiar with technical terms that narrow down the search, will benefit from this approach. The approach is weak when users are likely to ask very general questions that might not contain specific keywords. In that scenario, it can reduce the effectiveness of Semantic Search.
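A minimal sketch of the pre-filtering idea follows. It is not the Dataworkz implementation: the function name, the substring matching and the fallback to the full corpus when the question contains no configured keyword are all assumptions made for illustration.

```python
def lexical_prefilter(question, chunks, keywords):
    """Return ids of chunks that contain every configured keyword
    that also appears in the question (case-insensitive substring match)."""
    question_lower = question.lower()
    required = [k.lower() for k in keywords if k.lower() in question_lower]
    if not required:
        # Assumption: with no keyword in the question, do not narrow the search.
        return list(chunks)
    return [
        chunk_id
        for chunk_id, text in chunks.items()
        if all(keyword in text.lower() for keyword in required)
    ]

# Hypothetical corpus and keyword configuration.
chunks = {
    "outlook": "To schedule an event in an Outlook Calendar, open the calendar view.",
    "google": "Scheduling events in Google Calendar starts from the Create button.",
    "ical": "Scheduling events in iCalendar uses the VEVENT component.",
}
keywords = ["Outlook Calendar", "Google Calendar", "iCalendar"]

specific = lexical_prefilter(
    "How do I schedule an event in an Outlook Calendar?", chunks, keywords
)
general = lexical_prefilter("How do I schedule an event?", chunks, keywords)
print(specific)
print(general)
```

Semantic Search would then score only the returned candidates. The second call shows the weak case described above: a general question carries no keyword, so the filter cannot help.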

 

Hybrid Search

In this mechanism, we run lexical search and semantic search separately and score each chunk based on how well it ranked in each search, using a mechanism called Reciprocal Rank Fusion (RRF). This has the advantage of leveraging lexical search to boost the scores of semantic search results without limiting semantic search. It might be a good fit for smaller data sets and for systems whose users tend to ask questions that are less technical in nature.
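The sketch below shows the standard RRF formula, in which each chunk receives 1 / (k + rank) from every ranking it appears in and the contributions are summed. The constant k = 60 is a commonly used default, and the chunk ids and rankings are made up for the example; the source does not state which constant or weighting Dataworkz uses.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of chunk ids into one list, best first.

    Each chunk scores the sum of 1 / (k + rank) over the rankings
    that contain it, with ranks starting at 1.
    """
    scores = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda chunk_id: scores[chunk_id], reverse=True)

# Hypothetical results from the two searches, best match first.
lexical_ranking = ["outlook", "exchange", "google"]
semantic_ranking = ["google", "outlook", "icalendar"]

fused = reciprocal_rank_fusion([lexical_ranking, semantic_ranking])
print(fused)
```

Chunks that appear in both lists rise to the top, while chunks found by only one search are still kept, which is why this approach does not restrict semantic search the way pre-filtering does.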

 

How do you choose?

Dataworkz RAG Builder provides several tools that together can help an Enterprise RAG designer make appropriate choices. 

  • Use the Insights capabilities to see what kind of inputs users of the system are providing. Understanding the nature of the queries and the expectations of the users goes a long way towards making the right design decisions.
  • Dataworkz provides the ability to Probe your RAG pipeline to see how its various stages are working. Peek into the results provided by the Semantic Search, Pre-filtering or Hybrid Retrieval mechanisms to decide which provides the best results for you.

Dataworkz is also constantly evaluating and researching the best approaches for various situations. Best practices evolve as experience accumulates across different customers. Dataworkz publishes these best practices and learnings for the benefit of RAG designers. Dataworkz also has ongoing research into improving the Retrieval capabilities so that they encode these best practices and automatically pick the right strategy based on various factors.

 

Other Enterprise Requirements

We touched upon other enterprise requirements that an Enterprise RAG designer will have to cater to:

Data Access

RAG Designers need their Enterprise GenAI application to be configured such that it provides output to users taking into account their individual level of access to data. For instance, when a customer asks a question, they should not be returned answers about another customer’s account even if it “seems to answer” their question. 

 

Dataworkz RAG Builder allows data to be configured with appropriate roles and data access rules and enforces these rules during the Query Retrieval process. RAG Designers can easily add and change data access rules as business needs evolve.
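To make the idea concrete, here is a minimal sketch of role-based filtering applied to retrieved chunks. The `Chunk` class, the `allowed_roles` field and the role names are hypothetical; the source does not describe how Dataworkz represents roles or access rules.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    text: str
    allowed_roles: frozenset  # roles permitted to see this chunk

def filter_by_access(chunks, user_roles):
    """Keep only chunks that share at least one role with the user."""
    roles = set(user_roles)
    return [chunk for chunk in chunks if chunk.allowed_roles & roles]

# Hypothetical retrieval results for a billing question.
retrieved = [
    Chunk("acme-invoice", "Acme's March invoice total", frozenset({"customer:acme"})),
    Chunk("globex-invoice", "Globex's March invoice total", frozenset({"customer:globex"})),
    Chunk("billing-faq", "How invoices are calculated",
          frozenset({"customer:acme", "customer:globex"})),
]

visible = filter_by_access(retrieved, user_roles={"customer:acme"})
print([chunk.chunk_id for chunk in visible])
```

The Globex chunk is dropped for the Acme user even though it may match the question semantically.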

 

Filtering

There are various scenarios where a RAG Designer might want GenAI applications to limit the scope of the data to a specific set of facets. 

  • For example, limit the data to specific departments, such as Engineering and Support.
  • Or, in a document search system, filter the data to the specific products and versions that the user has purchased or is entitled to.

Metadata can be attached to documents and then used in various use-cases, such as faceted search or limiting the search via an API parameter. The Retrieval engine applies these metadata filtering rules to pick only the relevant data set.
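A metadata filter of this kind can be sketched as follows. The field names (`department`, `product`, `version`) and the filter shape, a mapping from field to a set of allowed values, are assumptions for the example and do not reflect the actual Dataworkz API.

```python
def matches(metadata, filters):
    """True if every filtered field has one of its allowed values.

    A chunk that lacks a filtered field does not match.
    """
    return all(metadata.get(name) in allowed for name, allowed in filters.items())

def filter_by_metadata(chunks, filters):
    """Keep only chunks whose metadata satisfies all filters."""
    return [chunk for chunk in chunks if matches(chunk["metadata"], filters)]

# Hypothetical chunks with attached metadata.
chunks = [
    {"id": "eng-runbook", "metadata": {"department": "Engineering"}},
    {"id": "support-macro", "metadata": {"department": "Support"}},
    {"id": "hr-policy", "metadata": {"department": "HR"}},
    {"id": "scheduler-2-guide",
     "metadata": {"department": "Support", "product": "Scheduler", "version": "2.0"}},
]

by_department = filter_by_metadata(chunks, {"department": {"Engineering", "Support"}})
by_entitlement = filter_by_metadata(chunks, {"product": {"Scheduler"}, "version": {"2.0"}})
print([chunk["id"] for chunk in by_department])
print([chunk["id"] for chunk in by_entitlement])
```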

 

Merging and Re-ranking

Query Retrieval results in a list of chunks of data that were previously encoded into the Vector DB and now match the user’s input semantically. It is often the case that a particular topic is spread over various parts of a document or a data set, with different aspects of the topic covered in different locations. Research into RAG has shown that diversifying the context can help improve the outcome of the pipeline. Dataworkz RAG Builder incorporates techniques to help RAG Designers achieve this diversification.

Merging

It is often the case that several neighboring chunks have scores indicating that they are all semantically similar to the input. Since we limit our context for generation to the top chunks, we could end up picking similar chunks from the same section and ignoring chunks from other locations. If we could diversify the context, the LLM might provide richer responses. One way of doing this is to merge neighboring chunks with comparable similarity scores so that they form one larger chunk, which frees the remaining slots in the context for chunks from elsewhere.
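One possible way to merge neighbors is sketched below. It assumes each retrieved hit records its document id and its position within that document, and it treats hits as mergeable when they are adjacent and their scores differ by at most a tolerance. The field names, the tolerance value and the choice to keep the higher score for the merged chunk are illustrative assumptions, not the documented Dataworkz behavior.

```python
def merge_neighbors(hits, score_tolerance=0.05):
    """Merge adjacent hits from the same document with close scores.

    Each hit is a dict with doc_id, position, score and text.
    Returns merged chunks sorted by score, best first.
    """
    ordered = sorted(hits, key=lambda h: (h["doc_id"], h["position"]))
    merged = []
    for hit in ordered:
        last = merged[-1] if merged else None
        if (
            last is not None
            and last["doc_id"] == hit["doc_id"]
            and hit["position"] == last["end"] + 1
            and abs(hit["score"] - last["score"]) <= score_tolerance
        ):
            # Extend the previous chunk and keep the higher of the two scores.
            last["end"] = hit["position"]
            last["text"] += " " + hit["text"]
            last["score"] = max(last["score"], hit["score"])
        else:
            merged.append({
                "doc_id": hit["doc_id"],
                "start": hit["position"],
                "end": hit["position"],
                "score": hit["score"],
                "text": hit["text"],
            })
    merged.sort(key=lambda m: m["score"], reverse=True)
    return merged

# Hypothetical retrieval hits.
hits = [
    {"doc_id": "guide", "position": 4, "score": 0.91, "text": "Open the calendar."},
    {"doc_id": "guide", "position": 5, "score": 0.89, "text": "Click New Event."},
    {"doc_id": "guide", "position": 9, "score": 0.80, "text": "Invite attendees."},
    {"doc_id": "faq", "position": 2, "score": 0.85, "text": "Events can recur weekly."},
]

merged = merge_neighbors(hits)
for chunk in merged:
    print(chunk["doc_id"], chunk["start"], chunk["end"], chunk["score"])
```

Here the two adjacent hits from the guide collapse into one chunk, so a top-three context now covers three distinct locations instead of two.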

Maximal Marginal Relevance

Instead of blindly picking the top-scoring chunks, it would be beneficial to include a wider context of more diversified content that gives the LLM an opportunity to provide better answers. Dataworkz RAG Builder provides the ability to turn on Maximal Marginal Relevance (MMR) to re-rank the results of the Query Retrieval. MMR increases the diversity of the context by comparing the chunks in the Query Retrieval results with each other and picking chunks that score highly against the query but differ in content from the chunks already picked.
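The sketch below implements the usual greedy form of MMR: at each step it picks the candidate that maximizes lambda * (similarity to the query) minus (1 - lambda) * (highest similarity to any chunk already selected). The two-dimensional vectors, the chunk ids and the default lambda of 0.5 are assumptions for the example; the source does not say which parameters Dataworkz exposes.

```python
import math

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def mmr(query_vec, candidates, top_k=2, lambda_mult=0.5):
    """Greedy Maximal Marginal Relevance over {chunk_id: embedding}."""
    selected = []
    remaining = dict(candidates)
    while remaining and len(selected) < top_k:
        def mmr_score(chunk_id):
            relevance = cosine(query_vec, remaining[chunk_id])
            # Redundancy is the closest match among chunks already picked.
            redundancy = max(
                (cosine(remaining[chunk_id], candidates[s]) for s in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(remaining, key=mmr_score)
        selected.append(best)
        del remaining[best]
    return selected

# Hypothetical embeddings: two near-duplicates and one different aspect.
query_vec = [1.0, 0.0]
candidates = {
    "create-event": [0.95, 0.10],
    "create-event-copy": [0.94, 0.12],
    "event-permissions": [0.70, -0.70],
}

diverse = mmr(query_vec, candidates, top_k=2, lambda_mult=0.5)
relevance_only = mmr(query_vec, candidates, top_k=2, lambda_mult=1.0)
print(diverse)
print(relevance_only)
```

With lambda set to 1.0 the redundancy term vanishes and the result is the plain top-scoring list, which here contains both near-duplicates. Lower values trade some relevance for diversity.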

 

Conclusion

Query Retrieval is the core of a RAG pipeline. Having the right configuration options and control over it is essential to cater to the needs of users and to meet their expectations. Enterprise deployments bring with them additional requirements that should be easy to incorporate. Dataworkz RAG Builder lets a RAG Designer focus on providing the best outcome for their GenAI Applications by making it possible to incorporate sophisticated techniques simply by dragging, dropping and configuring them. The additional tooling provided by Dataworkz RAG Builder gives insights into the questions and the quality of answers and allows the RAG pipeline to be fine-tuned to suit users’ needs.
