RAG Applications: Query Rewriting

One of the most powerful advancements GenAI applications bring is the ability for users to seek information in their own language, as free-form text. Delivering this information access where it is needed most is ushering in a new wave of knowledge enablement for internal employees and customers alike. GenAI applications such as Question-and-Answering systems or Semantic Summarization built on RAG pipelines let users ask questions of private, internal, confidential enterprise data. 

Challenges in User Queries

This unstructured, free-form text as input is not without its complexities. When users can ask questions in natural language, the nature of those questions introduces challenges that a RAG designer must overcome. Some of the common challenges we see with users’ questions are:

  • Short or too broad – Questions can be very short, carrying minimal semantic meaning, and are therefore usually too broad in context, e.g., “mutual funds”.  
  • Vague intent – The user’s intent is not obvious from the question. When a user asks something like “Investing in Acme Mutual Funds”, are they looking for a description of the fund, its past performance, risks, controversies, recent news, or something else altogether?
  • Ambiguous target context – The question is ambiguous about its context. This can happen because users do not realize the ambiguity or do not recognize that the underlying data corpus might contain multiple unrelated matches for the query. E.g., “What investment risks should one be aware of?” could refer to risks in mutual funds, equity, options, commodities, or any number of other financial instruments.  
  • Misspelled questions or incorrect grammar – Users frequently ask questions with misspelled terms or incorrect grammar. We would like a RAG pipeline to be robust to such inputs and still provide reasonable answers.

 

Query Rewriting as the Solution

Identifying user intent is a problem that information retrieval systems have long had to solve in order to improve response quality. In a RAG pipeline, we employ a technique known as Query Rewriting to address the issues above with users’ questions. Dataworkz RAG Builder provides RAG designers with built-in Query Rewriting capabilities that can be enabled and controlled via simple configuration.

 

Query Rewriting can also provide inputs that help the RAG pipeline determine query execution strategies, extract keywords, and gather other information about the query. For instance, a query might be identified as consisting of multiple sub-questions; the execution strategy could then be to run each sub-question through the pipeline individually and merge the results. Similarly, keyword extraction can narrow the scope of the search, helping create higher-quality context for the LLM to respond from. 
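
As a rough illustration of that execution strategy, the sketch below splits a compound question into sub-questions, runs each one through a retrieval step, and merges the partial answers. The `decompose_query`, `retriever`, and `llm` names are hypothetical placeholders, not Dataworkz RAG Builder APIs.

```python
# Minimal sketch of a sub-question execution strategy.
# `llm` is any callable that takes a prompt string and returns text;
# `retriever` is any callable that returns relevant context for a question.

def decompose_query(llm, query: str) -> list[str]:
    """Ask the LLM to split a compound question into standalone sub-questions."""
    prompt = (
        "Split the question below into independent sub-questions, one per line. "
        "If it is already a single question, return it unchanged.\n\n"
        f"Question: {query}"
    )
    return [q.strip() for q in llm(prompt).splitlines() if q.strip()]

def run_pipeline(llm, retriever, query: str) -> str:
    """Answer each sub-question separately, then merge the partial answers."""
    partial_answers = []
    for sub_q in decompose_query(llm, query):
        context = retriever(sub_q)  # vector / keyword search scoped to this sub-question
        partial_answers.append(
            llm(f"Answer using only this context:\n{context}\n\nQuestion: {sub_q}")
        )
    return llm(
        "Combine these partial answers into a single coherent answer to: "
        f"{query}\n\n" + "\n".join(partial_answers)
    )
```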

Query Rewriting in Dataworkz RAG Builder

The Dataworkz RAG Builder Query Rewriting component leverages the power of LLMs and prompt engineering to have the LLM do the following (a rough sketch of the pattern follows the list below):

  • Expand very short queries
    • E.g., if the user’s query is “capex funds”, the prompt might encourage the LLM to rewrite it as “What are the top mutual funds by performance in the capex category?”
  • Evaluate the tone, content, and context of the query and re-express the question so that the user’s intent is explicit
    • E.g., if the user’s query is “investing in mutual funds”, an LLM might rewrite it as “What are the pros and cons to keep in mind when investing in mutual funds?”
  • Convert the question into the domain’s terminology
    • E.g., if the user’s query is “investing in large funds”, it could be rewritten as “Back-dated investment performance of blue-chip mutual funds”, which may yield much more meaningful responses.
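
The sketch below illustrates this prompt-engineering pattern in its simplest form. The prompt text and the generic `llm` callable are assumptions for illustration; they are not the prompts or APIs used inside Dataworkz RAG Builder.

```python
# Illustrative prompt-driven query rewriting (not the Dataworkz implementation).
# `llm` is any callable that takes a prompt string and returns the model's text.

REWRITE_PROMPT = """You rewrite user questions for a financial-documents search system.
Given the user's question:
1. Expand very short queries into a complete question.
2. Make the likely intent explicit (e.g., performance, risks, how-to).
3. Use standard domain terminology where appropriate.
4. Fix spelling and grammar.
Return only the rewritten question.

User question: {query}
Rewritten question:"""

def rewrite_query(llm, query: str) -> str:
    """Return a more specific, intent-rich version of the user's query."""
    return llm(REWRITE_PROMPT.format(query=query)).strip()
```

For example, `rewrite_query(llm, "capex funds")` might produce something like “What are the top mutual funds by performance in the capex category?”, depending on the model and prompt.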

 

Rewriting the query improves semantic search (for search-based and summarization applications) because the question becomes more specific and carries more “meaning”. It also allows us to give the downstream LLM (in Question-and-Answer or chatbot style applications) a much better prompt, allowing it to respond meaningfully.
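
To make that concrete, the sketch below shows where the rewritten query slots into a typical retrieve-then-generate flow: it is embedded for the vector search and also carried into the final prompt. The `embed`, `vector_search`, and `llm` callables are placeholders for whatever embedding model, vector store, and LLM your pipeline uses, and `rewrite_query` is the sketch from the previous section.

```python
# Sketch: using the rewritten query for both retrieval and generation.
# `embed` returns a query vector, `vector_search` returns a list of text chunks,
# and `llm` returns the model's text response.

def answer_question(llm, embed, vector_search, user_query: str) -> str:
    rewritten = rewrite_query(llm, user_query)     # more specific, intent-rich query
    query_vector = embed(rewritten)                # embed the rewritten text
    chunks = vector_search(query_vector, top_k=5)  # retrieve supporting passages
    context = "\n\n".join(chunks)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {rewritten}"
    )
    return llm(prompt)
```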

Dataworkz RAG Builder makes incorporating powerful capabilities like Query Rewriting into your RAG system as easy as dragging and dropping a component into your RAG pipeline. Dataworkz is constantly evaluating new techniques and investing in research in this area to improve capabilities, provide configuration to control the experience, and raise the quality of pipeline responses. 

 

Additional Approaches

While Query Rewriting can go a long way towards better response quality, it may not capture the original intent every single time. There are approaches to this problem that can be treated as alternatives or complements to Query Rewriting (a brief sketch follows the list below). These include:

  • Asking the user for clarification of intent
  • Suggesting a list of alternate questions that can provide better answers
  • Making the user aware of similar topics that exist in the data corpus so they can identify which topic they are interested in
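
As a rough sketch of the first two ideas, the snippet below asks an LLM to propose more specific alternatives that the user can pick from. The prompt and the `llm` callable are illustrative assumptions, not a description of a built-in Dataworkz feature.

```python
# Sketch: suggesting clarifying alternatives for an ambiguous query.
# `llm` is any callable that takes a prompt string and returns text.

def suggest_alternatives(llm, query: str, n: int = 3) -> list[str]:
    """Return up to n more specific variants of an ambiguous question."""
    prompt = (
        f"The question below may be ambiguous. Propose {n} more specific "
        "versions of it, one per line, covering the most likely intents.\n\n"
        f"Question: {query}"
    )
    lines = [line.strip("- ").strip() for line in llm(prompt).splitlines()]
    return [line for line in lines if line][:n]
```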

 

Conclusion

Dataworkz Query Rewriting brings sophisticated capabilities to your RAG pipeline with extraordinary ease. It allows the RAG pipelines you build in Dataworkz to give your users meaningful responses, help them refine their input, and deliver on the promise of better information access.
