optimistic-gold · 2y ago

Language of KB Doc vs Language of questions - Vector mismatch?

Does the language of a KB doc affect the vectors formed for that doc? And could that cause problems when matching it to a user question asked in a different language? I am trying to figure out why my bot is not fetching chunks that are clearly available and relevant in the KB.
Tico | Voiceflow Assistant
Yes, the language of a Knowledge Base (KB) document affects the vectors created for it, and that can hurt matching when a user asks a question in a different language.

When a document is added to the KB, its text is split into chunks, and each chunk is converted into a numerical vector (an embedding) intended to capture its meaning. At query time, the user's question is embedded the same way, and the chunks whose vectors are closest to the question's vector are retrieved. How well this works across languages depends on the embedding model. A model trained mostly on one language may place a question and its translation far apart, so a relevant chunk written in another language can score too low to be retrieved. Multilingual embedding models are trained so that text with the same meaning in different languages lands close together, which reduces this gap.

To avoid the problem, either keep KB documents and user questions in the same language, or use a multilingual embedding model. Either approach makes it more likely that the bot retrieves the relevant chunks.
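To make the mismatch concrete, here is a deliberately simplified sketch of vector retrieval in Python. It is not Voiceflow's implementation: the `embed`, `cosine`, and `top_k` helpers are hypothetical, and `embed` is a toy bag-of-words vector used as an extreme stand-in for a monolingual embedding model. Because a Spanish question shares no words with the English chunks, every chunk scores zero and nothing relevant is retrieved. Real embedding models produce dense vectors and rarely give exactly zero, but an English-only model can similarly rank a translated question's best match too low to be returned.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (hypothetical stand-in for a monolingual model)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: str, chunks: list[str], k: int = 2) -> list[tuple[float, str]]:
    """Score every chunk against the query and return the k best (score, chunk) pairs."""
    q = embed(query)
    scored = [(cosine(q, embed(chunk)), chunk) for chunk in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Hypothetical English KB chunks.
chunks = [
    "Refunds are processed within five business days.",
    "Our office is open Monday to Friday.",
]

# Same question in English and in Spanish.
for query in ["How long do refunds take?", "¿Cuánto tardan los reembolsos?"]:
    score, chunk = top_k(query, chunks, k=1)[0]
    print(f"{query!r}: best score {score:.2f} -> {chunk}")
```

The English question scores about 0.17 against the refund chunk, while the Spanish question scores 0.00 against both chunks, so the refund answer is effectively invisible to it.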
optimistic-gold (OP) · 2y ago
Ok, good to know. I am also noticing that my bot does not fetch the right chunks even when they are in the same language as the user's question. Why could that be? How are the chunks formed? Does the document formatting have anything to do with it, other than the chunk size, of course?
NiKo | Voiceflow
Try uploading clean documents formatted in Markdown for better results. If the right chunks do appear in the KB query results (you can use the Preview to check which chunks are returned for a specific question), then retrieval is working, and you may want to review your prompt and/or test other models to get better answers.
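Voiceflow's actual chunking logic isn't described in this thread, but a hypothetical sketch shows why clean Markdown can help. If a chunker splits on headings first, each chunk tends to cover one topic and carries its heading as context, whereas an unstructured document gets cut at arbitrary size boundaries that can split a relevant passage in two. The `chunk_markdown` function and its `max_chars` limit below are illustrative assumptions, not Voiceflow's API.

```python
import re


def chunk_markdown(doc: str, max_chars: int = 500) -> list[str]:
    """Hypothetical chunker: split at Markdown headings, then pack paragraphs up to max_chars."""
    # Pass 1: start a new section at every ATX heading (#, ##, ... ######).
    sections: list[str] = []
    current: list[str] = []
    for line in doc.splitlines():
        if re.match(r"^#{1,6}\s", line) and current:
            sections.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        sections.append("\n".join(current).strip())

    # Pass 2: split long sections on blank-line paragraph boundaries.
    chunks: list[str] = []
    for section in sections:
        buffer = ""
        for para in section.split("\n\n"):
            candidate = f"{buffer}\n\n{para}" if buffer else para
            # An oversized single paragraph is kept whole rather than cut mid-sentence.
            if len(candidate) <= max_chars or not buffer:
                buffer = candidate
            else:
                chunks.append(buffer)
                buffer = para
        if buffer:
            chunks.append(buffer)
    return [chunk for chunk in chunks if chunk.strip()]


doc = """# Refunds
Refunds are processed within five business days.

# Opening hours
Our office is open Monday to Friday.
"""

for chunk in chunk_markdown(doc):
    print(repr(chunk))
```

On this two-section example, each heading and its paragraph become one chunk. A production chunker would also need to ignore `#` lines inside code fences and repeat the section heading in continuation chunks; both are omitted here for brevity.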
