adverse-sapphire•2y ago
Issue: KB not finding info that is in the KB
I have noticed that the KB quite often doesn't find the answer to a question, or doesn't retrieve the right chunk. So I have set up a test bot with only one text document. This document is a small list of businesses in a town, with a description and contact details for each. It was structured with ChatGPT so as to be presented in the most optimal way for embeddings. You can access this simple text document directly in my project.
The project ID is 659eb64b59d399715077ad0c
Let's concentrate on two businesses in this list with bad results (there could potentially be plenty of others): AFR performance offers a sport coaching service, and "atelier Desbel" renovates armchairs. For each I am asking two questions, e.g. "Do you know AFR performance?" and "Who can I contact if I need sport coaching?", and the same for the other.
Result: the KB can't find any relevant info for either question about AFR performance, even though I added a second mention at the end of the document saying that "AFR performance is providing sport coaching". As for atelier Desbel, if we ask who renovates armchairs, it names another business that doesn't mention armchair renovation at all.
This is only a small example that should let us test and check where the problem comes from. But this issue happens very regularly: the KB retrieves a chunk that doesn't address the question, while the appropriate one is somewhere else.
Please contact me for more tests to solve this problem.
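For context on where this kind of mismatch can come from, here is a minimal sketch of how embedding-based chunk retrieval typically works (this is an illustration using a crude bag-of-words "embedding", not Voiceflow's actual pipeline; the business texts and the `embed`/`retrieve` helpers are made up for the example):

```python
# Toy sketch of embedding-based retrieval: each chunk and the query are
# turned into vectors, and the chunk with the highest cosine similarity
# to the query wins. If two unrelated chunks score similarly, the wrong
# one can come back on top.
import re
from collections import Counter
from math import sqrt

def embed(text):
    """Crude bag-of-words 'embedding' for illustration only."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, chunks):
    """Return the best-scoring chunk and its similarity score."""
    q = embed(query)
    scored = sorted(((cosine(q, embed(c)), c) for c in chunks), reverse=True)
    return scored[0][1], scored[0][0]

chunks = [
    "AFR performance offers sport coaching. Contact: 06 00 00 00 00.",
    "Atelier Desbel renovates armchairs and antique seats.",
]
best, score = retrieve("who can I contact if I need sport coaching", chunks)
```

With a real embedding model the vectors are dense and semantic, but the ranking logic is the same, which is why a question whose wording overlaps poorly with the right chunk can lose to an unrelated one.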
10 Replies
To help you debug this, we will need some info:
- Can you share the text you've uploaded to your KB
- Your .vf file
- Some steps to repro
If you can't share the .vf file:
- The system prompt, instructions and settings you're using for KB retrieval
- Some utterances to test
adverse-sapphireOP•2y ago
Here are the .vf file and the doc I uploaded. I have explained above the different questions you can use to reproduce, along with the results I got.
It's only one example, so that we can test on something concrete. I have other bots with only plain text containing one question and one answer each: for example, one question about parking in the town, another question in a separate doc about restaurants in the town, etc. And when I ask a question about restaurants, the chunk system retrieves the parking chunk with the higher score.
So it's not only a question of debugging this file I sent you; it's a global issue I notice in the chunk retrieval system, and I am ready to investigate it with you if you want.
Thanks for sharing.
IMO you should optimize your document to help the LLM give better results.
Here is an example using your doc after a quick LLM pass for optimization:
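To illustrate the kind of restructuring that tends to help embeddings, here is a hypothetical sketch (the field labels, activities, and phone numbers are invented placeholders, not the actual optimized document): one self-contained block per business, separated by blank lines, so that every chunk carries the business name, what it does, and its contact details on its own.

```python
# Hypothetical restructured doc: one self-describing block per business.
# Because each block repeats the business name and activity, any chunk
# boundary still leaves a chunk that makes sense in isolation.
doc = """\
Business: AFR performance
Activity: sport coaching (fitness, personal training)
Contact: 06 00 00 00 00

Business: Atelier Desbel
Activity: renovation of armchairs and antique seats
Contact: 06 11 11 11 11
"""

# Split on blank lines so each block can become its own chunk.
chunks = [block.strip() for block in doc.split("\n\n") if block.strip()]
```

The design point is simply that chunking happens on text boundaries, so a document whose natural blocks are each self-contained degrades gracefully no matter where the chunker cuts.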
adverse-sapphireOP•2y ago
OK Niko, thanks for sharing this. I will use it for my other lists to make sure the docs are optimized this way. But it only solves half of the problem. The other half is that the chunk system retrieves the wrong chunk with the highest score in many, many cases, on all my projects. In this case you used 2 chunks on a very small doc and concentrated on the output, so you can't notice this.
What you can try is asking a series of questions about info in the 2nd chunk (my original doc was 2 chunks long, but the one you sent me is shorter, so it might be only 1 chunk long), while limiting the source to only 1 chunk. Then you have a chance to see what I mean.
Strange, I can't repro on my end.
The cleaned version also has 2 chunks, and asking for info only available in the second chunk while limiting usage to 1 chunk gives me a correct answer.
adverse-sapphireOP•2y ago
Can you check the score of the 2nd chunk if you set the limit to 2 chunks? You should have one high score, around 80%, for the right chunk, and a low score for the other chunk that has nothing to do with the question.
If both scores are very similar, that would not be normal.
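The check being described can be sketched as follows (toy scores and a made-up `min_gap` threshold, not Voiceflow's actual retriever or settings): with the chunk limit set to 2, compare the top two similarity scores and treat a near-tie as a sign that the embeddings cannot distinguish the chunks for that question.

```python
# Diagnostic sketch: a healthy retrieval looks like ~0.80 vs ~0.30;
# two near-equal scores suggest ambiguous retrieval for this question.
def score_gap_ok(scores, min_gap=0.2):
    """Return True when the best chunk clearly beats the runner-up."""
    top, runner_up = sorted(scores, reverse=True)[:2]
    return (top - runner_up) >= min_gap

clear = score_gap_ok([0.80, 0.31])      # clear winner
ambiguous = score_gap_ok([0.55, 0.53])  # near-tie, likely a retrieval problem
```

The threshold is arbitrary; the useful signal is the gap between the scores, not their absolute values.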
optimistic-gold•2y ago
Hi @NiKo | Voiceflow, could you please let me know how you got an optimized document? What was the procedure or LLM you used to get that output?