What is the best way to summarize a document in the KB with RAG?
I have a Word document in the KB, split into 12 chunks of 1,000 tokens each, and I assigned a specific tag to that document.
The document contains the transcript of a discussion between two people.
I use the query API to target that document when the user requests a summary.
My reasoning is to send as many chunks of the document as possible to the LLM (GPT-3.5 Turbo) to get a complete summary.
Therefore I set the "chunkLimit" parameter (from the query API) to 10. In the query API response, I see that 10 chunks are selected with their corresponding similarity scores, but the output from the query API is "null".
The same is true if I set the "chunkLimit" parameter to 5.
If I set the "chunkLimit" parameter to 2, 3, or 4, it works, but the summary is not complete: it leaves out important parts of the document.
In the KB settings, the chunk limit is set to 10 and the max tokens setting to 1,000.
Am I doing something wrong?
Is there another way to get a complete summary of a document?
Thanks for your help
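For context, the query described above looks roughly like this. This is a minimal sketch: only "chunkLimit" is a parameter named in this thread; the other field names and the tag value are illustrative assumptions, not the platform's documented API.

```python
import json

# Illustrative query payload; only "chunkLimit" is taken from the discussion
# above, the other field names and values are hypothetical.
payload = {
    "query": "Summarize the discussion in this document.",
    "tags": ["transcript"],  # hypothetical tag name; use the tag you assigned
    "chunkLimit": 10,        # request up to 10 of the document's 12 chunks
}
print(json.dumps(payload, indent=2))
```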
You can use the document API to just grab the whole doc.
fair-rose•17mo ago
@Julien funny I should see this today, as I'm having similar issues with the KB. I have 6 docs loaded into my KB, and it retrieves data from them perfectly fine when my chunk limit is set to up to 6 chunks. Then when I set the chunk limit to 7 or above, it falls over and starts giving me "no results" in the preview (in the KB section) or a null return when testing in the design canvas. Probably something I'm not understanding. I've tried it with and without tags using the Tags API as well.
What LLM / model are you using?
I am using GPT-3.5 Turbo.
So it is not a context issue. Have you tried it in the KB preview?
You mean to feed the whole doc to the LLM? Which document API would that be? I checked but am unsure.
That will retrieve all the chunks for a doc.
OK, and is there a way to feed all the chunks to an LLM to request a summary?
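One generic pattern for this (a common RAG technique, not a feature of this particular platform) is map-reduce summarization: retrieve all the chunks via the document API, summarize them in batches small enough to fit the model's context window, then summarize the partial summaries. A minimal sketch, with `call_llm` as a stub standing in for whichever completion client you use:

```python
from typing import List

def call_llm(prompt: str) -> str:
    # Stub: replace with a real completion call from your LLM client.
    return f"summary({len(prompt)} chars)"

def summarize_document(chunks: List[str], batch_size: int = 3) -> str:
    # Map step: summarize small batches so each prompt fits the context window.
    partials = []
    for i in range(0, len(chunks), batch_size):
        batch = "\n\n".join(chunks[i:i + batch_size])
        partials.append(call_llm(f"Summarize this excerpt:\n\n{batch}"))
    # Reduce step: combine the partial summaries into one final summary.
    combined = "\n".join(partials)
    return call_llm(f"Combine these partial summaries into one summary:\n\n{combined}")

print(summarize_document([f"chunk {n} text" for n in range(12)], batch_size=3))
```

With 12 chunks of 1,000 tokens and a batch size of 3, each map prompt stays near 3K tokens, which fits GPT-3.5 Turbo's window, at the cost of one extra LLM call per batch.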
Yes, I tried. I kept only that document in the KB and asked for a summary. It works only if the chunk limit setting is less than 5. Otherwise it says "not found".
Let me know if you find anything to solve this. I will do the same 🙂
fair-rose•17mo ago
@W. Williams (SFT) thanks, the doc chunk retrieval looks handy, I didn't see that one, but wouldn't that be more of a thing to use if you were targeting just a single document (ID)? My use case is a set of 6 employment contracts. I ask it something simple like: list the job title for each employee in each contract we have in the KB. So I would want it to retrieve chunks from all 6 contracts rather than just one, hence I don't know if the retrieve-chunk API would be the best solution here, would it? The issue I have is that it completes that task fine as long as my chunk limit setting in the KB isn't set over 6 chunks. For example, it will bring back all 6 employee names and titles from all 6 different docs (they are uploaded as docx files). I get the same issue in both the KB preview and in the design canvas flow test. I'll do some more testing tonight and see if it's a consistent issue across different data sets. It's very frustrating, though, as it works in one instance and then falls over in another. It would be great to get an understanding of how it's treating things. Good shout about the model, though. I was using GPT-3.5 and Claude 1.0, so I will try something with a bit more capability as well. Thanks for the input.
I just tried chunk limit = 10 with GPT-4 and it worked! But it is extremely expensive in tokens. I will try other models.
fair-rose•17mo ago
@Julien Yes, solved for me too by doing exactly the same. Quite interesting, as my queries were quite simple. All a learning curve, isn't it!
sensitive-blue•17mo ago
You can try using Haiku or Sonnet; their context lengths are longer. Not sure why GPT-3.5 isn't working; we should have a backend retry with a longer-context model if the prompt is too long.
Did a bit of testing and counted the tokens. It's definitely an issue with the context length: it seems to fail on anything over 4-5K tokens of context.
This was with GPT-3.5-turbo. Haiku works fine.
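That lines up with GPT-3.5 Turbo's original 4,096-token context window. With 1,000-token chunks, 4 chunks (~4K tokens) is already at the edge, and 5 or more cannot fit before the prompt and answer are even counted. A quick sanity check (the 4,096 figure is the model's documented window; the exact cutoff in practice is a bit lower because of prompt overhead and the space reserved for the answer):

```python
CONTEXT_WINDOW = 4096  # GPT-3.5 Turbo's original context window, in tokens
CHUNK_TOKENS = 1000    # chunk size configured in the KB settings above

for chunk_limit in (2, 3, 4, 5, 10):
    # Token count of the retrieved chunk text alone, before prompt overhead.
    needed = chunk_limit * CHUNK_TOKENS
    verdict = "fits" if needed <= CONTEXT_WINDOW else "too big"
    print(f"chunkLimit={chunk_limit}: {needed} tokens -> {verdict}")
```

This reproduces the behavior reported above: chunk limits of 2-4 work, while 5 and 10 overflow the window, which would explain the null responses.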