exotic-emerald · 17mo ago

Inaccurate answers received from KB even after adding proper documents

Can someone tell me why some of the links are highlighted and why some are not? I'm asking because I feel Voiceflow isn't able to access all of my files from the knowledge base, and hence I'm not getting accurate answers.
11 Replies
W. Williams (SFT)
@NiKo | Voiceflow I have no clue
NiKo | Voiceflow · 17mo ago
Are all those files PDF uploads? Also, some of those filenames look pretty long; you might want to rename them to something like episode-xx-2024-04-15.pdf. If some files weren't uploaded correctly, you should see a fail status and no content/chunks. Is that the case on your end?
W. Williams (SFT)
This is the case I sent you earlier @NiKo | Voiceflow
exotic-emerald (OP) · 17mo ago
Thank you for your response. I confirm that all uploaded files are PDFs. I will attempt renaming each file for clarity. These PDFs represent segments of one podcast, which I divided into separate documents to enhance the accuracy of the responses. I believe all files have been uploaded successfully and there are no issues with their status.
NiKo | Voiceflow · 17mo ago
This is what I get using GPT-4-Turbo with the PDF you've shared with @W. Williams (SFT)
NiKo | Voiceflow · 17mo ago
Based on your use case, I would go for a custom parser that handles the source (the PDF here) and uses an LLM pass (or code) to generate a JSON with the full conversation split by speaker. Do you have access to the KB Table (JSON) upload BETA?
exotic-emerald (OP) · 17mo ago
Thanks for your response! Sorry, I didn't understand the custom parser bit. No, Niko, I don't think I have access to the KB Table (JSON) upload BETA.
NiKo | Voiceflow · 17mo ago
So the content you're passing to the KB plays a large role in the quality of the responses you will get from the LLM. If you upload a PDF and look at the chunks, chances are you will have those conversations/transcripts split into multiple parts. This is because LLMs prefer formatted data (Markdown, JSON, XML...). This is why we provide KB APIs, to allow you to upload content to your KB using custom integrations. For your use case, I assume that you generate transcripts from previous podcasts, save them as PDFs and upload them to the KB. You might want to automate that part by handling the parsing/formatting/uploading on your end. So instead of having raw docs with:

```
person1: convo
person2: convo
person1: convo
```

you will have something more detailed like:

```json
[
  {
    "title": "xxx",
    "date": "xxx",
    "topics": ["xxx", "xyz"],
    "transcripts": [
      { "speaker": "Niko", "text": "convo", "time": "xxx" },
      { "speaker": "Leo", "text": "convo", "time": "xxx" },
      { "speaker": "Niko", "text": "convo", "time": "xxx" }
    ]
  }
]
```
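The parsing step described here could be sketched in Python roughly as follows. This is purely illustrative: the function name, regex, and sample transcript are invented for the example, and only the record shape follows the JSON structure above (timestamps are omitted since the raw format has none).

```python
import json
import re

def parse_transcript(raw: str, title: str, date: str, topics: list) -> dict:
    """Turn raw 'speaker: text' transcript lines into a structured record
    suitable for a KB upload (a sketch, not Voiceflow's own parser)."""
    transcripts = []
    for line in raw.splitlines():
        # Match lines shaped like "Speaker: what they said"
        match = re.match(r"^(\w[\w .|]*):\s*(.+)$", line)
        if match:
            transcripts.append({"speaker": match.group(1), "text": match.group(2)})
    return {"title": title, "date": date, "topics": topics, "transcripts": transcripts}

raw = """Niko: Welcome back to the show.
Leo: Thanks for having me.
Niko: Let's dive in."""

record = parse_transcript(raw, title="Episode 1", date="2024-04-15", topics=["intro"])
print(json.dumps(record, indent=2))
```

An LLM pass could replace or augment the regex step, for instance to infer topics or clean up speaker names before uploading.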
exotic-emerald (OP) · 17mo ago
Got it, thank you so much @NiKo | Voiceflow for putting in the time and effort to explain this whole process to me.
metropolitan-bronze · 17mo ago
Hi Niko, I feel this is very important for everyone building chatbots with custom knowledge. Is there a tutorial on how to do this? You mean this knowledge API, right? https://developer.voiceflow.com/reference/post_v3alpha-knowledge-base-docs-upload It is only possible to upload txt, docx and pdf, not json, xml or markdown. If I follow your solution, would I just use txt here and then input a structured document using XML, Markdown or JSON? Thanks for helping!
NiKo | Voiceflow · 17mo ago
The table (JSON) upload is in BETA preview for now but should be available soon. But yes, you can upload a .txt file whose content is formatted as JSON, Markdown or XML. It's just that this will not be as optimized as the KB table import.
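Until the table import is out of BETA, the approach above amounts to serializing the structured record into text and uploading it as a .txt document. A minimal, hypothetical sketch of the serialization step (the record shape follows the JSON example earlier in the thread; the Markdown layout is an assumption, and the actual upload call should be checked against the API reference linked above):

```python
def transcript_to_markdown(record: dict) -> str:
    """Render a structured transcript record as Markdown text for a
    .txt KB upload (illustrative layout, not Voiceflow's own format)."""
    lines = [
        f"# {record['title']}",
        f"Date: {record['date']}",
        f"Topics: {', '.join(record['topics'])}",
        "",
    ]
    for turn in record["transcripts"]:
        lines.append(f"**{turn['speaker']}**: {turn['text']}")
    return "\n".join(lines)

record = {
    "title": "Episode 1",
    "date": "2024-04-15",
    "topics": ["intro", "news"],
    "transcripts": [
        {"speaker": "Niko", "text": "Welcome back to the show."},
        {"speaker": "Leo", "text": "Thanks for having me."},
    ],
}
md = transcript_to_markdown(record)
print(md)
```

Keeping one speaker turn per line should also give the KB chunker cleaner boundaries than raw PDF text.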
