exotic-emerald•17mo ago
Inaccurate answers received from KB even after adding proper documents
Can someone tell me why some of the links are highlighted and why some are not? The reason I'm asking is because I feel Voiceflow isn't able to access all my files from the knowledge base, and hence I'm not getting accurate answers.

11 Replies
@NiKo | Voiceflow I have no clue
Are all those files PDF uploads?
Also, looks like some of those filenames are pretty long, you might want to name them with something like:
episode-xx-2024-04-15.pdf
If some files aren't uploaded correctly you should have a fail status and no content/chunks. Is that the case on your end?
This is the case I sent you earlier @NiKo | Voiceflow
exotic-emeraldOP•17mo ago
Thank you for your response. I confirm that all uploaded files are PDFs. I will attempt renaming each file for clarity. These PDFs represent segments of one podcast, which I divided into separate documents to enhance the accuracy of the responses. I believe all files have been uploaded successfully and there are no issues with their status.
This is what I get with GPT-4-Turbo using the PDF you've shared with @W. Williams (SFT)

Based on your use case, I would go for a custom parser that handles the source (PDF here) and uses an LLM pass (or code) to generate a JSON with the full convo split by speakers.
Do you have access to the KB Table (JSON) upload BETA?
exotic-emeraldOP•17mo ago
Thanks for your response! Sorry, I didn't understand the custom parser bit. No, Niko, I don't think I have access to the KB Table (JSON) upload BETA.
So the content you're passing to the KB plays a large role in the quality of the responses you will get from the LLM. If you upload a PDF and look at the chunks, chances are that you will have those conversations/transcripts split into multiple parts. This is because LLMs prefer formatted data (markdown, JSON, XML...).
This is why we provide KB APIs, to allow you to upload content to your KB using custom integrations.
For your use case, I assume that you generate transcripts from previous podcasts, save them as PDFs and upload them to the KB.
You might want to automate that part by handling the parsing/formatting/uploading part on your end.
So instead of having raw docs with:
person1: convo
person2: convo
person1: convo
You will have something more detailed like:
[
  {
    "title": "xxx",
    "date": "xxx",
    "topics": ["xxx", "xyz"],
    "transcripts": [
      {
        "speaker": "Niko",
        "text": "convo",
        "time": "xxx"
      },
      {
        "speaker": "Leo",
        "text": "convo",
        "time": "xxx"
      },
      {
        "speaker": "Niko",
        "text": "convo",
        "time": "xxx"
      }
    ]
  }
]
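The raw-doc-to-structured-JSON step above could be sketched with a small parser like this. Everything here (the function name, the regex, the sample lines) is illustrative, not part of Voiceflow's tooling, and real transcripts would also carry timestamps that this sketch leaves as empty placeholders:

```python
import json
import re

def parse_transcript(raw: str, title: str, date: str, topics: list) -> list:
    """Turn raw 'speaker: text' transcript lines into the structured
    JSON layout sketched above. No timestamps exist in the raw text,
    so 'time' is left as an empty placeholder."""
    turns = []
    for line in raw.strip().splitlines():
        match = re.match(r"^(.+?):\s*(.+)$", line.strip())
        if match:
            speaker, text = match.groups()
            turns.append({"speaker": speaker, "text": text, "time": ""})
    return [{"title": title, "date": date, "topics": topics, "transcripts": turns}]

# Hypothetical sample input in the 'personX: convo' shape from above
raw = """\
Niko: Welcome back to the show.
Leo: Thanks for having me.
Niko: Let's dive in."""

doc = parse_transcript(raw, title="Episode 1", date="2024-04-15", topics=["intro"])
print(json.dumps(doc, indent=2))
```

From there you would serialize the result and push it through the KB API as part of your own pipeline.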
exotic-emeraldOP•17mo ago
Got it, thank you so much @NiKo | Voiceflow for putting in the time and effort to explain this whole process to me
metropolitan-bronze•17mo ago
Hi Niko, I feel this is very important for everyone building chatbots with custom knowledge. Is there a tutorial on how to do this?
You mean this knowledge API right? https://developer.voiceflow.com/reference/post_v3alpha-knowledge-base-docs-upload
It is only possible to upload txt, docx and pdf, not json, xml or markdown. If I follow your solution, would I just use txt here and then upload a structured document containing XML, markdown or JSON?
Thanks for helping!
The table (JSON) upload is in BETA preview for now but should be available soon.
But yes, you can upload a .txt file whose content is formatted as JSON, markdown or XML. It's just that this will not be as optimized as the KB table import.