NEED HELP WITH PROJECT FOR CITY HALL IN CZECH REPUBLIC
Hello guys, I am building a chatbot for our local region. Ideally it should be fed with all the information from their website: https://www.kr-ustecky.cz/ ...but the website is so large that I am at my wits' end. How do I make sure all the content gets scraped into a good format, and which tools should I use to make the chatbot as accurate as possible? There are about 45k URLs on the website :/ ...has anyone already done similar projects in the past and would be willing to participate in this one?
From my experience, I haven't found tools or a KB that can hold so many files and still accurately return relevant data ...I don't want to give up on this project, as we have already been paid, but we are not able to deliver it ourselves...
3 Replies
To implement a Knowledge Base using Voiceflow, you can follow these steps:
1. Creating the Knowledge Base
In Voiceflow, you can create a Knowledge Base (KB) to store and manage the content:
- Navigate to the Knowledge Base tab within the Content section of your Voiceflow dashboard.
- Add Data Sources: Click the 'Add Data Source' button and select the type of data source that aligns with your requirements. Options include URL(s), Plain text, Sitemap, PDF, and Docx, catering to various data formats and sources.
- Set Refresh Rates: Ensure the data remains up-to-date by setting refresh rates (Never, Daily, Weekly, Monthly).
2. Optimizing the Knowledge Base
To handle a large volume of data and ensure accurate responses:
- Chunking: The parser service in Voiceflow will chunk the content into manageable pieces.
- Embedding: Each chunk is converted into a vector representation for efficient retrieval.
- Customization: Use the KB Settings modal to adjust parameters like Model, Temperature, Max Tokens, etc., to optimize the AI responses.
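To make the chunking step concrete, here is a minimal Python sketch of overlapping, word-based chunking. It is not Voiceflow's actual parser; the function name `chunk_text` and the `chunk_size` and `overlap` defaults are assumptions chosen for illustration.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into word-based chunks; consecutive chunks share `overlap` words.

    Hypothetical helper for illustration, not part of any Voiceflow API.
    """
    if chunk_size <= 0 or overlap < 0 or overlap >= chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from at least one chunk, at the cost of storing some words twice.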
3. Deploying the Chatbot
- Create an Agent: In your Voiceflow workspace, create a new assistant and upload the data to the Knowledge Base.
- Testing: Thoroughly test the chatbot to ensure it provides accurate and relevant responses.
Tools and Resources
- Voiceflow Knowledge Base documentation
By following these steps, you can effectively build and deploy a chatbot that accurately utilizes the extensive content from your data sources.
correct-apricot•15mo ago
scraping everything might not be a good idea, since most links will be redundant and repetitive. what is the purpose of the website?
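one way to check how much of the URL list is redundant is to normalize the URLs and drop duplicates before scraping anything. a rough Python sketch, assuming that trailing slashes, fragments and a few tracking parameters do not change the page content (the `TRACKING_PARAMS` set and both function names are made up for illustration, and the assumption should be verified against the real site):

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed list of query parameters that never change page content.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "fbclid"}


def normalize_url(url: str) -> str:
    """Return a canonical form of `url` so trivially different URLs compare equal."""
    parts = urlsplit(url)
    query = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in TRACKING_PARAMS
    )
    path = parts.path.rstrip("/") or "/"  # assumption: /a/ and /a are the same page
    # The fragment is dropped because it only points inside a page.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, urlencode(query), ""))


def dedupe_urls(urls: list[str]) -> list[str]:
    """Keep the first occurrence of each normalized URL, preserving order."""
    seen = set()
    unique = []
    for url in urls:
        key = normalize_url(url)
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique
```

running this over the sitemap before scraping gives a first estimate of how many distinct pages there really are.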
also i suggest not building a 45k url kb, as it will use a lot of tokens and mostly return irrelevant answers, because vector searches are based on words
one suggestion is to feed the chatbot the data from the page the user is currently on as context for the LLM, using some JavaScript.
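the browser part would be JavaScript that sends the page URL and visible text to your backend. the backend part could look roughly like this Python sketch, which only builds the prompt; the function name, the `max_chars` limit and the prompt wording are all assumptions, and the actual LLM call is left out:

```python
def build_page_prompt(question: str, page_url: str, page_text: str, max_chars: int = 4000) -> str:
    """Build an LLM prompt grounded in the page the user is currently viewing.

    Hypothetical helper: `page_text` is assumed to be the visible text that
    the browser script collected and sent along with the question.
    """
    # Collapse runs of whitespace, then cap the length to bound token usage.
    excerpt = " ".join(page_text.split())[:max_chars]
    return (
        "Answer the question using only the page content below. "
        "If the answer is not there, say so.\n\n"
        f"Page: {page_url}\n"
        f"Content: {excerpt}\n\n"
        f"Question: {question}"
    )
```

this keeps the context small and relevant, but it can only answer questions about the page the user is already looking at.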
another idea would be to search the site with a search api and then use the returned links as the candidate sources.
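a tiny Python sketch of the search-first idea. here a naive keyword count stands in for a real search api (which one to use is not decided in this thread), and `pages` is assumed to be a dict mapping each URL to its already extracted text:

```python
def rank_pages(query: str, pages: dict[str, str], top_k: int = 3) -> list[str]:
    """Return up to `top_k` URLs whose text contains the most query-term hits.

    Naive stand-in for a search API: it splits on whitespace only, so
    punctuation and word endings are not handled.
    """
    query_terms = set(query.lower().split())
    scored = []
    for url, text in pages.items():
        score = sum(1 for word in text.lower().split() if word in query_terms)
        if score > 0:
            scored.append((score, url))
    # Highest score first; ties are broken alphabetically by URL.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [url for _, url in scored[:top_k]]
```

only the text of the returned pages would then be passed to the LLM, instead of searching one huge kb.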