graceful-blue · 2y ago

Sitemap and 300 URL Restriction

I've been testing Voiceflow and so far it has been very interesting. I tried to add sitemaps to the Knowledge Base but I'm running into the error 'URLs must be less than 300'. For example, one of our sitemaps has 10k+ URLs. How do you use the Knowledge Base in practice? I'm sure 300 URLs is far too few for 90% of use cases, and manually adding 50k URLs in batches of 300 doesn't sound like much fun. Does the 300 URL restriction only apply to the lower tiers, with no limit in the Pro version and beyond? Thanks!
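For anyone who does want to split a large sitemap by hand, here is a minimal sketch of the batching step in Python. It only parses a standard sitemap and slices the URL list; it does not call any Voiceflow API. The function names are made up for illustration, and the batch size of 300 is taken from the error message above (if the limit turns out to be strictly below 300, pass 299 instead).

```python
import xml.etree.ElementTree as ET
from typing import Iterator, List

# Namespace used by standard sitemap files (sitemaps.org protocol).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Assumption: taken from the 'URLs must be less than 300' error message.
BATCH_SIZE = 300


def sitemap_urls(xml_text: str) -> List[str]:
    """Return every <loc> value found under the <url> entries of a sitemap."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


def batches(urls: List[str], size: int = BATCH_SIZE) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` URLs."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]
```

At 300 URLs per batch, a 10k URL sitemap comes out to 34 batches and 50k URLs to 167, which shows why manual uploading does not scale.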
7 Replies
NiKo | Voiceflow
Could you share more context? What is your use case and what kind of data do you want to pass to the KB?
graceful-blue (OP) · 2y ago
The client has a webshop with lots of products and categories, so naturally the sitemaps are also huge. I'm trying to create a chatbot that understands their catalogue and can give suggestions to customers.
NiKo | Voiceflow
OK, so crawling a large number of web pages doesn't guarantee you get data that's clean and ready for database integration. You would also be storing a static snapshot of data that keeps changing: product availability, quantity, and pricing shift, and products get added and removed. Keeping that in sync would mean re-fetching thousands of URLs daily, which is hardly a feasible solution.
A more effective approach would be to add documents to your KB that give context about the types of products on the site, including categories, topics, and available filters. That can help generate better search queries for the e-commerce API. Another option is a JSON export of the products, though that still goes out of date.
Given how dynamic product lists are, a better strategy might be to use the knowledge base as an initial guide for users: it helps them narrow their search and helps generate a query for the product search API. The API then returns an up-to-date list of products, and a Large Language Model (LLM) can extract and rank the ones most likely to meet the user's needs.
graceful-blue (OP) · 2y ago
Thanks for the answer. So I take it the 300 URL limit is always there and this kind of need should be handled differently? I thought the sitemap could have been a good MVP version, with specific info fetched via API later. I noticed the sitemap feature has a refresh rate setting, so with that it should never be completely out of date, but of course more specific needs like stock should be queried straight from the API. For the first version and testing, though, it would have been cool to just dump the data into the knowledge base and test how well it retrieves info. Also, if I want the bot to give links with the product suggestions, I'd imagine some kind of sitemap would be needed. I guess JSON also has some maximum allowed size, so if the sitemap is too big, I imagine JSON would hit the same problem?
NiKo | Voiceflow
JSON can be fetched whenever you need fresh data, and you can also limit the product list to what best fits the user's request. There's no need to download the shop's whole product catalogue if the user is asking for tech-related products, for example. The good thing with JSON is that the product metadata (link, price, description, ...) will be available in the API response.
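As a rough illustration of that point, the sketch below trims a JSON product response down to the entries matching one category and keeps only the metadata needed for a suggestion. The response shape (a top-level `products` array with `name`, `category`, `link`, `price`, and `description` fields) and the function name are assumptions for the example; a real shop API will have its own schema.

```python
import json
from typing import Dict, List


def matching_products(api_response: str, category: str, limit: int = 5) -> List[Dict]:
    """Keep at most `limit` products of one category, with their metadata.

    Assumes a hypothetical response shape: {"products": [{...}, ...]}.
    """
    products = json.loads(api_response)["products"]
    wanted = category.lower()
    hits = [p for p in products if p.get("category", "").lower() == wanted]
    # Keep only the fields the chatbot needs to present a suggestion.
    return [
        {
            "name": p["name"],
            "link": p["link"],
            "price": p["price"],
            "description": p["description"],
        }
        for p in hits[:limit]
    ]
```

In practice the filtering would ideally happen on the shop side through a search or category parameter, so that only the relevant slice is transferred at all.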
graceful-blue (OP) · 2y ago
So are you saying that I should do something like this:
- First ask the customer what they're interested in and capture that
- Search the webshop with that via API and return the product list
- Suggest x products to the customer from that list, taking the links and other info I want to tell the customer from the API response
NiKo | Voiceflow
Yes, use the KB to find relevant topics/categories based on the user's request. Query the shop API and use the response as context. Use the LLM to select the best products from the API response based on the existing context and generate an array with x products (and the metadata).
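The three steps described above could be wired together roughly as follows. This is only a sketch of the control flow: `suggest_products` and the three callables are hypothetical names, and the `fake_*` functions are toy stand-ins for the real KB query, shop API call, and LLM prompt, none of which are specified in this thread.

```python
from typing import Callable, Dict, List

Product = Dict[str, object]


def suggest_products(
    user_request: str,
    find_categories: Callable[[str], List[str]],
    search_shop: Callable[[List[str]], List[Product]],
    pick_best: Callable[[str, List[Product], int], List[Product]],
    count: int = 3,
) -> List[Product]:
    """KB lookup, then live shop search, then LLM-style selection."""
    categories = find_categories(user_request)          # step 1: KB
    candidates = search_shop(categories)                # step 2: shop API
    return pick_best(user_request, candidates, count)   # step 3: LLM


# Toy stand-ins (assumptions, not real integrations).
def fake_kb(request: str) -> List[str]:
    known = ("laptops", "phones")
    return [c for c in known if c[:-1] in request.lower()]


def fake_shop(categories: List[str]) -> List[Product]:
    catalogue = [
        {"name": "Pro 16", "category": "laptops", "price": 1299},
        {"name": "Budget 14", "category": "laptops", "price": 399},
        {"name": "Pixel-ish", "category": "phones", "price": 599},
    ]
    return [p for p in catalogue if p["category"] in categories]


def fake_llm(request: str, products: List[Product], count: int) -> List[Product]:
    # Placeholder ranking: cheapest first. A real LLM would weigh the request.
    return sorted(products, key=lambda p: p["price"])[:count]
```

Calling `suggest_products("I need a cheap laptop", fake_kb, fake_shop, fake_llm, count=1)` runs all three steps against the toy data. Passing the steps in as callables keeps the flow testable before the real KB, API, and LLM calls exist.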
