Ok, so crawling a large number of web pages doesn't guarantee that you end up with data that is clean and ready to load into a database. On top of that, what you get is a static snapshot: product availability, quantities, and prices change, and products are added and removed. Keeping the database in sync would mean re-fetching thousands of URLs every day, which is hardly a feasible solution.
A more effective approach would be to add documents to your knowledge base (KB) that describe the types of products featured on the site, including categories, topics, and available filters. That context could help generate better-targeted search queries for the e-commerce API.
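To make that a bit more concrete, here is a minimal sketch of what such a context document might look like and how it could be used to keep a generated query within what the site supports. Everything in it is an assumption for illustration: the categories, filters, and the names `CATALOG_CONTEXT` and `build_search_params` are made up and don't correspond to any particular e-commerce API.

```python
# Hypothetical KB document describing the catalog; all values are made up.
CATALOG_CONTEXT = {
    "categories": ["laptops", "monitors", "keyboards"],
    "filters": {
        "brand": ["acme", "globex"],  # enumerated filter
        "max_price": "number",        # free-value filter
        "in_stock": "boolean",        # free-value filter
    },
}


def build_search_params(candidate: dict, context: dict = CATALOG_CONTEXT) -> dict:
    """Keep only the parts of a proposed query that the catalog supports."""
    params = {}
    # Free-text search terms pass through unchanged.
    if candidate.get("q"):
        params["q"] = candidate["q"]
    # Drop categories the site does not have.
    if candidate.get("category") in context["categories"]:
        params["category"] = candidate["category"]
    # Keep only known filters; enumerated filters must use a listed value.
    for name, value in candidate.get("filters", {}).items():
        allowed = context["filters"].get(name)
        if allowed is None:
            continue
        if isinstance(allowed, list) and value not in allowed:
            continue
        params[name] = value
    return params


# Example: a query as an LLM might propose it; the unsupported "color"
# filter is dropped before anything is sent to the search API.
proposed = {
    "q": "light laptop",
    "category": "laptops",
    "filters": {"brand": "acme", "color": "red", "max_price": 900},
}
print(build_search_params(proposed))
```

The point is that the KB holds slow-changing information (what kinds of things exist and how they can be filtered), not the products themselves.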
Another option could involve using a JSON export of the products, though this still presents the challenge of dealing with outdated information.
Given how dynamic product lists are, a preferable strategy might be to use the knowledge base as an initial guide: it helps users narrow down what they are looking for and supports generating a query for the product search API.
The API can then return an up-to-date list of products, from which a Large Language Model (LLM) can extract and rank the ones most likely to meet the user's needs.
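The fetch-then-rank step could look roughly like the sketch below. It is only an illustration under stated assumptions: `recommend`, `fake_search_api`, and `keyword_score` are hypothetical names, the product fields are invented, and the keyword overlap score merely stands in for the relevance judgment you would actually ask an LLM for.

```python
from typing import Callable

Product = dict


def recommend(
    user_need: str,
    params: dict,
    search_api: Callable[[dict], list],
    score: Callable[[str, Product], float],
    top_k: int = 3,
) -> list:
    """Fetch live products, then keep the top_k best matches for the user's need."""
    products = search_api(params)  # fetched at request time, never read from the KB
    available = [p for p in products if p.get("in_stock", True)]
    ranked = sorted(available, key=lambda p: score(user_need, p), reverse=True)
    return ranked[:top_k]


# Stand-ins for illustration only: a real setup would call the shop's search
# API here and ask an LLM for the relevance score.
def fake_search_api(params: dict) -> list:
    return [
        {"name": "Laptop A", "price": 850, "in_stock": True, "tags": ["light", "laptop"]},
        {"name": "Laptop B", "price": 700, "in_stock": False, "tags": ["light", "laptop"]},
        {"name": "Laptop C", "price": 600, "in_stock": True, "tags": ["gaming", "laptop"]},
    ]


def keyword_score(user_need: str, product: Product) -> float:
    """Count how many words of the user's need appear in the product tags."""
    words = set(user_need.lower().split())
    return float(len(words & set(product["tags"])))


results = recommend("light laptop", {"q": "laptop"}, fake_search_api, keyword_score)
print([p["name"] for p in results])
```

Because prices and stock come from the API at request time, nothing in the knowledge base has to be refreshed when the catalog changes.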