Voiceflow Knowledge Base unable to scan a URL that other voice AI competitors can crawl
Voiceflow's Knowledge Base is having trouble crawling the URL https://accessmedlab.com, while other voice AI companies have no problem crawling the same URL. I am considering switching to a competitor whose voice AI offering is able to scan that URL. It appears to be some kind of bug.
25 Replies
Hey there! A few things to double-check when a URL won’t import into the Voiceflow knowledge base:
1. Public access only
Voiceflow’s crawler can fetch pages that are publicly reachable without authentication, captchas or bot-blocking (docs: Importing Data → “You can only import publicly accessible URLs”). If the root of accessmedlab.com responds with a security challenge, redirect loop or 403, the crawl will fail even though the site loads in a browser.
2. Look at the error tooltip
After the import fails you’ll see a red ❗ icon beside the source. Hover over it to see the exact error Voiceflow received (timeout, 403, unsupported content, etc.). That will tell you whether it’s a block on the site or something else (docs: Importing Data → “If something goes wrong when importing your data, hover over the ❗ icon to learn why”).
3. Try a different entry point
• Import the sitemap directly: https://accessmedlab.com/sitemap.xml
• Or add one internal page (e.g. https://accessmedlab.com/about) instead of the bare domain. Sometimes the homepage is a redirect that returns an empty body, while inner pages work fine.
4. Site blocks vs. crawler style
Competitors that succeed may use a headless-browser crawler; Voiceflow’s fetch is lighter and respects robots.txt. If the domain blocks generic bots, ask the web team to whitelist Voiceflow’s user-agent or temporarily relax the rules.
5. Workarounds while you test
• Upload the same content as a PDF/DOCX or paste it as plain text.
• Use the Knowledge Base API to push the page HTML yourself if needed.
Try the steps above and check the ❗ error message; nine times out of ten it reveals the cause. Once the page is reachable by Voiceflow’s crawler your import should succeed. Docs for reference: https://docs.voiceflow.com/docs/importing-data
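One quick way to check the robots.txt angle from point 4 is to parse the site's rules with Python's stdlib and see whether a generic crawler would even be allowed in. This is a rough sketch; the ROBOTS_TXT content below is made up for illustration, so fetch the real file from https://accessmedlab.com/robots.txt before drawing conclusions:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration only.
# Replace with the live file from https://accessmedlab.com/robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/

User-agent: BadBot
Disallow: /
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# A crawler that honors robots.txt (like Voiceflow's fetch) would skip
# any path the matching rule disallows for its user-agent.
print(rp.can_fetch("*", "https://accessmedlab.com/about"))        # → True
print(rp.can_fetch("*", "https://accessmedlab.com/admin/panel"))  # → False
print(rp.can_fetch("BadBot", "https://accessmedlab.com/about"))   # → False
```

If the real file disallows everything for generic user-agents, that alone would explain why a robots.txt-respecting crawler fails where a headless-browser crawler succeeds.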
hey @surpassai_net , which platform can scrape it? we'll see if there's a permissions thing on our side blocking it
Hi, thanks, Retell AI has no problem scraping the URL (it even retrieved almost 400 pages from the site)
I was also able to download many pages from the same site using the software HTTrack
hm, yeah ChatGPT can't scrape either - how do the chunks look?

is it pulling actual data, or just saying it is?
in VF that's where it shows you the actual chunked data

Retell AI gets the data since the voice agent answers questions correctly based on the URL data
check the KB data though, any data on the internet will be in the training data
so it might sound right but it's not using the KB, it's actually just using a raw LLM
if they are actually scraping and getting the data, then they're doing something different which is interesting. but it'd be good to know if they actually have the data or not
In Retell AI I'm not able to see the data inside their knowledge base
I can't see the chunks in Retell AI

ask it what the test code is for ALPHA-1-ACID GLYCOPROTEIN

this is from this page: https://accessmedlab.com/search-detail/alpha-1-acid-glycoprotein
this is my chatgpt thread on this
Retell AI found the test code

hm, yeah it is correct then
interesting, they must use a different scraping technique we don't use
did you start on VF or retell?
I started with Retell first, but just like a day before signing up with VF. I'm relatively new to AI agents, but learning fast!
Ah, makes sense! Why do both?
We'll look into alternate scraping techniques, i didn't think that was easy to scrape sites like this but if they're doing it we can likely do similar. I sent it to the team that manages our KB
Because I'm using Chat-Dash.com to whitelabel and Chat-Dash only supports Voice with Retell. With VF, Chat-Dash.com supports both chat and voice. I would prefer to just use one (either Retell or VF). It seems Retell has some better voice agent results (with less work than VF) at least in my testing this past weekend
got it, what made it better / easier than VF?
For Retell, I could get reasonable results with the Voice Agent just using the prompt, no nodes.
For VF I have to learn about nodes, which is probably good for more advanced cases if I take the time to learn it well
Why not use just a single prompt in VF?
Both Retell and VF use the same underlying stuff, why not use a single agent in VF?
In my limited testing, I did use a single prompt in VF, the same one I used in Retell. VF did OK, but I think it didn't do as well because of the knowledge base issue. For VF, I tried downloading as many pages as I could from the accessmedlab.com site and converting the HTML to TXT, but I don't think I was able to get as many of the pages into VF as Retell's crawler did.
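That manual HTML-to-TXT conversion can be scripted with Python's stdlib alone. A rough sketch, assuming you already have the downloaded HTML files (the sample HTML string below is made up for illustration):

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0   # >0 while inside a script/style element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html: str) -> str:
    """Strip tags from an HTML document, keeping one text chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    return "\n".join(parser.chunks)


# Made-up sample page; in practice, read each downloaded .html file
# and write html_to_text(...) out to a .txt for the KB upload.
sample = (
    "<html><head><style>p{margin:0}</style></head>"
    "<body><h1>ALPHA-1-ACID GLYCOPROTEIN</h1>"
    "<p>Sample test description.</p></body></html>"
)
print(html_to_text(sample))
```

Looping this over a folder of saved pages should get all ~400 pages into TXT far faster than converting them one at a time by hand.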
If I could resolve the KB issue and get similar or better performance than Retell using VF, then I have no problem sticking with VF only. I just started working with AI agents this past week and weekend, so I've been learning a lot.
This is from VF; it couldn't find the test code, so the TXT files I uploaded to the KB probably didn't have that information. The KB in Retell had about 400 pages from the site, while I was only able to convert about 90 pages from HTML to TXT (by manually downloading and converting with software) to upload to VF's KB

thank you for your help and for looking into this!