Voiceflow Knowledge Base unable to scan a URL that other voice AI competitors can crawl
Voiceflow's Knowledge Base is having trouble crawling the URL https://accessmedlab.com, while other voice AI companies have no problem crawling the same URL. I am considering switching to a competitor whose voice AI offering is able to scan that URL. It appears to be some kind of bug.
25 Replies
Hey there! A few things to double-check when a URL won’t import into the Voiceflow knowledge base:
1. Public access only
Voiceflow’s crawler can fetch pages that are publicly reachable without authentication, captchas or bot-blocking (docs: Importing Data → “You can only import publicly accessible URLs”). If the root of accessmedlab.com responds with a security challenge, redirect loop or 403, the crawl will fail even though the site loads in a browser.
2. Look at the error tooltip
After the import fails you’ll see a red ❗ icon beside the source. Hover over it to see the exact error Voiceflow received (timeout, 403, unsupported content, etc.). That will tell you whether it’s a block on the site or something else (docs: Importing Data → “If something goes wrong when importing your data, hover over the ❗ icon to learn why”).
3. Try a different entry point
• Import the sitemap directly: https://accessmedlab.com/sitemap.xml
• Or add one internal page (e.g. https://accessmedlab.com/about) instead of the bare domain. Sometimes the homepage is a redirect that returns an empty body, while inner pages work fine.
4. Site blocks vs. crawler style
Competitors that succeed may use a headless-browser crawler; Voiceflow’s fetch is lighter and respects robots.txt. If the domain blocks generic bots, ask the web team to whitelist Voiceflow’s user-agent or temporarily relax the rules.
5. Workarounds while you test
• Upload the same content as a PDF/DOCX or paste it as plain text.
• Use the Knowledge Base API to push the page HTML yourself if needed.
Try the steps above and check the ❗ error message; nine times out of ten it reveals the cause. Once the page is reachable by Voiceflow’s crawler your import should succeed. Docs for reference: https://docs.voiceflow.com/docs/importing-data
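One quick way to check the robots.txt angle from point 4 is to parse the site's rules with Python's stdlib and see whether a generic crawler would even be allowed in. This is a rough sketch; the ROBOTS_TXT content below is made up for illustration, so fetch the real file from https://accessmedlab.com/robots.txt before drawing conclusions:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration only.
# Replace with the live file from https://accessmedlab.com/robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/

User-agent: BadBot
Disallow: /
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# A crawler that honors robots.txt (like Voiceflow's fetch) would skip
# any path the matching rule disallows for its user-agent.
print(rp.can_fetch("*", "https://accessmedlab.com/about"))        # → True
print(rp.can_fetch("*", "https://accessmedlab.com/admin/panel"))  # → False
print(rp.can_fetch("BadBot", "https://accessmedlab.com/about"))   # → False
```

If the real file disallows everything for generic user-agents, that alone would explain why a robots.txt-respecting crawler fails where a headless-browser crawler succeeds.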
hey @surpassai_net , which platform can scrape it? we'll see if there's a permissions thing on our side blocking it
Hi, thanks, Retell AI has no problem scraping the URL (it even retrieved almost 400 pages from the site)
I was also able to download many pages from the same site using the software HTTrack
hm, yeah ChatGPT can't scrape either - how do the chunks look?

is it pulling actual data, or just saying it is?
in VF that's where it shows you the actual chunked data

Retell AI gets the data since the voice agent answers questions correctly based on the URL data
check the KB data though, any data on the internet will be in the training data
so it might sound right but it's not using the KB, it's actually just using a raw LLM
if they are actually scraping and getting the data, then they're doing something different which is interesting. but it'd be good to know if they actually have the data or not
In Retell AI I'm not able to see the data inside their knowledge base
I can't see the chunks in Retell AI

ask it what the test code is for ALPHA-1-ACID GLYCOPROTEIN

this is from this page: https://accessmedlab.com/search-detail/alpha-1-acid-glycoprotein
this is my chatgpt thread on this
Retell AI found the test code

hm, yeah it is correct then
interesting, they must use a different scraping technique we don't use
did you start on VF or retell?
I started with Retell first, but just like a day before signing up with VF. I'm relatively new to AI agents, but learning fast!
Ah, makes sense! Why do both?
We'll look into alternate scraping techniques, i didn't think that was easy to scrape sites like this but if they're doing it we can likely do similar. I sent it to the team that manages our KB
Because I'm using Chat-Dash.com to whitelabel and Chat-Dash only supports Voice with Retell. With VF, Chat-Dash.com supports both chat and voice. I would prefer to just use one (either Retell or VF). It seems Retell has some better voice agent results (with less work than VF) at least in my testing this past weekend
got it, what made it better / easier than VF?
For Retell, I could get reasonable results with the Voice Agent just using the prompt, no nodes.
For VF I have to learn about nodes, which is probably good for more advanced cases if I take the time to learn it well
Why not use just a single prompt in VF?
Both Retell and VF use the same underlying stuff, why not use a single agent in VF?
In my limited testing, I did use a single prompt in VF, the same one I used in Retell. VF did OK, but I think it didn't do as well because of the knowledge base issue. For VF, I tried downloading as many pages as I could from the accessmedlab.com site and converting the HTML to TXT, but I don't think I was able to get as many of the pages into VF as Retell's crawler did.
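That manual HTML-to-TXT conversion can be scripted with Python's stdlib alone. A rough sketch, assuming you already have the downloaded HTML files (the sample HTML string below is made up for illustration):

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0   # >0 while inside a script/style element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html: str) -> str:
    """Strip tags from an HTML document, keeping one text chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    return "\n".join(parser.chunks)


# Made-up sample page; in practice, read each downloaded .html file
# and write html_to_text(...) out to a .txt for the KB upload.
sample = (
    "<html><head><style>p{margin:0}</style></head>"
    "<body><h1>ALPHA-1-ACID GLYCOPROTEIN</h1>"
    "<p>Sample test description.</p></body></html>"
)
print(html_to_text(sample))
```

Looping this over a folder of saved pages should get all ~400 pages into TXT far faster than converting them one at a time by hand.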
If I could resolve the KB issue and get similar or better performance than Retell using VF, then I have no problem sticking with VF only. I just started working with AI agents this past week and weekend, so I've been learning a lot.
This is from VF; it couldn't find the test code, so the TXT files I uploaded to the KB probably didn't have that information. The KB in Retell had about 400 pages from the site, while I was only able to convert about 90 pages from HTML to TXT (by manually downloading and converting with software) to upload to VF's KB

thank you for your help and for looking into this!