fascinating-indigo · 17mo ago

Permission for Voiceflow crawler/bot?

When I try to use a URL as a source, I get an error message saying the site needs access. Our website requires explicit permissions to allow crawlers. What should I use as the entity name of the Voiceflow crawler/bot? Thanks!
4 Replies
Tico | Voiceflow Assistant
To allow the Voiceflow crawler/bot to access your website, you'll need to modify your website's robots.txt file. This file is used to instruct web robots (like Voiceflow's crawler) about which areas of the website should not be processed or scanned. Here's a basic example of what you can add to your robots.txt file to allow Voiceflow's crawler:
User-agent: *
Disallow:

User-agent: Voiceflow
Disallow:
In this example, User-agent: * applies to all web robots, and the empty Disallow: line tells them they may visit every area of the site. The User-agent: Voiceflow block does the same specifically for Voiceflow's crawler.

The Disallow: line can instead list specific paths to restrict access to certain areas of your site. For example, Disallow: /private/ would prevent the robot from accessing anything in the 'private' directory.

Remember to replace 'Voiceflow' with the actual name of the Voiceflow crawler/bot. As of now, Voiceflow has not publicly disclosed the name of its crawler/bot, so if you're unsure, please reach out to the website administrator or developer for the correct entity name. Also note that changes to the robots.txt file can take some time to propagate and be recognized by web robots.
W. Williams (SFT)
You should also be able to set allow rules based on the blocked requests within your site's logs. We just had to add a few rules to allow VF access.
fascinating-indigo (OP) · 17mo ago
Thanks. I'm not sure I'll be able to get the name of the crawler/bot. Our city's website is hosted by a multi-tenant CMS provider, so we don't have access to the raw web hits to capture the crawler's name.
W. Williams (SFT)
You could use your own crawler and then upload the data as a txt or pdf file.
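The crawl-it-yourself approach above can be sketched in a few lines of standard-library Python: fetch a page, strip the HTML down to visible text, and save the result as a .txt file you can upload to Voiceflow as a knowledge-base source. The function names and the example URL here are illustrative, not anything Voiceflow prescribes.

```python
# Minimal sketch: turn one page of your site into a plain-text file
# suitable for uploading as a .txt knowledge-base source.
# Uses only the standard library; the URL below is a placeholder.
from html.parser import HTMLParser
from urllib.request import urlopen


class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self._chunks = []
        self._skip = 0  # depth inside script/style tags

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return "\n".join(self._chunks)


def html_to_text(html: str) -> str:
    """Strip markup and return the page's visible text."""
    parser = TextExtractor()
    parser.feed(html)
    return parser.text()


def save_page_as_txt(url: str, path: str) -> None:
    """Fetch a page and write its visible text to a .txt file."""
    # Network call -- point this at a page you are allowed to crawl.
    html = urlopen(url).read().decode("utf-8", errors="replace")
    with open(path, "w", encoding="utf-8") as f:
        f.write(html_to_text(html))


# Example usage (placeholder URL):
# save_page_as_txt("https://example.com/city-services", "city-services.txt")
```

For a full site you would loop this over a list of page URLs (or follow links from a sitemap.xml) and concatenate the results into one file before uploading.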