fascinating-indigo•17mo ago
Permission for Voiceflow crawler/bot?
When I try to use a URL as a source, I get an error message saying the site needs access. Our website requires explicit permissions to allow crawlers.
What should I use as the entity of the Voiceflow crawler/bot?
Thanks!
4 Replies
To allow the Voiceflow crawler/bot to access your website, you'll need to modify your website's robots.txt file. This file is used to instruct web robots (like Voiceflow's crawler) about which areas of the website should not be processed or scanned.
Here's a basic example of what you can add to your robots.txt file to allow Voiceflow's crawler:
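The example itself did not survive in this thread. Going by the description that follows, it was presumably along these lines, with Voiceflow standing in as a placeholder user-agent name rather than a confirmed one:

```
User-agent: *
Disallow:

User-agent: Voiceflow
Disallow:
```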
In this example, User-agent: * applies to all web robots. The empty Disallow: line tells the robot that it can visit all areas of the site. The User-agent: Voiceflow line applies specifically to Voiceflow's crawler, and its empty Disallow: line tells it that it can visit all areas of the site.
Please note that the Disallow: line can be replaced with specific paths to restrict access to certain areas of your site. For example, Disallow: /private/ would prevent the robot from accessing anything in the 'private' directory of your site.
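If you want to check how a set of rules would be interpreted before publishing them, Python's standard urllib.robotparser module can evaluate them locally. This is only a rough sketch: "Voiceflow" is a placeholder user-agent name (the real one is not known here), and the example.com URLs and the is_allowed helper are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: every robot may visit everything, but the
# placeholder "Voiceflow" agent is kept out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow:

User-agent: Voiceflow
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for path in ("/about", "/private/report"):
        url = "https://example.com" + path
        print(path, is_allowed(ROBOTS_TXT, "Voiceflow", url))
```

Keep in mind this only shows how a well-behaved crawler would read the file. It says nothing about blocks applied elsewhere, such as a firewall or the hosting provider's bot filtering.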
Remember to replace 'Voiceflow' with the actual name of the Voiceflow crawler/bot. As of now, Voiceflow has not publicly disclosed the name of its crawler/bot. If you're unsure, please reach out to the website administrator or developer for the correct entity name.
Please note that changes to the robots.txt file can take some time to propagate and be recognized by web robots.
You should also be able to set allow rules based on the blocked requests in your site's logs. We just had to add a few rules to allow VF access.
fascinating-indigoOP•17mo ago
Thanks. I'm not sure I'll be able to get the name of the crawler/bot. Our city's website is hosted by a multi-tenant CMS provider, so we don't have access to every web hit to capture the name of the crawler.
You could use your own crawler and then upload the data as a txt or pdf file.
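As a rough idea of what that could look like, here is a minimal Python sketch using only the standard library. It fetches a list of pages you already know about, strips the HTML down to visible text, and writes everything to one txt file you can then upload. It does not follow links, so it is not a full crawler, and the function names and the "my-site-exporter" User-Agent string are made up for this example.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def html_to_text(html: str) -> str:
    """Return the visible text of an HTML document, one chunk per line."""
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()
    return "\n".join(extractor.chunks)


def save_pages_as_text(urls, out_path):
    """Fetch each URL and append its text to a single UTF-8 txt file."""
    with open(out_path, "w", encoding="utf-8") as out:
        for url in urls:
            # Hypothetical User-Agent; use whatever your site allows.
            request = Request(url, headers={"User-Agent": "my-site-exporter"})
            with urlopen(request, timeout=30) as response:
                charset = response.headers.get_content_charset() or "utf-8"
                html = response.read().decode(charset, errors="replace")
            out.write(f"# {url}\n{html_to_text(html)}\n\n")
```

You would call it with something like save_pages_as_text(["https://example.com/hours"], "site.txt"), swapping in your own page URLs, and then upload site.txt as the source.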