Brandersnatcher (17mo ago)

Any suggestions for getting around a [Moderation error]?

The company I work at deals with video, and sometimes something NSFW gets past our moderation. If a user writes something along the lines of "There's p**n", I'd like the bot to respond with one of our NSFW policies (which I have in my knowledge base). However, I'm being stopped by a [Moderation error]. Is this implemented by OpenAI, or is it a Voiceflow limitation? In either case, I suppose I can hardcode something, but that kinda defeats the purpose of having an AI bot 😅
Tico | Voiceflow Assistant
The [Moderation error] you're encountering is a feature implemented by Voiceflow, not OpenAI. It's designed to prevent the AI from generating inappropriate content. However, it can sometimes interfere with legitimate use cases such as yours. To work around this, you could consider a few options:

1. Hardcoding: As you mentioned, you could hardcode a response for specific phrases or words. This would bypass the AI's moderation system, but it might not be the most scalable solution, especially if there are many different ways users could phrase their reports of inappropriate content.
2. Custom Moderation System: You could implement a custom moderation system that checks user inputs for specific phrases or words before they're sent to the AI. If a match is found, you could then trigger a specific response. This would be more flexible than hardcoding, but it would require additional development work.
3. Adjusting AI Settings: In Voiceflow, you can adjust the AI's settings to change how it generates responses. For example, you could lower the 'temperature' setting to make the AI's responses more focused and less likely to generate inappropriate content. However, this wouldn't directly address the issue of the AI's moderation system blocking certain inputs.

Remember, the Generative Global No Match feature in Voiceflow is still experimental and may not be suitable for all use cases, especially those involving sensitive content. It's recommended to use it with caution in a production environment.

For more information on how to adjust the AI's settings in Voiceflow, you can refer to the Global No Match (Generative or Static) documentation. For details on how tokens are calculated and used in Voiceflow, you can refer to the What are Tokens? documentation.
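Option 2 can start as a small pattern check that runs before the message ever reaches the model. The sketch below is a minimal Python illustration, assuming you can run your own code ahead of the generative step; the patterns and the route labels `nsfw_policy` and `llm` are hypothetical placeholders, not Voiceflow features.

```python
import re

# Hypothetical patterns for messages that report NSFW content; extend them
# with the wording your users actually use.
NSFW_REPORT_PATTERNS = [
    re.compile(r"\bporn", re.IGNORECASE),               # porn, porno, pornography
    re.compile(r"\bp\*+r?n\b", re.IGNORECASE),          # masked forms like p**n, p*rn
    re.compile(r"\bnsfw\b", re.IGNORECASE),
    re.compile(r"\bnud(?:e|es|ity)\b", re.IGNORECASE),
]


def route_message(text: str) -> str:
    """Return 'nsfw_policy' if the text matches a report pattern, else 'llm'."""
    if any(pattern.search(text) for pattern in NSFW_REPORT_PATTERNS):
        return "nsfw_policy"  # answer with the NSFW policy from the knowledge base
    return "llm"  # otherwise let the message continue to the generative step


print(route_message("There's p**n in this video"))    # nsfw_policy
print(route_message("How do I change my password?"))  # llm
```

Keyword lists miss paraphrases and can misfire on innocent text, which is the scalability concern raised under option 1, so treat this as a cheap first filter rather than a complete solution.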
Brandersnatcher (OP, 17mo ago)
Thanks, this is going through an intent, not a Global No Match. Appreciate the quick response!
NiKo | Voiceflow (17mo ago)
You can run a pre-pass with a small model, for example Claude Haiku. Here is one of the example prompts Anthropic shares, which you can tweak to fit your needs:
You are a content moderation expert tasked with categorizing user-generated text based on the following guidelines:

BLOCK CATEGORY:
Content that is not related to rollercoasters, theme parks, or the amusement industry
Explicit violence, hate speech, or illegal activities
Spam, advertisements, or self-promotion

ALLOW CATEGORY:
Discussions about rollercoaster designs, ride experiences, and park reviews
Sharing news, rumors, or updates about new rollercoaster projects
Respectful debates about the best rollercoasters, parks, or ride manufacturers
Some mild profanity or crude language, as long as it is not directed at individuals

Here are some examples:
<examples>
Text: I'm selling weight loss products, check my link to buy!
Category: BLOCK

Text: I hate my local park, the operations and customer service are terrible. I wish that place would just burn down.
Category: BLOCK

Text: Did anyone ride the new RMC raptor Trek Plummet 2 yet? I've heard it's insane!
Category: ALLOW

Text: Hercs > B&Ms. That's just facts, no cap! Arrow > Intamin for classic woodies too.
Category: ALLOW
</examples>

Given those examples, here is the user-generated text to categorize:
<user_text>{user_text}</user_text>

Based on the guidelines above, classify this text as either ALLOW or BLOCK. Return nothing else.
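To run the pre-pass described above, fill the template with the user's message, send it to the classifier model, and route on the one-word verdict. A minimal Python sketch, with the model call abstracted as a hypothetical `call_model` function (prompt in, reply text out) so it does not depend on any particular SDK; the template is abbreviated here and stands in for the full prompt above.

```python
from typing import Callable

# Abbreviated stand-in for the full moderation prompt shared above; paste the
# complete text here and keep the {user_text} placeholder.
MODERATION_TEMPLATE = """You are a content moderation expert tasked with categorizing user-generated text based on the following guidelines:
...
<user_text>{user_text}</user_text>

Based on the guidelines above, classify this text as either ALLOW or BLOCK. Return nothing else."""


def build_prompt(template: str, user_text: str) -> str:
    """Insert the user's message into the template."""
    # Remove the delimiter tags so the message cannot close <user_text> early
    # and smuggle in instructions (a basic mitigation, not a full defense).
    cleaned = user_text.replace("<user_text>", "").replace("</user_text>", "")
    # str.replace rather than str.format, so other braces added to the
    # template later do not raise errors.
    return template.replace("{user_text}", cleaned)


def parse_verdict(model_output: str) -> str:
    """Normalize the reply to ALLOW or BLOCK; anything else fails closed."""
    verdict = model_output.strip().upper()
    return verdict if verdict in ("ALLOW", "BLOCK") else "BLOCK"


def moderate(user_text: str, call_model: Callable[[str], str],
             template: str = MODERATION_TEMPLATE) -> str:
    """Run the pre-pass and return ALLOW or BLOCK.

    call_model is a hypothetical callable that sends a prompt to the
    classifier model (e.g. Claude Haiku) and returns its reply text.
    """
    return parse_verdict(call_model(build_prompt(template, user_text)))
```

Anything other than a clean ALLOW or BLOCK is treated as BLOCK (fail closed), which suits moderation but is a choice you may want to revisit. For the original question you would rewrite the categories so that a user reporting NSFW content gets its own label routed to the policy answer, and extend `parse_verdict` to accept that label. If you use Anthropic's Python SDK, `call_model` could wrap `client.messages.create(...)` and return `response.content[0].text`.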
Another option is to use a dedicated moderation model such as Llama Guard.
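Llama Guard replies in plain text rather than with a single ALLOW/BLOCK word. Assuming the output format described in the Llama Guard 2 and 3 model cards (`safe`, or `unsafe` followed by a line of comma-separated category codes), a minimal Python parser could look like this; `GuardResult` and `parse_llama_guard` are hypothetical names, and you should check the model card for the version you run, since category codes differ between versions.

```python
from dataclasses import dataclass, field


@dataclass
class GuardResult:
    safe: bool
    categories: list[str] = field(default_factory=list)  # e.g. ["S1", "S3"]


def parse_llama_guard(output: str) -> GuardResult:
    """Parse Llama Guard-style output: 'safe', or 'unsafe' plus category codes."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    if not lines:
        return GuardResult(safe=False)  # empty reply: fail closed
    verdict = lines[0].lower()
    if verdict == "safe":
        return GuardResult(safe=True)
    if verdict == "unsafe":
        codes = lines[1].split(",") if len(lines) > 1 else []
        return GuardResult(safe=False, categories=[c.strip() for c in codes if c.strip()])
    return GuardResult(safe=False)  # unexpected format: fail closed


print(parse_llama_guard("unsafe\nS1,S3"))
# GuardResult(safe=False, categories=['S1', 'S3'])
```

One option for the original question is to route to the NSFW policy answer when `safe` is False and the categories include the code your Llama Guard version assigns to sexual content.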
Brandersnatcher (OP, 17mo ago)
Interesting, thanks for this!