Brandersnatcher•2y ago

Any suggestion for getting around a [Moderation error]?

The company I work at deals with video, and sometimes something NSFW will get past our moderation. If a user were to write something along the lines of "There's p**n", I'd like the bot to respond with one of our NSFW policies (which I have in my knowledge base). However, I'm being stopped by a [Moderation error]. Is this implemented by OpenAI or is this a Voiceflow limitation? In either case, I suppose I can hardcode something, but that kinda defeats the purpose of having an AI bot 😅

4 Replies

Tico | Voiceflow Assistant•2y ago

The [Moderation error] you're encountering is a feature implemented by Voiceflow, not OpenAI. It's designed to prevent the AI from generating inappropriate content. However, it can sometimes interfere with legitimate use cases, such as yours. To work around this, you could consider a few options:

1. Hardcoding: As you mentioned, you could hardcode a response for specific phrases or words. This would bypass the AI's moderation system, but it might not be the most scalable solution, especially if there are many different ways users could phrase their reports of inappropriate content.

2. Custom Moderation System: You could implement a custom moderation system that checks user inputs for specific phrases or words before they're sent to the AI. If a match is found, you could then trigger a specific response. This would be more flexible than hardcoding, but it would require additional development work.

3. Adjusting AI Settings: In Voiceflow, you can adjust the AI's settings to change how it generates responses. For example, you could lower the 'temperature' setting to make the AI's responses more focused and less likely to generate inappropriate content. However, this wouldn't directly address the issue of the AI's moderation system blocking certain inputs.

Remember, the Generative Global No Match feature in Voiceflow is still experimental and may not be suitable for all use cases, especially those involving sensitive content. It's recommended to use it with caution in a production environment.

For more information on how to adjust the AI's settings in Voiceflow, you can refer to the Global No Match (Generative or Static) documentation. For details on how tokens are calculated and used in Voiceflow, you can refer to the What are Tokens? documentation.
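For what it's worth, option 2 above (checking the input before it reaches the AI) could look something like the sketch below. This is only an illustration: the pattern list, the `route_message` function, and the route names `"nsfw_policy"` and `"llm"` are all made up for the example and are not part of Voiceflow. You would run something like this wherever you can execute code before the AI step and branch on the result.

```python
import re

# Hypothetical patterns for spotting NSFW reports, including censored
# spellings such as "p**n". Tune this list for your own users.
NSFW_REPORT_PATTERNS = [
    re.compile(r"\bp[o*][r*]n", re.IGNORECASE),
    re.compile(r"\bnsfw\b", re.IGNORECASE),
    re.compile(r"\bnud(e|ity)\b", re.IGNORECASE),
]


def route_message(text: str) -> str:
    """Return "nsfw_policy" if the message looks like an NSFW report, else "llm".

    Both route names are placeholders for whatever branches your flow uses.
    """
    if any(pattern.search(text) for pattern in NSFW_REPORT_PATTERNS):
        return "nsfw_policy"
    return "llm"


if __name__ == "__main__":
    for message in ["There's p**n in this video", "How do I trim a clip?"]:
        print(message, "->", route_message(message))
```

Messages routed to the policy branch never hit the moderated AI step, so they can be answered with the NSFW policy text directly; everything else goes to the AI as usual.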

BrandersnatcherOP•2y ago

Thanks, this is going through an intent not a global no match. Appreciate the quick response!

NiKo | Voiceflow•2y ago

You can use a pre-pass with a model such as Claude Haiku, for example. Here is one of the prompts they share, which you can tweak to fit your needs.

You are a content moderation expert tasked with categorizing user-generated text based on the following guidelines:

BLOCK CATEGORY:
- Content that is not related to rollercoasters, theme parks, or the amusement industry
- Explicit violence, hate speech, or illegal activities
- Spam, advertisements, or self-promotion

ALLOW CATEGORY:
- Discussions about rollercoaster designs, ride experiences, and park reviews
- Sharing news, rumors, or updates about new rollercoaster projects
- Respectful debates about the best rollercoasters, parks, or ride manufacturers
- Some mild profanity or crude language, as long as it is not directed at individuals

Here are some examples:
<examples>
Text: I'm selling weight loss products, check my link to buy!
Category: BLOCK

Text: I hate my local park, the operations and customer service are terrible. I wish that place would just burn down.
Category: BLOCK

Text: Did anyone ride the new RMC raptor Trek Plummet 2 yet? I've heard it's insane!
Category: ALLOW

Text: Hercs > B&Ms. That's just facts, no cap! Arrow > Intamin for classic woodies too.
Category: ALLOW
</examples>

Given those examples, here is the user-generated text to categorize:
<user_text>{user_text}</user_text>

Based on the guidelines above, classify this text as either ALLOW or BLOCK. Return nothing else.
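If it helps, here is a rough sketch of how a pre-pass like this could be wired up in code. It is not tied to any particular SDK: `call_model` is a hypothetical callable that you would back with whatever client you use to reach the model, and `PROMPT_TEMPLATE` is shortened here, so paste the full prompt above in its place. The helper names are made up for the example.

```python
from typing import Callable

# Shortened stand-in for the full moderation prompt; only the placeholder
# and the final instruction are reproduced here.
PROMPT_TEMPLATE = (
    "(guidelines and examples go here)\n\n"
    "Given those examples, here is the user-generated text to categorize:\n"
    "<user_text>{user_text}</user_text>\n\n"
    "Based on the guidelines above, classify this text as either ALLOW or "
    "BLOCK. Return nothing else."
)


def build_prompt(user_text: str) -> str:
    # str.replace instead of str.format, so braces typed by the user
    # cannot raise a formatting error.
    return PROMPT_TEMPLATE.replace("{user_text}", user_text)


def parse_verdict(raw: str) -> str:
    """Normalize the model reply; anything unexpected is treated as BLOCK."""
    verdict = raw.strip().upper()
    return verdict if verdict in {"ALLOW", "BLOCK"} else "BLOCK"


def pre_pass(user_text: str, call_model: Callable[[str], str]) -> bool:
    """Return True when the text may continue to the main assistant."""
    return parse_verdict(call_model(build_prompt(user_text))) == "ALLOW"


if __name__ == "__main__":
    # Stub model that always answers ALLOW, standing in for a real API call.
    print(pre_pass("Did anyone ride the new coaster yet?", lambda prompt: "ALLOW"))
```

Treating any reply other than a clean ALLOW as BLOCK is a design choice (fail closed). For a use case like reporting NSFW content, a BLOCK result could be routed to a fixed policy response instead of simply being rejected.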

Another option is to use a dedicated moderation model like Llama Guard.

BrandersnatcherOP•2y ago

interesting, thanks for this
