exotic-emerald · 15mo ago

AI not responding in complete sentences?

I’m having a weird issue where I’m getting partial responses: the message is cut off at a random point, mid-word, and I have no idea why. This isn’t an issue I’ve had in the past. It seems to only happen in the testing environment and not in the published / prototype version, so it’s not a massive issue, but it’s been happening a bunch over the last two days and it’s confusing.
7 Replies
Tico | Voiceflow Assistant
When designing your AI assistant on Voiceflow, there are several factors to consider to ensure it works as intended:

1. Token Limit: Tokens are used by both the input and output of the AI model. The maximum number of tokens available per response is 512, including your prompt and settings. If your response exceeds this limit, it will be cut off.
2. Temperature Setting: The temperature setting influences the variability of your responses. A higher temperature results in more variability, while a lower temperature results in more exact responses.
3. System Instructions: System instructions are given to the AI model to frame how it should behave. Make sure these instructions do not contradict the question or prompt.
4. Memory Settings: If you are using memory in your prompts, ensure that the previous conversation turns are not too lengthy, as this might consume more tokens and lead to truncated responses.
5. AI Model: Different AI models have different costs and token multipliers. More accurate models use more tokens. For example, using GPT-4 will use 25x more Voiceflow AI tokens than GPT-3.5 for the same question.

Here are some relevant links for further details:
- Voiceflow Step 2: Create Assistant
- Token Calculation and Limits

Remember to run a prototype to make sure your assistant works as intended. If it isn't working the way you expect, you can refer to the debugging guide.
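The truncation behavior from point 1 can be sketched as a hard cap applied to the output, not a length the model plans around. This is a simplified simulation (words standing in for real tokens; `truncate_to_limit` and the sample reply are hypothetical, not Voiceflow code):

```python
def truncate_to_limit(full_reply: str, max_tokens: int) -> str:
    """Simulate the token cap: the reply is cut at the limit, mid-thought,
    rather than the model composing a shorter answer to fit under it."""
    tokens = full_reply.split()  # words stand in for real tokens here
    return " ".join(tokens[:max_tokens])

# A reply the model "wanted" to give, longer than the cap allows:
full_reply = "Our store opens at nine in the morning and closes at six in the evening"
print(truncate_to_limit(full_reply, 8))
# → "Our store opens at nine in the morning" — cut off mid-sentence,
# just like the partial responses described in the question
```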
stormy-gold · 15mo ago
Try increasing your max tokens in your response block
exotic-emerald (OP) · 15mo ago
Yes, but it still doesn’t make sense. Shouldn’t it be formatting the response to fit within the token limit? Otherwise it would be making unnecessarily long responses? That’s my thinking at least. Like if I have max tokens at 100, it should be formatting a response within those parameters, as opposed to writing a 200-token response and only sending half of it.
stormy-gold · 15mo ago
My understanding is that max tokens is the maximum that can be used, not a fixed amount that will be used when generating a response. I faced the same issue before; to solve it, I included a prompt to be as brief as possible / included a max word limit, and in some cases, where required, used a different model with a higher max token limit.
W. Williams (SFT)
Correct. The LLM does not use the maxToken value as an instruction; you need to prompt it to answer as briefly as possible, in a few sentences, etc.
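A minimal sketch of the prompt-side fix suggested above. The instruction wording, the `MAX_TOKENS` value, and the ~0.75 words-per-token rule of thumb are illustrative assumptions, not Voiceflow specifics:

```python
# Since max tokens is a cap, not a target, tell the model explicitly
# how long to be so its full answer fits under the cap.
MAX_TOKENS = 100  # illustrative response-step setting

# Rough word budget: ~0.75 words per token is a common rule of thumb
word_budget = int(MAX_TOKENS * 0.75)

# Hypothetical brevity instruction to prepend to the system prompt:
system_prompt = (
    "Answer as briefly as possible, in at most two sentences "
    f"and no more than roughly {word_budget - 15} words."  # leave headroom
)
print(system_prompt)
```

Leaving some headroom below the budget matters because the cap also has to cover formatting and any tokens the model spends before it stops.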
exotic-emerald (OP) · 15mo ago
Thanks guys! I’ll mess around with this stuff and see if I can solve this. Appreciate the help!
other-emerald · 15mo ago
correct @Saxon - we can try to make this way more intuitive