complex-teal•15mo ago
Scary high token consumption
Hello, I upgraded 1 hour ago (token consumption was 0 then) and ran some tests on my (unpublished) chatbot.
After a few tests it was 69k tokens, then 177k and then suddenly 261k
I only used simple blocks with
- text
- capture user reply
- response AI with knowledge base
My model is GPT-3.5.
It kept loading for a while, then said "Unable to fetch response."
Now I tried it again and had the same problem.
It jumped from 262k to 362k tokens just from the attached workflow.
This loop was working before. I am confused, and I also hate the fact that I consumed 100k tokens in one minute of loading.

11 Replies
You have an LLM loop. Can you take a screenshot of your whole flow?
not cut-off
complex-tealOP•15mo ago
Hi, thank you for your quick response. I have now changed the flow by adding a button (and slightly rearranging it).
If I remove this button, will this happen again? The thing is, it was working in the past. What are typical pitfalls when constructing loops that might lead to high token consumption?

complex-tealOP•15mo ago
I realised I didn't capture the user reply in my first screenshot. (There was nothing cut off in the first screenshot by the way)
Which means that {last_utterance} did not change.
But why does this lead to huge token consumption?
Do I have to add the button to slow the workflow down?
xenial-black•15mo ago
it would loop many times until it times out on our end
Where is the "choice block"?
Looks like u solved it
Use a capture step to just capture last_utterance and put it into the AI step
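The runaway can be sketched like this (a minimal JavaScript sketch, not the actual Voiceflow runtime; `callModel`, the token count, and the turn cap are made up for illustration):

```javascript
// Sketch: without a capture step, the loop never waits for new user
// input, so the same utterance is sent to the model on every pass.
function runFlow(callModel, lastUtterance, maxTurns) {
  let tokensUsed = 0;
  for (let turn = 0; turn < maxTurns; turn++) {
    // lastUtterance never changes inside the loop, so each iteration
    // fires another model request and burns more tokens.
    tokensUsed += callModel(lastUtterance);
  }
  return tokensUsed;
}
```

With a hypothetical ~1,000 tokens per request, 100 silent iterations already cost 100k tokens, which matches the jump described above. A capture step breaks the loop because execution pauses there until the user actually replies.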
complex-tealOP•15mo ago
Thanks a lot for your help. I still don't understand why it doesn't output anything EVEN if there is an infinite loop caused by the fact that I forgot the second capture step.
Shouldn't it output the same AI response many times instead of just loading?
Yes, I solved it. Thank you very much for your help. I still don't really understand what was going on there. If there was an infinite loop, then it should have outputted the same AI response, right? But it was just loading.
Anyway - it shows that you can use up tokens extremely fast: 100k tokens in 1 minute if something goes wrong. This is definitely a warning to me.
I've seen it hundreds of times
xenial-black•15mo ago
I agree it's a little confusing. It's based on how our runtime resolves requests: it outputs things once all the inputs have been resolved.
complex-tealOP•15mo ago
Is there a way to cap token consumption? That would be very helpful for testing. If not, I would kindly request that feature.
You can keep a counter and limit requests.
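The counter idea could look roughly like this (a minimal sketch with hypothetical names, not a built-in Voiceflow API; in a flow you'd keep the counter in a variable and branch with a condition block instead):

```javascript
// Sketch: a per-session guard that counts AI requests and refuses
// any request beyond a fixed budget.
function makeRequestGuard(maxRequests) {
  let count = 0;
  return function allowRequest() {
    count += 1;
    // Allow the request only while we are within budget.
    return count <= maxRequests;
  };
}
```

Check the guard before every AI step; when it returns false, route the user to a fallback message instead of calling the model again.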
complex-tealOP•15mo ago
Thanks, great idea.
I found a video that explains how to do that using the Analytics API from Voiceflow. To anyone reading this, this was the video: https://www.youtube.com/watch?v=Ynh3Wg7o0mI
and this is the API doc: https://developer.voiceflow.com/reference/querypubliccontroller_queryusage
There also appears to be a third-party solution doing that: https://landing.intelletokens.app/
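Once you have usage numbers back from the Analytics API, the budget check itself is simple. A minimal sketch (the response shape with `count` fields is an assumption, verify it against the API doc linked above; the HTTP call and auth are omitted here):

```javascript
// Sum token counts from a list of usage data points.
// usagePoints is assumed to be an array of { count: number } entries.
function totalTokens(usagePoints) {
  return usagePoints.reduce((sum, p) => sum + p.count, 0);
}

// Return true once consumption has crossed a chosen budget.
function overBudget(usagePoints, budget) {
  return totalTokens(usagePoints) > budget;
}
```

You could poll the usage endpoint between test runs and stop testing once `overBudget` trips, instead of discovering a 100k jump after the fact.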