complex-teal
complex-teal · 15mo ago

Scary high token consumption

Hello, I upgraded an hour ago (token consumption was 0 then) and ran some tests against my (unpublished) chatbot. After a few tests it was at 69k tokens, then 177k, then suddenly 261k. I only used simple blocks: text, capture user reply, and an AI response with a knowledge base. My model is GPT-3.5. It kept loading for a while and then said "Unable to fetch response." I tried again and had the same problem: it jumped from 262k to 362k tokens just from the attached workflow. This loop was working before, so I am confused, and I also hate that I consumed 100k tokens in one minute of loading.
11 Replies
W. Williams (SFT)
You have an LLM loop. Can you take a screenshot of your whole flow, not cut off?
complex-teal
complex-teal (OP) · 15mo ago
Hi, thank you for your quick response. I have now changed the flow by adding a button (and slightly rearranging it). If I remove this button, will this happen again? The thing is, it was working in the past. What are typical pitfalls when constructing loops that might lead to high token consumption?
complex-teal
complex-teal (OP) · 15mo ago
I realised I didn't capture the user reply in my first screenshot (nothing was cut off there, by the way), which means that {last_utterance} did not change. But why does that lead to huge token consumption? Do I have to add the button to slow the workflow down?
xenial-black
xenial-black · 15mo ago
It would loop many times until it times out on our end.
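A back-of-the-envelope sketch of why this burns tokens so quickly: if {last_utterance} never changes, the runtime re-fires the AI step on every loop iteration until the platform-side timeout, and each iteration pays the full prompt (including knowledge-base context) plus the response. All the numbers below are illustrative assumptions, not Voiceflow internals:

```python
# Hypothetical simulation (not Voiceflow code) of an AI step that re-fires on an
# unchanged {last_utterance} until the platform times out the request.

def tokens_burned(prompt_tokens, response_tokens, seconds_per_call, timeout_s):
    """Tokens consumed by a loop that keeps calling the model until timeout."""
    calls = timeout_s // seconds_per_call  # loop iterations before the runtime gives up
    return int(calls * (prompt_tokens + response_tokens))

# Assume each call sends ~2,500 prompt tokens (knowledge-base context included),
# gets ~500 back, takes ~2 s per round trip, and the runtime times out after 60 s.
burned = tokens_burned(prompt_tokens=2500, response_tokens=500,
                       seconds_per_call=2, timeout_s=60)
print(burned)  # 90000 -- the same order of magnitude as the 100k/minute observed
```

Under those assumed numbers, a single minute of silent looping costs about 90k tokens, which matches the jump the OP saw while the chat just showed "loading".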
W. Williams (SFT)
Where is the "choice" block? Looks like you solved it. Use a capture step to capture {last_utterance} and put it into the AI step.
complex-teal
complex-teal (OP) · 15mo ago
Thanks a lot for your help. Yes, I solved it. But I still don't understand why it doesn't output anything even when there is an infinite loop caused by the forgotten second capture step. Shouldn't it output the same AI response many times instead of just loading? Anyway, it shows that you can use up tokens extremely fast: 100k tokens in one minute if something goes wrong. That is definitely a warning to me.
W. Williams (SFT)
I've seen it hundreds of times
xenial-black
xenial-black · 15mo ago
I agree it's a little confusing. It's based on how our runtime resolves requests: it only outputs once all of the inputs have been resolved.
complex-teal
complex-teal (OP) · 15mo ago
Is there a way to cap token consumption? That would be very helpful for testing. If not, I would kindly request that feature.
W. Williams (SFT)
You can keep a counter and limit requests.
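The counter pattern can be sketched in plain Python. In Voiceflow you would keep the counter in a flow variable and branch on it with a condition step before the AI step; the names below (`call_count`, `MAX_AI_CALLS`, `guarded_ai_call`) are illustrative, not platform APIs:

```python
# Sketch of a counter-based cap on AI calls, as suggested above.
# Assumed names; in a real flow this is a variable plus a condition step.

MAX_AI_CALLS = 10  # test-session budget; pick whatever limit suits you

def guarded_ai_call(state, run_ai_step):
    """Refuse to run the AI step once the cap is hit; otherwise count and run it."""
    if state["call_count"] >= MAX_AI_CALLS:
        return "Token budget for this test session exhausted."
    state["call_count"] += 1
    return run_ai_step()

state = {"call_count": 0}
for _ in range(15):  # a runaway loop attempts 15 calls...
    reply = guarded_ai_call(state, lambda: "AI response")
print(state["call_count"])  # ...but only 10 actually reach the model
```

The key design point is that the counter is checked *before* the AI step runs, so even an accidental infinite loop can only spend a bounded number of model calls.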
complex-teal
complex-teal (OP) · 15mo ago
Thanks, great idea. I found a video that explains how to do that using the Voiceflow Analytics API. To anyone reading this, this was the video: https://www.youtube.com/watch?v=Ynh3Wg7o0mI and this is the API doc: https://developer.voiceflow.com/reference/querypubliccontroller_queryusage There also appears to be a third-party solution doing that: https://landing.intelletokens.app/
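The budget check on top of that API is simple: fetch the running token total (via the usage endpoint documented at the link above; its exact request and response format are in the Voiceflow API reference and are not reproduced here) and stop testing once it crosses a threshold. The function and budget value below are hypothetical:

```python
# Hypothetical budget check on the running token total reported by the
# Voiceflow usage API. Feed it whatever total the API returns; the 300k
# budget is just an example.

def should_continue_testing(used_tokens, budget=300_000):
    """Stop the test session once account-level token usage crosses the budget."""
    return used_tokens < budget

# Pretend the usage API reported these running totals during the OP's session:
for total in (69_000, 177_000, 261_000, 362_000):
    print(total, should_continue_testing(total))  # last one prints False
```

Polling this between tests would have flagged the runaway loop after the first 100k jump instead of after several.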
