LLM Streaming Question
Hey all,
I’m curious whether Voiceflow will implement something in the future like RETELL has, where the agent starts speaking as soon as it gets the first tokens of LLM output, instead of waiting for the entire LLM response before responding (which is what Voiceflow currently does)
This would be a game changer for voice calls because of the latency reduction
8 Replies
Good news – you don’t need to wait for a full LLM turn anymore.
Voiceflow already supports server-sent streaming so you can surface partial tokens as soon as they’re generated, then feed them straight into TTS for a “start-speaking-immediately” experience:
1. Use the Dialog Manager Interact Stream endpoint
• POST /v2/project/{projectID}/user/{userID}/interact/stream
• Set header Accept: text/event-stream (SSE)
2. Pass the query parameter completion_events=true
This breaks the AI Response trace into:
• completion-start – first chunk of text
• completion-continue – additional chunks
• completion-end – final chunk
3. Each trace event arrives the moment those tokens are produced, so you can begin TTS playback (or send “play/continue” messages to your telephony provider) while the rest of the completion streams in, cutting perceived latency dramatically.
Example (truncated):
Because the events are standard SSE, you can pipe them directly into any voice channel (Twilio, Retell, WebRTC, etc.) that supports progressive TTS.
Docs & sample:
• Interact Stream – https://docs.voiceflow.com/reference/interact-stream
• completion_events details – https://docs.voiceflow.com/reference/stream-completion-events-1
• Working demo repo – https://github.com/voiceflow/streaming-wizard
So the “RETELL-style” partial-token speaking is already possible today – just swap to the streaming endpoint and enable completion_events.
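To make the flow above concrete, here is a minimal sketch of parsing an SSE body into those completion events and accumulating the text as it would be handed to TTS. The event shapes (`type`, `payload.completion`) and the sample payloads are illustrative assumptions, not copied from the Voiceflow docs – check the Interact Stream reference for the real trace schema:

```python
# Sketch: parsing SSE "completion" trace events as they stream in.
# NOTE: payload field names below are assumed for illustration only.
import json

def parse_sse(stream_text):
    """Split a raw SSE body into (event_name, parsed_json_data) pairs."""
    events = []
    event_name, data_lines = None, []
    for line in stream_text.splitlines():
        if line.startswith("event:"):
            event_name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())
        elif line == "" and event_name:
            # Blank line ends one SSE message.
            events.append((event_name, json.loads("\n".join(data_lines))))
            event_name, data_lines = None, []
    return events

# Illustrative raw stream, not an actual Voiceflow response:
raw = (
    "event: trace\n"
    'data: {"type": "completion-start", "payload": {"completion": "Hel"}}\n\n'
    "event: trace\n"
    'data: {"type": "completion-continue", "payload": {"completion": "lo wor"}}\n\n'
    "event: trace\n"
    'data: {"type": "completion-end", "payload": {"completion": "ld"}}\n\n'
)

spoken = ""
for name, data in parse_sse(raw):
    chunk = data["payload"]["completion"]
    spoken += chunk  # in production: push each chunk to TTS immediately
print(spoken)  # -> Hello world
```

In a real integration you would read the response body incrementally (e.g. with a streaming HTTP client) and start TTS playback on the first `completion-start` chunk rather than waiting for `completion-end`.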
@Tico | Voiceflow Assistant Is this possible to do with Agent Voiceflow steps though?
Because on some steps I need the agent to talk back and forth with the user to figure something out, and only then is an exit path triggered once a piece of info has been collected from the user
Hey @Moderator
Hey @Marlon, Voiceflow already does this
you just need to use the Agent step
Oh, nice
Hey @Marlon let's also refrain from spam tagging the team as it's against server rules! 🙏
Ok @KimLooo, what is the best way to get support though, here or is email better? Thanks
this was fine, it's just the tag - don't tag the mods @Marlon
Oh gotcha, thank you