LLM Streaming Question
Hey all,
I’m curious whether Voiceflow will implement something in the future like RETELL has, where the agent starts speaking as soon as it gets the first tokens of LLM output, instead of waiting for the entire LLM response before responding (which is what Voiceflow currently does)
This would be a game changer for voice calls because of the latency reduction
8 Replies
Good news – you don’t need to wait for a full LLM turn anymore.
Voiceflow already supports server-sent streaming so you can surface partial tokens as soon as they’re generated, then feed them straight into TTS for a “start-speaking-immediately” experience:
1. Use the Dialog Manager Interact Stream endpoint
• POST /v2/project/{projectID}/user/{userID}/interact/stream
• Set header Accept: text/event-stream (SSE)
2. Pass the query parameter completion_events=true
This breaks the AI Response trace into:
• completion-start – first chunk of text
• completion-continue – additional chunks
• completion-end – final chunk
3. Each trace event arrives the moment those tokens are produced, so you can begin TTS playback (or send “play/continue” messages to your telephony provider) while the rest of the completion streams in, cutting perceived latency dramatically.
Example (truncated):
Because the events are standard SSE, you can pipe them directly into any voice channel (Twilio, Retell, WebRTC, etc.) that supports progressive TTS.
Docs & sample:
• Interact Stream – https://docs.voiceflow.com/reference/interact-stream
• completion_events details – https://docs.voiceflow.com/reference/stream-completion-events-1
• Working demo repo – https://github.com/voiceflow/streaming-wizard
So the “RETELL-style” partial-token speaking is already possible today – just swap to the streaming endpoint and enable completion_events.
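To make the flow above concrete, here is a minimal sketch of parsing an SSE body into those completion events and accumulating the text as it would be handed to TTS. The event shapes (`type`, `payload.completion`) and the sample payloads are illustrative assumptions, not copied from the Voiceflow docs – check the Interact Stream reference for the real trace schema:

```python
# Sketch: parsing SSE "completion" trace events as they stream in.
# NOTE: payload field names below are assumed for illustration only.
import json

def parse_sse(stream_text):
    """Split a raw SSE body into (event_name, parsed_json_data) pairs."""
    events = []
    event_name, data_lines = None, []
    for line in stream_text.splitlines():
        if line.startswith("event:"):
            event_name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())
        elif line == "" and event_name:
            # Blank line ends one SSE message.
            events.append((event_name, json.loads("\n".join(data_lines))))
            event_name, data_lines = None, []
    return events

# Illustrative raw stream, not an actual Voiceflow response:
raw = (
    "event: trace\n"
    'data: {"type": "completion-start", "payload": {"completion": "Hel"}}\n\n'
    "event: trace\n"
    'data: {"type": "completion-continue", "payload": {"completion": "lo wor"}}\n\n'
    "event: trace\n"
    'data: {"type": "completion-end", "payload": {"completion": "ld"}}\n\n'
)

spoken = ""
for name, data in parse_sse(raw):
    chunk = data["payload"]["completion"]
    spoken += chunk  # in production: push each chunk to TTS immediately
print(spoken)  # -> Hello world
```

In a real integration you would read the response body incrementally (e.g. with a streaming HTTP client) and start TTS playback on the first `completion-start` chunk rather than waiting for `completion-end`.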
@Tico | Voiceflow Assistant Is this possible to do with Agent Voiceflow steps though?
Because on some steps I need the agent to talk back and forth with the user to figure something out, and only then is an exit path triggered once a piece of info has been collected from the user
Hey @Moderator
Hey @Marlon, Voiceflow already does this
you just need to use the Agent step
Oh, nice
Hey @Marlon let's also refrain from spam tagging the team as it's against server rules! 🙏
Ok @KimLooo, what is the best way to get support though, here or is email better? Thanks
this was fine, it's just the tag - don't tag the mods @Marlon
Oh gotcha, thank you