gradual-turquoise
3mo ago

Embedded Widget: Next Message Displays Before Voice Output Completes

Hi Voiceflow Support, I'm experiencing an issue with the timing of text and voice output in the embedded widget version of our agent.

In the Voiceflow prototype, the next message is held until the current voice output finishes playing. However, when we embed the agent widget on our site, the text for the next message appears immediately, before the voice output for the current message has completed. This causes the user to see text messages in rapid succession, while the audio lags behind, resulting in a disjointed experience.

Ideally, we’d like the embedded widget to behave the same as the prototype: hold off on displaying the next message until the current voice output is fully finished. Is there a way to synchronize message display timing with voice output completion in the embedded widget, similar to the prototype experience? If not, is there a workaround or setting we might have missed?

Let me know if you need any additional details or a demo of the issue.
1 Reply
Tico | Voiceflow Assistant
Hey there! Short version: the default Web Chat widget renders the next text trace as soon as the runtime returns it, even if the audio for the previous trace is still buffering or playing. Right now there isn’t a built-in switch to “hold” the next message until the current TTS playback finishes (the Prototype tool does that, but only inside the test environment).
Why it happens
• The runtime advances to the next step immediately; TTS audio is generated client-side after a small buffer.
• Visually this keeps the chat fast, but it means the text tracks the agent’s real-time state while the audio lags a bit behind it (see “slight delay between the agent’s state and what the user hears” in the docs).
What you can do today
1. Pace it in-flow – combine multiple lines into a single Message step or add a Message Delay/Pause (set in ms) before the following step so the audio has time to finish.
2. Custom front-end – if you need perfect sync you can build your own UI on top of the Dialog Manager API: fetch one turn, play the audio, wait for the HTMLAudioElement ended event, then call /state/interact for the next turn.
3. Turn voice off in chat – if the primary channel is web chat, consider disabling the Voice setting so users just read the text (no desync).
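For option 2, here is a rough TypeScript sketch of the "show, play, wait for ended, then continue" loop. It is only an illustration under a few assumptions: the Trace shape is a simplified, hypothetical stand-in for what the Dialog Manager API returns, AudioLike is a minimal subset of HTMLAudioElement so the logic can run outside a browser, and the render callback and the request for the next turn are left to your own code.

```typescript
// Hypothetical, simplified trace shape; real traces carry more fields.
interface Trace {
  type: string;
  payload?: { message?: string; src?: string };
}

// Minimal subset of HTMLAudioElement that this sketch relies on.
interface AudioLike {
  play(): Promise<void>;
  addEventListener(type: "ended" | "error", listener: () => void): void;
}

// Resolves once playback has finished (the `ended` event); rejects on error.
function playUntilEnded(audio: AudioLike): Promise<void> {
  return new Promise<void>((resolve, reject) => {
    audio.addEventListener("ended", () => resolve());
    audio.addEventListener("error", () => reject(new Error("audio playback failed")));
    audio.play().catch(reject);
  });
}

type Render = (message: string) => void;
type PlayAudio = (src: string) => Promise<void>;

// Shows each message, then waits for its audio (if any) before moving on,
// so the next message is never rendered while the previous one is still playing.
async function presentTurn(
  traces: Trace[],
  render: Render,
  playAudio: PlayAudio
): Promise<void> {
  for (const trace of traces) {
    const message = trace.payload?.message;
    if (message !== undefined) render(message);

    const src = trace.payload?.src;
    if (src) await playAudio(src);
  }
}

// In a browser this could be wired up roughly like so (appendBubble is hypothetical):
//   await presentTurn(traces, appendBubble, (src) => playUntilEnded(new Audio(src)));
// Only after presentTurn resolves would you request the next turn from the API.
```

Passing the render and audio functions in as parameters keeps the sequencing logic separate from the DOM, which also makes it easy to check the ordering with a fake audio object.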
Limitations / no-go’s
• There’s no widget parameter (yet) that delays rendering until audio playback has ended.
• The message events the widget emits (voiceflow:interact, etc.) don’t include an “audio finished” signal you could hook into.
Docs for reference
• Delay between agent state & audio: https://docs.voiceflow.com/docs
• Message Delay option on Message Step: https://docs.voiceflow.com/docs/message-step#message-delay
Hope this clarifies what you’re seeing and the work-arounds available. Let me know if you’d like more detail on any of the approaches!
