rudisp · 3mo ago

Google Speech-to-Text

How can I integrate Google's Speech-to-Text model into Voiceflow?
4 Replies
Tico | Voiceflow Assistant
Voiceflow doesn’t yet offer a built-in connector for Google Cloud Speech-to-Text. Today the Telephony channel lets you pick from the supported STT engines below, and Google isn’t on the list:
• Cartesia Ink-Whisper (Whisper-based)
• AssemblyAI Universal
• Deepgram (Nova-2, Nova-3, Nova-3 Medical)
If you want to use Google’s model you would need to run it outside of Voiceflow – e.g., forward the raw audio stream from Twilio (or another carrier) to your own middleware, call Google’s API there, then pass the transcript back to Voiceflow through an API/Function step. That custom proxy is currently the only workaround until Google STT is added to the native provider list. Docs: https://docs.voiceflow.com
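For illustration, here is a minimal sketch of that kind of middleware in Node/TypeScript. It assumes you already have the call audio available as base64 (e.g. μ-law at 8 kHz from Twilio) and a Voiceflow Dialog API key; the function name and environment variable are placeholders, not anything Voiceflow provides:

```ts
import { SpeechClient } from '@google-cloud/speech';

const sttClient = new SpeechClient();

// Hypothetical middleware handler: transcribe carrier audio with Google STT,
// then forward the transcript to the Voiceflow Dialog API as a user text turn.
export async function handleAudio(audioBase64: string, userID: string) {
  // 1. Transcribe with Google Cloud Speech-to-Text
  //    (encoding/sample rate depend on your carrier's audio stream)
  const [sttResponse] = await sttClient.recognize({
    config: { encoding: 'MULAW', sampleRateHertz: 8000, languageCode: 'en-US' },
    audio: { content: audioBase64 },
  });
  const transcript = (sttResponse.results ?? [])
    .map((r) => r.alternatives?.[0]?.transcript ?? '')
    .join(' ');

  // 2. Send the transcript back to the same Voiceflow user/session
  const vfResponse = await fetch(
    `https://general-runtime.voiceflow.com/state/user/${encodeURIComponent(userID)}/interact`,
    {
      method: 'POST',
      headers: {
        Authorization: process.env.VOICEFLOW_API_KEY!, // Dialog API key (placeholder env var)
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ action: { type: 'text', payload: transcript } }),
    }
  );
  return vfResponse.json(); // the agent's next turn (traces)
}
```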
rudisp (OP) · 2mo ago
OK, how can I forward the audio stream from Voiceflow to Google's STT and get the transcript back in Voiceflow? Can someone answer the previous question? How can I forward an audio stream from Voiceflow to Google's speech-to-text and get it back?
Francois🇫🇷
I'm not familiar with the Google voice service, but one way to achieve what you want would be to create an automation on Make where you fetch the audio file, create a transcription with Whisper (OpenAI), and send it to Voiceflow via the API. If you need, we can have a call.
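If you'd rather script that flow than build it in Make, a rough Node/TypeScript equivalent of the transcription step could look like this (the file path is a placeholder; the resulting text would then be sent to Voiceflow the same way as in the earlier sketch):

```ts
import fs from 'fs';
import OpenAI from 'openai';

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Hypothetical equivalent of the Make scenario: fetch the audio file and
// transcribe it with Whisper; then pass transcription.text on to Voiceflow.
export async function transcribe(audioPath: string): Promise<string> {
  const transcription = await openai.audio.transcriptions.create({
    file: fs.createReadStream(audioPath),
    model: 'whisper-1',
  });
  return transcription.text;
}
```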
rudisp (OP) · 2mo ago
Hey @Francois🇫🇷, thanks for the reply! For my use case, the available speech models didn't transcribe the speech correctly, and the one from Google seemed the most likely to succeed. I got it sort of working with a TwiML custom action step and a Cloud Run job that does the processing with Google's STT. Then, on the Voiceflow side, I just poll the Cloud Run job until it returns a success state and the transcribed text. I tried the Voiceflow API route, but couldn't get it to work. If you send something back to a Voiceflow project via the Voiceflow API, how/where does it get captured?
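A rough sketch of that polling logic, assuming the Cloud Run service exposes a hypothetical GET /jobs/{id} endpoint returning { state, transcript } (this just mirrors the approach described above, not an official Voiceflow pattern):

```ts
// Hypothetical polling helper, e.g. for use from a custom Function/API step.
// Assumes the Cloud Run job exposes GET /jobs/{id} returning { state, transcript }.
export async function pollTranscript(baseUrl: string, jobId: string, attempts = 20) {
  for (let i = 0; i < attempts; i++) {
    const res = await fetch(`${baseUrl}/jobs/${jobId}`);
    const body = (await res.json()) as { state: string; transcript?: string };
    if (body.state === 'SUCCEEDED' && body.transcript) {
      return body.transcript; // store in a Voiceflow variable for the next step
    }
    if (body.state === 'FAILED') {
      throw new Error('Transcription job failed');
    }
    await new Promise((resolve) => setTimeout(resolve, 1000)); // wait before retrying
  }
  throw new Error('Timed out waiting for transcript');
}
```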
