Voiceflow doesn’t yet offer a built-in connector for Google Cloud Speech-to-Text.
Today the Telephony channel lets you pick from the supported STT engines below, and Google isn’t on the list:
• Cartesia Ink-Whisper (Whisper-based)
• AssemblyAI Universal
• Deepgram (Nova-2, Nova-3, Nova-3 Medical)
If you want to use Google’s model, you would need to run it outside of Voiceflow: forward the raw audio stream from Twilio (or another carrier) to your own middleware, call Google’s API there, then pass the transcript back to Voiceflow through an API/Function step. That custom proxy is currently the only workaround until Google STT is added to the native provider list.
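A minimal sketch of the middleware half of that proxy, assuming the official `google-cloud-speech` Python client and 8 kHz mu-law audio as Twilio's media streams deliver it (the helper and field names are illustrative, not anything Voiceflow-specific):

```python
# Sketch of a middleware step that sends audio to Google STT and
# flattens the response into a transcript string for Voiceflow.

def extract_transcript(response: dict) -> str:
    """Join the top alternative of each result from a Google STT
    recognize response (already converted to a plain dict)."""
    return " ".join(
        result["alternatives"][0]["transcript"]
        for result in response.get("results", [])
        if result.get("alternatives")
    )

# With the real client (requires Google Cloud credentials):
# from google.cloud import speech
# client = speech.SpeechClient()
# response = client.recognize(
#     config=speech.RecognitionConfig(
#         encoding=speech.RecognitionConfig.AudioEncoding.MULAW,
#         sample_rate_hertz=8000,  # Twilio media streams are 8 kHz mu-law
#         language_code="en-US",
#     ),
#     audio=speech.RecognitionAudio(content=audio_bytes),
# )
# transcript = extract_transcript(type(response).to_dict(response))
```

The transcript can then be returned as the middleware's HTTP response, which an API step in Voiceflow captures into a variable.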
I'm not familiar with the Google voice service, but one way to achieve what you want would be to create an automation in Make where you fetch the audio file, create a transcription with Whisper (OpenAI), and send it to Voiceflow via the API. If you need, we can have a call.
Hey @Francois, thanks for the reply! For my use case, the available speech models didn't transcribe the speech correctly, and the one from Google seemed the most likely to succeed. I got it sort of working with a TwiML custom action step and a Cloud Run job that does the processing with Google's STT. On the Voiceflow side I just poll the Cloud Run job until it returns a success state and the transcribed text. I tried the Voiceflow API route, but couldn't get it to work. If you send something back to a Voiceflow project via the Voiceflow API, how/where does it get captured?
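The polling loop described above can be sketched roughly like this; the `state`/`transcript` field names are assumptions about your own Cloud Run job's response shape, and `fetch_status` stands in for whatever HTTP call you make to it:

```python
import time

def poll_job(fetch_status, interval=2.0, timeout=60.0):
    """Poll a transcription job until it reports success and return the
    transcript. `fetch_status` is any callable returning a dict like
    {"state": "RUNNING"} or {"state": "SUCCEEDED", "transcript": "..."}.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = fetch_status()
        if status.get("state") == "SUCCEEDED":
            return status["transcript"]
        if status.get("state") == "FAILED":
            raise RuntimeError("transcription job failed")
        time.sleep(interval)  # back off before the next poll
    raise TimeoutError("transcription job did not finish in time")
```

In a Voiceflow Function step the equivalent loop would run in JavaScript, but the shape is the same: re-fetch, check the state, and bail out on failure or timeout so the call doesn't hang.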