Massive Speech-to-Text Accuracy Issues over Twilio Calls (German / 8 kHz Narrowband)
Hey everyone,
We’re running into a major issue with STT accuracy on inbound Twilio phone calls (German language).
Our setup:
Twilio PSTN number (8 kHz / G.711) → Voiceflow call flow
Deepgram Nova-2 (German) as the STT model
TTS: ElevenLabs
LLM + Voiceflow Agent for routing between the intents Rezept (prescription), Rückruf (callback) and Termin (appointment)
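For anyone reproducing this: the PSTN leg delivers G.711 μ-law, i.e. one 8-bit code word per sample at 8 kHz, and any custom preprocessing (AGC, noise suppression) has to expand that into linear PCM first. Below is a minimal Python sketch of the standard G.711 μ-law expansion. The function names are my own, this is not Twilio or Voiceflow code, and it is written by hand because `audioop.ulaw2lin` was removed in Python 3.13.

```python
import struct

# G.711 mu-law expansion (ITU-T G.711), the codec used on Twilio's PSTN leg.
BIAS = 0x84  # 132, the bias added during mu-law encoding


def ulaw_to_linear(code: int) -> int:
    """Expand one 8-bit mu-law code word to a signed 16-bit PCM sample."""
    u = ~code & 0xFF              # mu-law bytes are stored bit-inverted
    sign = u & 0x80
    exponent = (u >> 4) & 0x07    # 3-bit segment number
    mantissa = u & 0x0F           # 4-bit step within the segment
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if sign else magnitude


# Lookup table with all 256 code words, built once at import time.
ULAW_TABLE = [ulaw_to_linear(c) for c in range(256)]


def ulaw_bytes_to_pcm16(data: bytes) -> bytes:
    """Convert mu-law bytes (8 kHz mono) to little-endian PCM16 bytes."""
    samples = [ULAW_TABLE[b] for b in data]
    return struct.pack(f"<{len(samples)}h", *samples)


if __name__ == "__main__":
    frame = bytes([0xFF, 0x7F, 0x80, 0x00])  # +0, -0, max positive, max negative
    print(struct.unpack("<4h", ulaw_bytes_to_pcm16(frame)))  # (0, 0, 32124, -32124)
```

The output is still 8 kHz: decoding changes the sample format, not the bandwidth.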
The problem: the bot often completely misunderstands simple words such as “Rezept” or “Rückruf”.
It works perfectly in the browser (wideband mic) but fails badly when the call goes through Twilio.
Example: I clearly say “Ich brauche ein Rezept” (“I need a prescription”) → STT returns “Ich komm” (“I’m coming”) or random nonsense.
Already tested:
✅ Deepgram tier: enhanced, smart_format: true, no_delay: false, language: de
✅ AGC (automatic gain control) active
✅ Google STT V2 as alternative (same issue)
✅ Increased “End of Speech Timeout” to 2000 ms
✅ Clean mic / quiet environment
Still, accuracy stays at around 50%.
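If the Deepgram connection is opened directly (for example from a Twilio Media Streams handler) rather than through Voiceflow's built-in integration, the telephony format can be declared explicitly and domain words can be boosted. A sketch, assuming Deepgram's live `/v1/listen` query parameters (`encoding`, `sample_rate`, `endpointing`, and `keywords` as supported on Nova-2). The boost values and the 500 ms endpointing are placeholders, not tested settings, and it is an assumption that Voiceflow exposes any of these options.

```python
from urllib.parse import urlencode

DEEPGRAM_LISTEN = "wss://api.deepgram.com/v1/listen"


def deepgram_telephony_url(keywords: dict[str, int]) -> str:
    """Build a live-transcription URL for raw 8 kHz mu-law telephony audio."""
    params = {
        "model": "nova-2",
        "language": "de",
        # Describe the raw stream explicitly so it is not misinterpreted.
        "encoding": "mulaw",
        "sample_rate": 8000,
        "channels": 1,
        "smart_format": "true",
        # Placeholder: milliseconds of silence before a segment is finalized.
        "endpointing": 500,
        # Keyword boosting, one "term:intensifier" entry per word.
        "keywords": [f"{term}:{boost}" for term, boost in keywords.items()],
    }
    # doseq=True repeats the "keywords" parameter once per list entry.
    return f"{DEEPGRAM_LISTEN}?{urlencode(params, doseq=True)}"


if __name__ == "__main__":
    print(deepgram_telephony_url({"Rezept": 2, "Rückruf": 2, "Termin": 2}))
```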
I know Twilio’s PSTN audio is 8 kHz narrowband, but is there any known workaround or optimization to improve STT quality?
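For context, the arithmetic behind “narrowband”: a signal sampled at $f_s$ can only represent content below $f_s/2$, so an 8 kHz G.711 stream tops out at 4 kHz (the classic telephone band is roughly 300 to 3400 Hz). Resampling to 16 kHz afterwards doubles the sample count but cannot restore anything above 4 kHz, so a 16 kHz setting downstream only helps if the audio is wideband end to end.

$$
f_{\text{Nyquist}} = \frac{f_s}{2} = \frac{8000\ \text{Hz}}{2} = 4000\ \text{Hz},
\qquad
R_{\text{G.711}} = 8000\ \tfrac{\text{samples}}{\text{s}} \times 8\ \tfrac{\text{bit}}{\text{sample}} = 64\ \text{kbit/s}
$$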
Questions:
Has anyone found a codec / routing setup (e.g., Opus, PCM16) that actually delivers 16 kHz audio into Voiceflow?
Are there STT models better optimized for telephony audio (8 kHz)?
Would a Twilio Stream Connector with preprocessing (AGC + noise suppression + echo cancellation) help?
Are there any tricks to reduce false triggers or cut-offs caused by an overly aggressive VAD?
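To make the VAD question concrete, a simple energy-based detector with a hangover shows the trade-off. The hangover plays the same role as the “End of Speech Timeout”: if it is too short, a pause between words ends the utterance early; if the threshold is too low, line noise triggers it. This is a hypothetical toy sketch in Python (the 500 threshold and 20 ms frames are assumptions), not how Voiceflow or Deepgram actually implement their VAD.

```python
import math

SAMPLE_RATE = 8000
FRAME_MS = 20
FRAME_LEN = SAMPLE_RATE * FRAME_MS // 1000  # 160 samples per 20 ms frame


def frame_rms(frame: list[int]) -> float:
    """Root-mean-square energy of one frame of PCM16 samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def segment_speech(samples: list[int], threshold: float = 500.0,
                   hangover_ms: int = 600) -> list[tuple[int, int]]:
    """Return speech segments as (start_frame, end_frame), end exclusive.

    A segment closes only after `hangover_ms` of consecutive frames below
    `threshold`, so short pauses between words do not cut it off.
    """
    hangover_frames = max(1, hangover_ms // FRAME_MS)
    segments: list[tuple[int, int]] = []
    start, silent = None, 0
    n_frames = len(samples) // FRAME_LEN
    for i in range(n_frames):
        frame = samples[i * FRAME_LEN:(i + 1) * FRAME_LEN]
        if frame_rms(frame) >= threshold:
            if start is None:
                start = i          # speech onset
            silent = 0
        elif start is not None:
            silent += 1
            if silent >= hangover_frames:
                # Last voiced frame was i - silent, so the exclusive end is +1.
                segments.append((start, i - silent + 1))
                start, silent = None, 0
    if start is not None:          # stream ended while still in speech
        segments.append((start, n_frames - silent))
    return segments


if __name__ == "__main__":
    # Synthetic signal: 200 ms "speech", 200 ms pause, 200 ms "speech", 800 ms silence.
    speech, silence = [1000, -1000] * 80, [0] * FRAME_LEN
    sig = speech * 10 + silence * 10 + speech * 10 + silence * 40
    print(segment_speech(sig))                   # [(0, 30)]
    print(segment_speech(sig, hangover_ms=100))  # [(0, 10), (20, 30)]
```

With a 600 ms hangover the 200 ms pause is bridged and the utterance stays whole; with 100 ms it is split in two, which is the kind of cut-off an aggressive VAD produces.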
I’d love to hear from anyone who has faced this and found a robust solution 🙏
Thanks in advance!
— Matthias