bardy · 2mo ago

Massive Speech-to-Text Accuracy Issues over Twilio Calls (German / 8 kHz Narrowband)

Hey everyone, we’re running into a major issue with STT accuracy on inbound Twilio phone calls (German language).

Our setup:
Twilio PSTN number (8 kHz / G.711) → Voiceflow call flow
STT: Deepgram Nova-2 (DE)
TTS: ElevenLabs
LLM + Voiceflow Agent for routing (Rezept / Rückruf / Termin, i.e. prescription / callback / appointment)

The problem: the bot often completely misunderstands simple words like “Rezept” (prescription) or “Rückruf” (callback). It works perfectly in the browser (wideband mic) but fails badly when the audio goes through Twilio.
Example: I clearly say “Ich brauche ein Rezept” (“I need a prescription”) → STT returns “Ich komm” (“I’m coming”) or random nonsense.

Already tested:
✅ Deepgram tier: enhanced, smart_format: true, no_delay: false, language: de
✅ AGC active
✅ Google STT V2 as an alternative (same issue)
✅ Increased “End of Speech Timeout” to 2000 ms
✅ Clean mic / quiet environment

Still, accuracy stays at around 50 %. I know Twilio’s PSTN audio is 8 kHz narrowband, but is there any known workaround or optimization to improve STT quality?

Questions:
Has anyone found a codec / routing setup (e.g., Opus, PCM16) that actually delivers 16 kHz audio into Voiceflow?
Are there STT models better optimized for telephony audio (8 kHz)?
Would using a Twilio Stream Connector with preprocessing (AGC + noise suppression + echo cancellation) help?
Is there any trick to reduce false triggers or cut-offs (VAD too aggressive)?

Would love to hear from anyone who has faced this and found a robust solution 🙏

Thanks in advance!
Matthias
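Some background for the stream-connector question: on a PSTN leg, Twilio Media Streams delivers the caller's audio as base64-encoded 8 kHz G.711 μ-law (`audio/x-mulaw`, mono) inside JSON `media` messages, so any preprocessing step (AGC, noise suppression) first has to expand it to linear 16-bit PCM. The following is a minimal sketch under that assumption; `ulaw_byte_to_pcm16` and `decode_media_message` are hypothetical helper names, not part of any Twilio or Voiceflow SDK. It deliberately does not resample: upsampling 8 kHz audio to 16 kHz cannot recover content above the roughly 4 kHz Nyquist limit, so it yields a 16 kHz container, not wideband speech.

```python
import base64
import json
import struct


def ulaw_byte_to_pcm16(u: int) -> int:
    """Expand one G.711 mu-law byte to a signed 16-bit linear sample."""
    u = ~u & 0xFF               # mu-law code words are transmitted bit-inverted
    sign = u & 0x80             # after inversion, top bit set -> negative sample
    exponent = (u >> 4) & 0x07  # 3-bit segment number
    mantissa = u & 0x0F         # 4-bit step within the segment
    # Rebuild the biased magnitude, then remove the 0x84 (132) bias.
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude


def decode_media_message(raw: str) -> bytes:
    """Convert one Twilio Media Streams JSON message to little-endian PCM16 bytes.

    Assumption: 'media' events carry base64 mu-law (8000 Hz, mono) in
    media.payload. Other events (e.g. 'start', 'stop') yield b''.
    """
    msg = json.loads(raw)
    if msg.get("event") != "media":
        return b""
    ulaw = base64.b64decode(msg["media"]["payload"])
    samples = [ulaw_byte_to_pcm16(b) for b in ulaw]
    # '<' = little-endian, 'h' = signed 16-bit; the sample rate stays 8000 Hz.
    return struct.pack(f"<{len(samples)}h", *samples)


if __name__ == "__main__":
    payload = base64.b64encode(bytes([0xFF, 0x00, 0x80])).decode("ascii")
    frame = json.dumps({"event": "media", "media": {"payload": payload}})
    pcm = decode_media_message(frame)
    print(struct.unpack("<3h", pcm))  # (0, -32124, 32124)
```

Whatever engine receives the decoded audio should be told its true format (for Deepgram, via the `encoding` and `sample_rate` parameters, e.g. `linear16` at 8000 Hz); declaring a sample rate that differs from the actual audio distorts the signal the model sees.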
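On the cut-off question: one common way to implement end-of-speech detection is voice activity detection plus a hangover window (the “End of Speech Timeout”), where an utterance counts as finished only after the signal stays below a threshold for that long. The toy energy-based sketch below illustrates that trade-off on 8 kHz audio in 20 ms frames (160 samples). It is a hypothetical illustration, not how Voiceflow, Deepgram, or Google actually implement endpointing, and the -40 dBFS threshold and function names are arbitrary assumptions.

```python
import math
from typing import List, Sequence, Tuple

SAMPLE_RATE = 8000                          # narrowband telephony rate
FRAME_MS = 20
FRAME_LEN = SAMPLE_RATE * FRAME_MS // 1000  # 160 samples per frame


def frame_rms_dbfs(frame: Sequence[int]) -> float:
    """RMS level of a PCM16 frame relative to full scale (dBFS)."""
    if not frame:
        return -math.inf
    mean_sq = sum(s * s for s in frame) / len(frame)
    if mean_sq == 0:
        return -math.inf
    return 10 * math.log10(mean_sq / 32768 ** 2)


def segment_utterances(
    samples: Sequence[int],
    threshold_dbfs: float = -40.0,   # hypothetical energy threshold
    end_of_speech_ms: int = 2000,    # hangover before an utterance is closed
) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of detected utterances."""
    hangover_frames = end_of_speech_ms // FRAME_MS
    segments: List[Tuple[int, int]] = []
    start = None          # sample index where the open utterance began
    silent_run = 0        # consecutive quiet frames since the last voiced one
    last_voiced_end = 0   # end index of the most recent voiced frame
    for i in range(0, len(samples) - FRAME_LEN + 1, FRAME_LEN):
        voiced = frame_rms_dbfs(samples[i:i + FRAME_LEN]) > threshold_dbfs
        if voiced:
            if start is None:
                start = i
            silent_run = 0
            last_voiced_end = i + FRAME_LEN
        elif start is not None:
            silent_run += 1
            # Close the utterance only after the full hangover of quiet frames.
            if silent_run >= hangover_frames:
                segments.append((start, last_voiced_end))
                start = None
                silent_run = 0
    if start is not None:  # audio ended while an utterance was still open
        segments.append((start, last_voiced_end))
    return segments
```

For example, with 0.5 s of speech, a 0.5 s pause, another 0.5 s of speech, and then 3 s of silence, a 2000 ms window returns one segment `[(0, 12000)]`, while a 300 ms window splits it into `[(0, 4000), (8000, 12000)]`. A fixed energy threshold is also sensitive to line noise and gain, which is why this is only a toy model of the behavior.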