Architecture · SmithVox · multichannel multilingual translation

One room, one ingress, one mix per participant — in their own language and in the original speaker's voice

SMITHVOX BRAIN STACK

Flow labels: RAW MIC IN · PER-LANG MIX OUT · AUDIO + SPK ID · SYNTHESIZED · TEXT + LANG · TGT + VOICE ID · LOOKUP TIMBRE · STATE SYNC · HASH-LOG TURN

Nodes
  USER    Speakers · N: EN · ES · MN · DE …
  HUB     LiveKit SFU: per-participant tracks
  USER    Listeners · N: each in chosen lang
  STT     Diarization + STT: Deepgram Nova-3
  BRAIN   Translation Hub: GPT-4.1 + glossary RAG
  TTS     TTS · Multilingual: ElevenLabs MML v2 · clone
  STORE   Voice Clone Vault: instant clone · session-scoped
  AGENT   Session Chair: decisions · names · jargon
  LEDGER  Compliance Ledger: hash chain · OpenTimestamps

One ingress. N egress mixes. Each listener hears the room in their language, in the speaker's own voice.

Per-turn latency
  STT        ~250 ms
  translate  ~380 ms
  TTS        ~280 ms
  jitter     ~180 ms
  p50 ≈ 1.09 s · p95 ≈ 1.28 s

LEGEND
  Focal · brain & ledger
  Internal service
  Store / vault
  External provider
  Optional / parallel agent
  Audio · WebRTC
  State sync
  Notarized handoff
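
Two parts of the diagram lend themselves to a small worked sketch: the per-turn latency panel and the Compliance Ledger's hash chain. The Python below is illustrative only and is not taken from SmithVox. It assumes the four stage estimates add serially (they do sum to 1090 ms, which matches the stated p50 of about 1.09 s), and it assumes a SHA-256 chain over canonical JSON. The names `turn_budget_ms`, `LedgerEntry`, `append_turn`, `verify`, and `GENESIS`, as well as the payload fields, are hypothetical. Anchoring the chain head with OpenTimestamps is not shown.

```python
import hashlib
import json
from dataclasses import dataclass

# Per-turn stage estimates in milliseconds, copied from the latency panel.
STAGE_MS = {"stt": 250, "translate": 380, "tts": 280, "jitter": 180}


def turn_budget_ms(stages: dict[str, int]) -> int:
    """Sum the stage estimates, assuming the stages run strictly in sequence."""
    return sum(stages.values())


# Assumed placeholder for the "previous hash" of the very first turn.
GENESIS = "0" * 64


@dataclass(frozen=True)
class LedgerEntry:
    index: int        # position of the turn in the session
    prev_hash: str    # hash of the previous entry (GENESIS for the first)
    payload: dict     # hypothetical turn record: speaker, languages, text
    entry_hash: str   # SHA-256 over index, prev_hash and payload


def _digest(index: int, prev_hash: str, payload: dict) -> str:
    # Canonical JSON (sorted keys, fixed separators) so equal turns hash equally.
    body = json.dumps(
        {"index": index, "prev": prev_hash, "payload": payload},
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_turn(chain: list[LedgerEntry], payload: dict) -> LedgerEntry:
    """Append one turn, linking it to the hash of the previous entry."""
    index = len(chain)
    prev_hash = chain[-1].entry_hash if chain else GENESIS
    entry = LedgerEntry(index, prev_hash, payload, _digest(index, prev_hash, payload))
    chain.append(entry)
    return entry


def verify(chain: list[LedgerEntry]) -> bool:
    """Recompute every hash and check each link back to GENESIS."""
    prev_hash = GENESIS
    for i, entry in enumerate(chain):
        if entry.index != i or entry.prev_hash != prev_hash:
            return False
        if entry.entry_hash != _digest(entry.index, entry.prev_hash, entry.payload):
            return False
        prev_hash = entry.entry_hash
    return True


if __name__ == "__main__":
    print(turn_budget_ms(STAGE_MS))  # 1090

    chain: list[LedgerEntry] = []
    append_turn(chain, {"speaker": "spk-1", "src": "en", "tgt": "es", "text": "Buenos días"})
    append_turn(chain, {"speaker": "spk-2", "src": "de", "tgt": "en", "text": "Good morning"})
    print(verify(chain))  # True
```

Because each entry commits to the hash of the one before it, editing any earlier turn changes its digest and breaks every later link, so `verify` returns `False`. Only the latest `entry_hash` would then need an external timestamp for the whole session log to be covered.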