Architecture · SmithVox · multichannel multilingual translation
One room, one ingress, one mix per participant — in their own language and in the original speaker's voice
SMITHVOX BRAIN STACK
RAW MIC IN
PER-LANG MIX OUT
AUDIO + SPK ID
SYNTHESIZED
TEXT + LANG
TGT + VOICE ID
LOOKUP TIMBRE
STATE SYNC
HASH-LOG TURN
USER
Speakers · N
EN · ES · MN · DE …
HUB
LiveKit SFU
per-participant tracks
USER
Listeners · N
each in chosen lang
STT
Diarization + STT
Deepgram Nova-3
BRAIN
Translation Hub
GPT-4.1 + glossary RAG
TTS
TTS · Multilingual
ElevenLabs MML v2 · clone
STORE
Voice Clone Vault
instant clone · session-scoped
AGENT
Session Chair
decisions · names · jargon
LEDGER
Compliance Ledger
hash chain · OpenTimestamps
One ingress.
N egress mixes.
each listener hears
the room in their
language · in the
speaker's own voice
Per-turn latency
STT ~250 ms
translate ~380 ms
TTS ~280 ms
jitter ~180 ms
p50 ≈ 1.09 s
p95 ≈ 1.28 s
LEGEND
Focal · brain & ledger
Internal service
Store / vault
External provider
Optional / parallel agent
Audio · WebRTC
State sync
Notarized handoff