VoicePersona
A voice that matches the ear it's speaking into
A speech layer that gives every call a chosen voice — any language we cover, warm or formal, clinical or conversational — with live A/B assignment that learns which voice each doctor responds to over time.
- All languages AIM supports, several registers each
- Live A/B learning, per doctor
- 0 voices shipped without clinical-term audit
VoicePersona is the service that decides, turn by turn, which voice speaks on the call. It hosts a bank of pre-auditioned voices across every supported language and several registers per language — neutral, softer, older, younger, with regional accents where they matter — and chooses which one to use for each outbound call.
The choice is not random. A new doctor gets a reasonable default based on language and region. Over time, the system A/B tests voices against the one metric that matters, pickup-to-registration conversion, and settles on the voice that individual doctor picks up for most often, listens to longest and registers through. Two years into a programme, different doctors may well be hearing different voices.
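To make the idea concrete, here is a minimal sketch of per-doctor voice selection. It assumes an epsilon-greedy strategy, which is one common way to run a continuous A/B test; the page does not say which method VoicePersona actually uses, and the class name, method names and voice labels below are hypothetical.

```python
import random
from collections import defaultdict


class DoctorVoiceSelector:
    """Illustrative epsilon-greedy voice choice per doctor (an assumption,
    not VoicePersona's documented algorithm)."""

    def __init__(self, voices, epsilon=0.1, seed=None):
        if not voices:
            raise ValueError("at least one voice is required")
        self.voices = list(voices)
        self.epsilon = epsilon
        self._rng = random.Random(seed)
        # stats[doctor_id][voice] = [calls_placed, registrations]
        self._stats = defaultdict(lambda: {v: [0, 0] for v in self.voices})

    def _rate(self, doctor_id, voice):
        calls, registrations = self._stats[doctor_id][voice]
        return registrations / calls if calls else 0.0

    def choose(self, doctor_id, default_voice):
        stats = self._stats[doctor_id]
        # New doctor: no history yet, so fall back to the language/region default.
        if all(calls == 0 for calls, _ in stats.values()):
            return default_voice
        # Explore occasionally so other voices keep being tested.
        if self._rng.random() < self.epsilon:
            return self._rng.choice(self.voices)
        # Otherwise use the voice with the best observed conversion rate.
        return max(self.voices, key=lambda v: self._rate(doctor_id, v))

    def record(self, doctor_id, voice, registered):
        # Log one call outcome: did this call end in a registration?
        entry = self._stats[doctor_id][voice]
        entry[0] += 1
        entry[1] += int(registered)
```

In use, the dialling side would call `choose()` before each call and `record()` once the outcome is known. Because the statistics are keyed by doctor, each doctor drifts towards their own best-performing voice independently of everyone else.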
Every voice is reviewed for cadence, pronunciation of medical terms and handling of pauses. This is not a hobbyist TTS — a cardiology webinar cannot have the word "ischaemic" mangled, and a call that stacks syllables without pauses sounds like a robot no matter how good the base model is. The platform maintains a vetted shortlist and retires voices that stop performing.
Everything VoicePersona handles for you
- Multilingual bank: Every language AIM supports, each in several registers: formal, warm, clinical, concise.
- Per-doctor A/B: The system converges each doctor on the voice that maximises their own pickup-to-registration rate.
- Medical-term hygiene: Pronunciation of clinical vocabulary reviewed per language; new terms flagged for audit.
- Cadence preservation: Natural pauses, soft hand-offs and disfluencies kept intact, with no uncanny, stacked-syllable delivery.
- Quick persona swap: An operator can retire a voice or A/B a new one without touching the rest of the stack.
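As a rough illustration of the "new terms flagged for audit" step, the sketch below checks a call script against a list of audited pronunciations. The function name, the word-matching rule and the single-language lexicon are all assumptions made for the example; the real audit process is not described on this page.

```python
import re


def flag_unaudited_terms(script, clinical_terms, audited_terms):
    """Return clinical terms used in `script` that have no audited pronunciation.

    Hypothetical helper: a real system would keep one lexicon per language.
    """
    # Lower-case words in the script, allowing hyphenated terms.
    words = set(re.findall(r"[a-z][a-z\-]*", script.lower()))
    audited = {term.lower() for term in audited_terms}
    flagged = {
        term.lower()
        for term in clinical_terms
        if term.lower() in words and term.lower() not in audited
    }
    return sorted(flagged)
```

A script that mentions "ischaemic" and "statin", checked against a lexicon where only "ischaemic" has been audited, would come back with "statin" flagged for review before any voice reads it aloud.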
The voice layer of every outbound call
VoicePersona is a specialised surface — it renders speech into VoiceDialer, takes instructions from AgentBuilder, and is tuned per tenant by the platform admin.
- VoiceDialer plays the rendered audio into the live phone call.
- AgentBuilder drives what the voice says and when; persona renders, agent composes.
- SuperAdmin manages the persona bank, retires voices and controls which personas are active per tenant.
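The split between the persona bank and per-tenant activation could be modelled along these lines. This is only a sketch under assumed names (`PersonaBank`, `activate`, `retire`, `available` and the voice and tenant identifiers are hypothetical); it is not the SuperAdmin API.

```python
from dataclasses import dataclass, field


@dataclass
class PersonaBank:
    """Hypothetical model of a voice bank with per-tenant activation."""

    voices: set = field(default_factory=set)
    retired: set = field(default_factory=set)
    active_by_tenant: dict = field(default_factory=dict)

    def activate(self, tenant, voice):
        # Only vetted, non-retired voices can be switched on for a tenant.
        if voice not in self.voices or voice in self.retired:
            raise ValueError(f"voice not available: {voice}")
        self.active_by_tenant.setdefault(tenant, set()).add(voice)

    def retire(self, voice):
        # Retiring a voice removes it from every tenant in one step.
        self.retired.add(voice)
        for active in self.active_by_tenant.values():
            active.discard(voice)

    def available(self, tenant):
        # Voices this tenant's calls may use, in a stable order.
        return sorted(self.active_by_tenant.get(tenant, set()) - self.retired)
```

Keeping retirement in the bank rather than in each tenant's settings means a voice that stops performing disappears everywhere at once, while the rest of the stack only ever asks which voices are available.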
Wire VoicePersona into your product today
Book a consultation with our founders and we'll walk you through the whole microservice stack — not just this one — live on your domain.