Languages & voices

JARAI produces content in many languages and draws on a deep catalogue of voices. Administrators curate the language list, the voice variants available to operators, and the accent taxonomy that makes the voice picker expressive. These surfaces live under Settings.

Manage languages & voices


Settings → Languages: the catalogue of supported languages, each with an LLM support level and a TTS support level.
Settings → Voice variants: every available voice, filterable by trait chips (accent, age, gender, style, use-case…).
The accent taxonomy gives voices a precise regional label (e.g. es-ES-x-andalucia-sevilla).
The voice harvester discovers new provider voices and proposes them for the catalogue.

Languages

Settings → Languages lists supported languages, each carrying:

an LLM support level — how well text generation performs in that language;
a TTS support level — whether (and how well) voiceover is available.

Operators pick a language per production; its support levels set expectations for output quality and drive which downstream providers are eligible.
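To make the link between support levels and provider eligibility concrete, here is a minimal Python sketch. It assumes the levels are ordered and that a provider kind (text generation or voiceover) is only eligible when the language meets a minimum level for that kind. `Support`, `Language`, `Provider`, `eligible_providers`, the level names and the default thresholds are all hypothetical, not JARAI's API, and the sample data is illustrative.

```python
from dataclasses import dataclass
from enum import IntEnum


class Support(IntEnum):
    """Hypothetical ordered support levels; real labels may differ."""
    NONE = 0
    LIMITED = 1
    GOOD = 2
    FULL = 3


@dataclass(frozen=True)
class Language:
    code: str      # e.g. "es-ES"
    llm: Support   # how well text generation performs
    tts: Support   # whether (and how well) voiceover is available


@dataclass(frozen=True)
class Provider:
    name: str
    kind: str                  # "llm" or "tts"
    languages: frozenset[str]  # language codes the provider covers


def eligible_providers(lang: Language, providers: list[Provider],
                       min_llm: Support = Support.LIMITED,
                       min_tts: Support = Support.LIMITED) -> list[str]:
    """Names of providers usable for a production in `lang`.

    A provider kind is considered only if the language's support level for
    that kind meets the threshold; the provider must also cover the language.
    """
    allowed_kinds = set()
    if lang.llm >= min_llm:
        allowed_kinds.add("llm")
    if lang.tts >= min_tts:
        allowed_kinds.add("tts")
    return [p.name for p in providers
            if p.kind in allowed_kinds and lang.code in p.languages]


# Illustrative data: text generation is usable, voiceover is not.
lang = Language("xx-XX", llm=Support.GOOD, tts=Support.NONE)
providers = [
    Provider("text-a", "llm", frozenset({"xx-XX"})),
    Provider("voice-a", "tts", frozenset({"xx-XX"})),
]
print(eligible_providers(lang, providers))  # ['text-a']
```

Even though `voice-a` covers the language, it is filtered out because the language's TTS level is below the threshold.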

Voice variants

Settings → Voice variants is the catalogue of voices operators choose from when configuring an avatar. Each variant carries provider metadata that’s parsed into trait facets — accent, age, gender, style, use cases, tags, and voice colour. The same chip-multiselect filter appears here and in the avatar voice picker, so a given chip selection narrows to the same set of voices in both places.
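One way to guarantee identical results in both places is to have the catalogue and the avatar voice picker call the same filter function. This Python sketch assumes OR semantics within a facet (choosing two accents shows voices with either) and AND across facets; `VoiceVariant` and `filter_voices` are hypothetical names, and JARAI's actual matching rules may differ.

```python
from dataclasses import dataclass, field


@dataclass
class VoiceVariant:
    name: str
    # Trait facets parsed from provider metadata; a facet may hold several values.
    traits: dict[str, set[str]] = field(default_factory=dict)


def filter_voices(voices: list[VoiceVariant],
                  selected: dict[str, set[str]]) -> list[VoiceVariant]:
    """Chip multiselect: OR within a facet, AND across facets (assumed).

    A facet with no selected chips places no constraint on the result.
    """
    def matches(v: VoiceVariant) -> bool:
        return all(
            not chips or bool(v.traits.get(facet, set()) & chips)
            for facet, chips in selected.items()
        )
    return [v for v in voices if matches(v)]


voices = [
    VoiceVariant("A", {"accent": {"en-GB"}, "gender": {"female"}}),
    VoiceVariant("B", {"accent": {"en-US"}, "gender": {"female"}}),
    VoiceVariant("C", {"accent": {"en-GB"}, "gender": {"male"}}),
]
picked = filter_voices(voices, {"accent": {"en-GB", "en-US"}, "gender": {"female"}})
print([v.name for v in picked])  # ['A', 'B']
```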

The accent taxonomy

Accent is a deliberate competitive strength. Voices are tagged with a BCP-47 language tag, optionally followed by private-use subtags (after -x-) that name regional and sub-regional accents, for example:

es-ES — Castilian Spanish (country level)
es-ES-x-andalucia — Andalusian (regional)
es-ES-x-andalucia-sevilla — Sevillano (sub-province)
en-GB-x-yorkshire, pt-BR-x-carioca, ar-x-egyptian, de-DE-x-bavarian …
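Because the accent lives in the private-use part of the tag, a tag can be split at the `x` singleton into its BCP-47 base and an accent path that runs from broadest to narrowest. A minimal Python sketch; `AccentTag` and `parse_accent_tag` are hypothetical. Note that RFC 5646 limits private-use subtags to 1–8 alphanumeric characters, so a strict validator would reject `andalucia` and `yorkshire`; the sketch deliberately does not enforce that limit or validate the base.

```python
from typing import NamedTuple


class AccentTag(NamedTuple):
    base: str                      # BCP-47 part, e.g. "es-ES" or "ar"
    accent_path: tuple[str, ...]   # private-use subtags, broadest first


def parse_accent_tag(tag: str) -> AccentTag:
    """Split a tag at the private-use singleton "x".

    Subtags are compared case-insensitively (BCP-47 tags are case-insensitive);
    the base is not validated and subtag length is not enforced.
    """
    subtags = tag.split("-")
    lowered = [s.lower() for s in subtags]
    if "x" not in lowered:
        return AccentTag(tag, ())
    i = lowered.index("x")
    return AccentTag("-".join(subtags[:i]), tuple(subtags[i + 1:]))


print(parse_accent_tag("es-ES-x-andalucia-sevilla"))
# AccentTag(base='es-ES', accent_path=('andalucia', 'sevilla'))
print(parse_accent_tag("ar-x-egyptian"))
# AccentTag(base='ar', accent_path=('egyptian',))
```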

Accents fall into three tiers:

Tier 1: country/locale (broadly available)
Tier 2: strong regional
Tier 3: aspirational sub-regional (often “no voice yet — request one” until a provider offers it)

The taxonomy is the single source of truth behind the Console’s cascading accent picker.
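The tags form a hierarchy that can be walked from most to least specific. A cascading picker could use that walk, for instance to suggest the closest accent that already has a voice when a tier 3 entry has none. This Python sketch shows the idea; the text above does not say the Console falls back like this, and `accent_chain` and `nearest_voiced_accent` are hypothetical. The chain stops at the base locale (it never drops from `es-ES` to `es`).

```python
from typing import Optional


def accent_chain(tag: str) -> list[str]:
    """Tags from most to least specific, ending at the base locale.

    "es-ES-x-andalucia-sevilla" -> [that tag, "es-ES-x-andalucia", "es-ES"]
    """
    subtags = tag.split("-")
    lowered = [s.lower() for s in subtags]
    if "x" not in lowered:
        return [tag]
    i = lowered.index("x")
    base, private = subtags[:i], subtags[i + 1:]
    # Drop one private-use subtag at a time, keeping the "x" singleton.
    chain = ["-".join(base + [subtags[i]] + private[:n])
             for n in range(len(private), 0, -1)]
    if base:  # a pure private-use tag ("x-...") has no base locale
        chain.append("-".join(base))
    return chain


def nearest_voiced_accent(tag: str, voiced: set[str]) -> Optional[str]:
    """Most specific accent in the chain that has at least one voice, if any."""
    for candidate in accent_chain(tag):
        if candidate in voiced:
            return candidate
    return None


voiced = {"es-ES", "es-ES-x-andalucia"}  # illustrative availability
print(nearest_voiced_accent("es-ES-x-andalucia-sevilla", voiced))
# es-ES-x-andalucia
```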

Voice harvester

The voice harvester (Settings → Voice harvester: sources / runs / library) periodically discovers new voices from configured provider sources, captures samples, and proposes candidates for the catalogue. A licence-classification step reads each voice’s licence text and assigns a licence class, so clearly permissive voices can be auto-approved while restrictive ones are flagged for review.
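The approval rule can be sketched as classify-then-triage. The keyword matching below is only a stand-in, since the text does not say how licences are classified; `LicenceClass`, the marker phrases, `classify_licence` and `triage` are hypothetical. The sketch also assumes that licences it cannot classify go to review alongside restrictive ones, so only clearly permissive voices are auto-approved.

```python
from enum import Enum


class LicenceClass(Enum):
    PERMISSIVE = "permissive"
    RESTRICTIVE = "restrictive"
    UNKNOWN = "unknown"


# Hypothetical keyword rules; a real classifier would need far more care.
RESTRICTIVE_MARKERS = ("non-commercial", "noncommercial", "no derivatives",
                       "personal use only")
PERMISSIVE_MARKERS = ("commercial use permitted", "cc0", "public domain",
                      "royalty-free")


def classify_licence(text: str) -> LicenceClass:
    t = text.lower()
    # Check restrictive markers first so a mixed text is never auto-approved.
    if any(m in t for m in RESTRICTIVE_MARKERS):
        return LicenceClass.RESTRICTIVE
    if any(m in t for m in PERMISSIVE_MARKERS):
        return LicenceClass.PERMISSIVE
    return LicenceClass.UNKNOWN


def triage(licences: dict[str, str]) -> tuple[list[str], list[str]]:
    """Split harvested voice IDs into (auto_approved, needs_review)."""
    approved, review = [], []
    for voice_id, text in licences.items():
        if classify_licence(text) is LicenceClass.PERMISSIVE:
            approved.append(voice_id)
        else:
            review.append(voice_id)  # restrictive and unknown both go to a human
    return approved, review


print(triage({
    "v1": "Released under CC0.",
    "v2": "Non-commercial use permitted.",
    "v3": "All rights reserved.",
}))  # (['v1'], ['v2', 'v3'])
```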