Languages

Five languages. Equal rigour.

Every language we serve gets the same investment in collection methodology, validation rigour, model fine-tuning, and benchmark transparency. No language is an afterthought. No dialect is treated as a deviation from a standard.

Igbo

~30 million speakers

Igbo has the worst existing model performance of the five Nigerian languages we cover and the deepest dialect fragmentation. Owerri, Onitsha, Nsukka, Nnewi, and Ikwuano Igbo are mutually intelligible but acoustically distinct. We collect 380 validated hours across all major dialect regions.

Current baseline

Whisper large-v3: ~55% WER

Africa's Voice target

<30% WER at v0

Domain coverage

Daily life, Financial services, Telecoms, Commerce, Health, Code-switching, Agriculture, Education, Government, News

Geographic spread

Anambra, Imo, Enugu, Abia, Ebonyi, plus Igbo speakers in Delta and Rivers

Yoruba

~45 million speakers

Yoruba is a tonal language: the same phoneme sequence pronounced with different tones yields different words. Existing commercial models do not preserve tone. We collect 360 validated hours with an explicit tone-mark policy in transcription, drawing contributors from Lagos, Ibadan, and across the South-West.
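To illustrate why tone marks matter in transcription, here is a minimal Python sketch. It assumes tone is written with the combining grave, acute, and macron marks, and it uses a commonly cited Yoruba minimal set whose glosses are illustrative only. The helper name `strip_tone_marks` is hypothetical and is not part of any Africa's Voice tooling.

```python
import unicodedata

# Combining marks assumed to carry tone: grave (low), acute (high), macron (mid).
TONE_MARKS = {"\u0300", "\u0301", "\u0304"}


def strip_tone_marks(text: str) -> str:
    """Drop tone diacritics but keep the under-dot (U+0323) on letters like o-dot and e-dot."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if ch not in TONE_MARKS)
    return unicodedata.normalize("NFC", kept)


# Commonly cited minimal set; glosses are illustrative, not authoritative.
words = {
    "\u1ecdk\u1ecd": "husband",         # under-dotted o-k-o, no tone mark
    "\u1ecdk\u1ecd\u0301": "hoe",       # final vowel with acute (high tone)
    "\u1ecdk\u1ecd\u0300": "vehicle",   # final vowel with grave (low tone)
}

# Simulate a transcript that discards tone marks.
collapsed = {strip_tone_marks(w) for w in words}
print(len(words), "distinct words ->", len(collapsed), "form(s) once tone marks are dropped")
```

A transcript without tone marks cannot distinguish these words, which is the ambiguity a tone-mark policy is meant to prevent.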

Current baseline

Google Speech: 60%+ WER · Whisper: ~55% WER

Africa's Voice target

<30% WER at v0

Domain coverage

Daily life, Financial services, Telecoms, Commerce, Code-switching, Health, Education, Agriculture, Government, News

Geographic spread

Lagos, Oyo, Ogun, Osun, Ekiti, Ondo, plus Yoruba speakers in Kwara and Kogi

Hausa

~80 million speakers

Hausa is the largest indigenous African language by speaker count and the dominant lingua franca of the Sahel. We collect 300 validated hours from Kano, Kaduna, Katsina, Sokoto, and the broader North, with explicit attention to Islamic finance terminology and rural agricultural domains.

Current baseline

Google Speech: 55%+ WER · AWS: not supported

Africa's Voice target

<28% WER at v0

Domain coverage

Daily life, Financial services, Commerce, Telecoms, Agriculture, Health, Code-switching, Government

Geographic spread

Kano, Kaduna, Katsina, Sokoto, Zamfara, Kebbi, Bauchi, Gombe

Nigerian Pidgin

~75 million speakers

Nigerian Pidgin is the true lingua franca of Nigeria, used across every social class and every region. It has no standard orthography. We adopt the Naijá orthographic convention developed at the University of Port Harcourt and apply it consistently across 240 validated hours.

Current baseline

Whisper: ~35% WER (best of the five but still commercially unusable)

Africa's Voice target

<25% WER at v0

Domain coverage

Daily life, Commerce, Financial services, Telecoms, Health, Entertainment, Code-switching

Geographic spread

Lagos, Rivers, Delta, Edo, Cross River, with Anambra-Pidgin variants

Nigerian English

~80 million speakers

Nigerian English is a distinct variety — not a deviation from British or American English. Existing models perform reasonably on educated Lagos English but fail on Northern Nigerian English, Middle Belt English, and accented variants. We collect 220 validated hours focused on the underrepresented variants and on code-switching where English is the matrix language.

Current baseline

Google Speech: ~10% WER on Lagos English; significantly higher on regional variants

Africa's Voice target

<15% WER on regional variants

Domain coverage

Daily life, Financial services, Telecoms, Code-switching, Government, News

Geographic spread

All 36 states with focus on FCT, Lagos, Plateau, Benue, Kogi, and the North-Central belt
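
The baselines and targets above are quoted as word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn a model's transcript into the reference transcript, divided by the number of words in the reference. This page does not say which scoring tool or text normalisation is used, so the following is only a minimal sketch that assumes whitespace tokenisation and no normalisation. The function name `word_error_rate` and the example sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: one deleted word and one substituted word.
reference = "send five thousand naira to my brother"
hypothesis = "send five thousand to my mother"
print(f"WER: {word_error_rate(reference, hypothesis):.1%}")
```

In this example there are 2 errors over 7 reference words, a WER of about 28.6%. Because insertions count as errors, WER can exceed 100% on very poor transcripts.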