Technology
Built like infrastructure. Documented like it matters.
The technology behind Africa's Voice is not a black box. The pipeline, the validation system, the model training methodology, and the data governance layer are all documented to a level that survives an enterprise procurement audit.
The 8-step audio processing pipeline
Format normalisation
Every recording converted to 16kHz, 16-bit, mono WAV using ffmpeg. Standardisation at ingestion ensures downstream consistency.
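A minimal sketch of this step, assuming a stock ffmpeg install; the function name is illustrative, but the flags are standard ffmpeg options for 16 kHz, mono, 16-bit PCM WAV:

```python
def ffmpeg_normalise_cmd(src: str, dst: str) -> list[str]:
    """Build the ffmpeg argv for ingestion-time normalisation."""
    return [
        "ffmpeg", "-y", "-i", src,
        "-ar", "16000",       # resample to 16 kHz
        "-ac", "1",           # downmix to mono
        "-c:a", "pcm_s16le",  # 16-bit signed little-endian PCM
        dst,
    ]
```

Run with `subprocess.run(ffmpeg_normalise_cmd("raw.m4a", "clean.wav"), check=True)`.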
Voice activity detection
Silero VAD identifies speech regions. Files with speech ratio below 0.35 auto-rejected; 0.35–0.45 routed to human review.
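The routing decision reduces to the two thresholds above. A sketch (Silero VAD itself returns speech timestamps; deriving the speech ratio from them is assumed here):

```python
def route_by_speech_ratio(speech_ratio: float) -> str:
    """Below 0.35 auto-reject; 0.35-0.45 human review; otherwise pass."""
    if speech_ratio < 0.35:
        return "auto_reject"
    if speech_ratio < 0.45:
        return "human_review"
    return "pass"
```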
Deduplication
MFCC cosine similarity at 0.88 threshold flags suspected duplicates; 0.95+ auto-rejected. Prevents corpus contamination from repeated submissions.
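A self-contained sketch of the comparison and decision logic; extracting the MFCC vectors themselves is assumed to happen upstream:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two MFCC feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def dedup_decision(similarity: float) -> str:
    """0.95 and above auto-reject; 0.88 and above flag; otherwise unique."""
    if similarity >= 0.95:
        return "auto_reject"
    if similarity >= 0.88:
        return "flag_duplicate"
    return "unique"
```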
Language identification
fastText LID-176 classifier. Confidence below 0.65 auto-rejected; 0.65–0.75 routed to human review with language flag.
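fastText returns a (label, confidence) pair per utterance; the thresholds above map it to a decision. A sketch, with illustrative return keys:

```python
def lid_route(label: str, confidence: float) -> dict:
    """Below 0.65 auto-reject; 0.65-0.75 review with language flag; else accept."""
    if confidence < 0.65:
        return {"decision": "auto_reject"}
    if confidence < 0.75:
        return {"decision": "human_review", "language_flag": label}
    return {"decision": "accept", "language": label}
```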
Quality scoring
SNR spectral analysis produces 0–100 quality score. Below 40 auto-rejected; 40–70 human review; above 70 fast-tracked.
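The three-band routing on the 0-100 score, as a sketch (the SNR analysis that produces the score is assumed):

```python
def route_by_quality(score: float) -> str:
    """Below 40 auto-reject; 40-70 human review; above 70 fast-track."""
    if score < 40:
        return "auto_reject"
    if score <= 70:
        return "human_review"
    return "fast_track"
```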
Metadata enrichment
Combines device metadata, audio metrics, and contributor demographics. Required fields enforced before downstream processing.
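Required-field enforcement can be sketched as a gate before downstream processing. The field names below are illustrative; the production schema is not published here:

```python
REQUIRED_FIELDS = {"device_model", "sample_rate", "duration_s",
                   "contributor_age_band", "contributor_gender", "language"}

def enrich(record: dict) -> dict:
    """Reject any record missing a required metadata field."""
    missing = REQUIRED_FIELDS - record.keys()
    if missing:
        raise ValueError(f"missing required metadata: {sorted(missing)}")
    return record
```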
Anti-fraud scoring
Submission timing, device fingerprint, account linkage, and gold task accuracy combined into fraud score. High scores auto-suspend.
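One common way to combine such signals is a weighted sum over per-signal scores in [0, 1]. The weights and threshold below are illustrative only; the production weighting is deliberately not published:

```python
FRAUD_WEIGHTS = {
    "timing_anomaly": 0.30,   # suspiciously regular submission intervals
    "device_reuse": 0.25,     # fingerprint shared across many accounts
    "account_linkage": 0.20,  # graph-linked accounts
    "gold_task_error": 0.25,  # 1 - gold task accuracy
}

def fraud_score(signals: dict) -> float:
    """Weighted combination of per-signal scores, each in [0, 1]."""
    return sum(FRAUD_WEIGHTS[k] * signals.get(k, 0.0) for k in FRAUD_WEIGHTS)

def should_suspend(signals: dict, threshold: float = 0.8) -> bool:
    return fraud_score(signals) >= threshold
```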
QA queue assignment
Routed to a validator pool by language. Queue depth and SLA monitored continuously; alerts fire when the backlog exceeds 48 hours.
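A sketch of the SLA check over per-language queues (the monitoring loop and paging integration are assumed):

```python
def queues_breaching_sla(queues: dict[str, float], sla_hours: float = 48.0) -> list[str]:
    """queues maps language code to the oldest pending item's age in hours."""
    return sorted(lang for lang, age in queues.items() if age > sla_hours)
```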
Validation and quality assurance
Every recording reaching the QA queue is reviewed by a trained native-speaker validator using a purpose-built dashboard. Validators see waveform display, variable-speed audio playback, and a structured rejection taxonomy. Gold tasks — recordings with known correct decisions — are injected at position 7–10 in every validator session. Validators are not told which tasks are gold. Their accuracy is tracked silently and used to calibrate the validator pool. Validators below 75% gold task accuracy receive retraining; sustained accuracy below 60% results in removal from the pool.
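The gold-task mechanics above can be sketched directly; function names are illustrative:

```python
import random

def inject_gold_task(session_tasks: list, gold_task, rng=random) -> list:
    """Insert one gold task at position 7-10 (1-indexed) of a session.

    Validators cannot distinguish gold tasks from ordinary ones; the
    position band keeps injection away from session warm-up.
    """
    position = rng.randint(7, 10)  # 1-indexed slot within the session
    tasks = list(session_tasks)
    tasks.insert(position - 1, gold_task)
    return tasks

def calibration_action(gold_accuracy: float, sustained: bool) -> str:
    """Below 60% sustained: removal. Below 75%: retraining. Else ok."""
    if gold_accuracy < 0.60 and sustained:
        return "remove"
    if gold_accuracy < 0.75:
        return "retrain"
    return "ok"
```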
Model fine-tuning approach
Africa's Voice ASR models are fine-tuned from Whisper large-v3 as the base. Fine-tuning datasets are language-specific and domain-balanced. Each model release is benchmarked against a held-out evaluation set per language, with WER measured on standard test conditions and on Nigerian-specific test conditions (code-switching, market environment, multiple dialects). Benchmark results are published — including failure modes.
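Domain balance in a fine-tuning set can be checked with a simple share-based criterion. The tolerance and the equal-share target below are illustrative assumptions, not the published balancing policy:

```python
def domain_shares(counts: dict[str, int]) -> dict[str, float]:
    """Fraction of the fine-tuning set held by each domain."""
    total = sum(counts.values())
    return {domain: n / total for domain, n in counts.items()}

def is_domain_balanced(counts: dict[str, int], tolerance: float = 0.10) -> bool:
    """Balanced if every domain is within +/-tolerance of an equal share."""
    target = 1 / len(counts)
    return all(abs(s - target) <= tolerance
               for s in domain_shares(counts).values())
```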
Data governance technical layer
Encryption
AES-256 at rest in S3. TLS 1.2+ in transit. Per-key encryption for sensitive metadata fields.
Audit log
PostgreSQL append-only audit log. Every event logged: upload, processing, QA decision, consent update, buyer access, deletion. Backed up to separate S3 bucket. Never updated.
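The production log is PostgreSQL, but the append-only property itself can be sketched with stdlib sqlite3: triggers abort any UPDATE or DELETE, so rows can only ever be inserted. Table and column names are illustrative:

```python
import json
import sqlite3
import time

def open_audit_log(path: str = ":memory:") -> sqlite3.Connection:
    db = sqlite3.connect(path)
    db.execute("""CREATE TABLE IF NOT EXISTS audit_log (
        id INTEGER PRIMARY KEY,
        ts REAL NOT NULL,
        event TEXT NOT NULL,
        payload TEXT NOT NULL)""")
    # Triggers enforce append-only semantics at the database layer.
    db.execute("""CREATE TRIGGER IF NOT EXISTS no_update
        BEFORE UPDATE ON audit_log
        BEGIN SELECT RAISE(ABORT, 'audit log is append-only'); END""")
    db.execute("""CREATE TRIGGER IF NOT EXISTS no_delete
        BEFORE DELETE ON audit_log
        BEGIN SELECT RAISE(ABORT, 'audit log is append-only'); END""")
    return db

def log_event(db: sqlite3.Connection, event: str, payload: dict) -> None:
    db.execute("INSERT INTO audit_log (ts, event, payload) VALUES (?, ?, ?)",
               (time.time(), event, json.dumps(payload)))
    db.commit()
```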
Deletion mechanics
Contributor deletion requests honoured within 30 days. Audio files removed from active storage; audit log entry retained. Already-trained models are not retrained for deletions — covered in consent terms.
NDPA compliance
Registered with the Nigeria Data Protection Commission. DPIA completed and updated annually. Data Processing Agreements in place with all sub-processors. Data residency in af-south-1 (Cape Town) with replication policy documented.
Benchmarks
Benchmark methodology
WER is measured against a held-out evaluation set per language, audited by an independent academic partner before publication. We do not benchmark against our own training data. We do not benchmark against unrepresentative academic test sets. We benchmark against the conditions Nigerian users actually create — phone-quality audio, market background noise, code-switching utterances, multiple dialects per language. The benchmark report is published openly when first results are available.
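WER itself is standard: word-level edit distance (substitutions, insertions, deletions) divided by reference length. A self-contained reference implementation:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: Levenshtein distance over words / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    d = list(range(len(hyp) + 1))  # rolling row of the edit-distance table
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            cur = d[j]
            d[j] = min(d[j] + 1,            # deletion
                       d[j - 1] + 1,        # insertion
                       prev + (r != h))     # substitution (or match)
            prev = cur
    return d[-1] / max(len(ref), 1)
```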