Technology
Built like infrastructure. Documented like it matters.
The technology behind Africa's Voice is not a black box. The pipeline, the validation system, the model training methodology, and the data governance layer are all documented to a level that survives an enterprise procurement audit.
The 8-step audio processing pipeline
Format normalisation
Every recording converted to 16kHz, 16-bit, mono WAV using ffmpeg. Standardisation at ingestion ensures downstream consistency.
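A minimal sketch of this step, assuming a stock ffmpeg install; the function name is illustrative, but the flags are standard ffmpeg options for 16 kHz, mono, 16-bit PCM WAV:

```python
def ffmpeg_normalise_cmd(src: str, dst: str) -> list[str]:
    """Build the ffmpeg argv for ingestion-time normalisation."""
    return [
        "ffmpeg", "-y", "-i", src,
        "-ar", "16000",       # resample to 16 kHz
        "-ac", "1",           # downmix to mono
        "-c:a", "pcm_s16le",  # 16-bit signed little-endian PCM
        dst,
    ]
```

Run with `subprocess.run(ffmpeg_normalise_cmd("raw.m4a", "clean.wav"), check=True)`.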
Voice activity detection
Silero VAD identifies speech regions. Files with speech ratio below 0.35 auto-rejected; 0.35–0.45 routed to human review.
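The routing decision reduces to the two thresholds above. A sketch (Silero VAD itself returns speech timestamps; deriving the speech ratio from them is assumed here):

```python
def route_by_speech_ratio(speech_ratio: float) -> str:
    """Below 0.35 auto-reject; 0.35-0.45 human review; otherwise pass."""
    if speech_ratio < 0.35:
        return "auto_reject"
    if speech_ratio < 0.45:
        return "human_review"
    return "pass"
```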
Deduplication
MFCC cosine similarity at 0.88 threshold flags suspected duplicates; 0.95+ auto-rejected. Prevents corpus contamination from repeated submissions.
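A self-contained sketch of the comparison and decision logic; extracting the MFCC vectors themselves is assumed to happen upstream:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two MFCC feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def dedup_decision(similarity: float) -> str:
    """0.95 and above auto-reject; 0.88 and above flag; otherwise unique."""
    if similarity >= 0.95:
        return "auto_reject"
    if similarity >= 0.88:
        return "flag_duplicate"
    return "unique"
```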
Language identification
fastText LID-176 classifier. Confidence below 0.65 auto-rejected; 0.65–0.75 routed to human review with language flag.
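fastText returns a (label, confidence) pair per utterance; the thresholds above map it to a decision. A sketch, with illustrative return keys:

```python
def lid_route(label: str, confidence: float) -> dict:
    """Below 0.65 auto-reject; 0.65-0.75 review with language flag; else accept."""
    if confidence < 0.65:
        return {"decision": "auto_reject"}
    if confidence < 0.75:
        return {"decision": "human_review", "language_flag": label}
    return {"decision": "accept", "language": label}
```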
Quality scoring
SNR spectral analysis produces 0–100 quality score. Below 40 auto-rejected; 40–70 human review; above 70 fast-tracked.
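The three-band routing on the 0-100 score, as a sketch (the SNR analysis that produces the score is assumed):

```python
def route_by_quality(score: float) -> str:
    """Below 40 auto-reject; 40-70 human review; above 70 fast-track."""
    if score < 40:
        return "auto_reject"
    if score <= 70:
        return "human_review"
    return "fast_track"
```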
Metadata enrichment
Combines device metadata, audio metrics, and contributor demographics. Required fields enforced before downstream processing.
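Required-field enforcement can be sketched as a gate before downstream processing. The field names below are illustrative; the production schema is not published here:

```python
REQUIRED_FIELDS = {"device_model", "sample_rate", "duration_s",
                   "contributor_age_band", "contributor_gender", "language"}

def enrich(record: dict) -> dict:
    """Reject any record missing a required metadata field."""
    missing = REQUIRED_FIELDS - record.keys()
    if missing:
        raise ValueError(f"missing required metadata: {sorted(missing)}")
    return record
```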
Anti-fraud scoring
Submission timing, device fingerprint, account linkage, and gold task accuracy combined into fraud score. High scores auto-suspend.
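One common way to combine such signals is a weighted sum over per-signal scores in [0, 1]. The weights and threshold below are illustrative only; the production weighting is deliberately not published:

```python
FRAUD_WEIGHTS = {
    "timing_anomaly": 0.30,   # suspiciously regular submission intervals
    "device_reuse": 0.25,     # fingerprint shared across many accounts
    "account_linkage": 0.20,  # graph-linked accounts
    "gold_task_error": 0.25,  # 1 - gold task accuracy
}

def fraud_score(signals: dict) -> float:
    """Weighted combination of per-signal scores, each in [0, 1]."""
    return sum(FRAUD_WEIGHTS[k] * signals.get(k, 0.0) for k in FRAUD_WEIGHTS)

def should_suspend(signals: dict, threshold: float = 0.8) -> bool:
    return fraud_score(signals) >= threshold
```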
QA queue assignment
Routed to a validator pool by language. Queue depth and SLA monitored continuously; alerts fire when the backlog exceeds 48 hours.
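A sketch of the SLA check over per-language queues (the monitoring loop and paging integration are assumed):

```python
def queues_breaching_sla(queues: dict[str, float], sla_hours: float = 48.0) -> list[str]:
    """queues maps language code to the oldest pending item's age in hours."""
    return sorted(lang for lang, age in queues.items() if age > sla_hours)
```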
Validation and quality assurance
Every recording reaching the QA queue is reviewed by a trained native-speaker validator using a purpose-built dashboard. Validators see waveform display, variable-speed audio playback, and a structured rejection taxonomy. Gold tasks — recordings with known correct decisions — are injected at position 7–10 in every validator session. Validators are not told which tasks are gold. Their accuracy is tracked silently and used to calibrate the validator pool. Validators below 75% gold task accuracy receive retraining; sustained accuracy below 60% results in removal from the pool.
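The gold-task mechanics above can be sketched directly; function names are illustrative:

```python
import random

def inject_gold_task(session_tasks: list, gold_task, rng=random) -> list:
    """Insert one gold task at position 7-10 (1-indexed) of a session.

    Validators cannot distinguish gold tasks from ordinary ones; the
    position band keeps injection away from session warm-up.
    """
    position = rng.randint(7, 10)  # 1-indexed slot within the session
    tasks = list(session_tasks)
    tasks.insert(position - 1, gold_task)
    return tasks

def calibration_action(gold_accuracy: float, sustained: bool) -> str:
    """Below 60% sustained: removal. Below 75%: retraining. Else ok."""
    if gold_accuracy < 0.60 and sustained:
        return "remove"
    if gold_accuracy < 0.75:
        return "retrain"
    return "ok"
```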
Model fine-tuning approach
Africa's Voice ASR models are fine-tuned from Whisper large-v3 as the base. Fine-tuning datasets are language-specific and domain-balanced. Each model release is benchmarked against a held-out evaluation set per language, with WER measured on standard test conditions and on Nigerian-specific test conditions (code-switching, market environment, multiple dialects). Benchmark results are published — including failure modes.
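Domain balance in a fine-tuning set can be checked with a simple share-based criterion. The tolerance and the equal-share target below are illustrative assumptions, not the published balancing policy:

```python
def domain_shares(counts: dict[str, int]) -> dict[str, float]:
    """Fraction of the fine-tuning set held by each domain."""
    total = sum(counts.values())
    return {domain: n / total for domain, n in counts.items()}

def is_domain_balanced(counts: dict[str, int], tolerance: float = 0.10) -> bool:
    """Balanced if every domain is within +/-tolerance of an equal share."""
    target = 1 / len(counts)
    return all(abs(s - target) <= tolerance
               for s in domain_shares(counts).values())
```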
Data governance technical layer
Encryption
AES-256 at rest in S3. TLS 1.2+ in transit. Per-key encryption for sensitive metadata fields.
Audit log
PostgreSQL append-only audit log. Every event logged: upload, processing, QA decision, consent update, buyer access, deletion. Backed up to separate S3 bucket. Never updated.
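The production log is PostgreSQL, but the append-only property itself can be sketched with stdlib sqlite3: triggers abort any UPDATE or DELETE, so rows can only ever be inserted. Table and column names are illustrative:

```python
import json
import sqlite3
import time

def open_audit_log(path: str = ":memory:") -> sqlite3.Connection:
    db = sqlite3.connect(path)
    db.execute("""CREATE TABLE IF NOT EXISTS audit_log (
        id INTEGER PRIMARY KEY,
        ts REAL NOT NULL,
        event TEXT NOT NULL,
        payload TEXT NOT NULL)""")
    # Triggers enforce append-only semantics at the database layer.
    db.execute("""CREATE TRIGGER IF NOT EXISTS no_update
        BEFORE UPDATE ON audit_log
        BEGIN SELECT RAISE(ABORT, 'audit log is append-only'); END""")
    db.execute("""CREATE TRIGGER IF NOT EXISTS no_delete
        BEFORE DELETE ON audit_log
        BEGIN SELECT RAISE(ABORT, 'audit log is append-only'); END""")
    return db

def log_event(db: sqlite3.Connection, event: str, payload: dict) -> None:
    db.execute("INSERT INTO audit_log (ts, event, payload) VALUES (?, ?, ?)",
               (time.time(), event, json.dumps(payload)))
    db.commit()
```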
Deletion mechanics
Contributor deletion requests honoured within 30 days. Audio files removed from active storage; audit log entry retained. Already-trained models are not retrained for deletions — covered in consent terms.
NDPA compliance
Registered with the Nigeria Data Protection Commission. DPIA completed and updated annually. Data Processing Agreements in place with all sub-processors. Data residency in af-south-1 (Cape Town) with replication policy documented.
Benchmarks
Benchmark methodology
WER is measured against a held-out evaluation set per language, audited by an independent academic partner before publication. We do not benchmark against our own training data. We do not benchmark against unrepresentative academic test sets. We benchmark against the conditions Nigerian users actually create — phone-quality audio, market background noise, code-switching utterances, multiple dialects per language. The benchmark report is published openly when first results are available.
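WER itself is standard: word-level edit distance (substitutions, insertions, deletions) divided by reference length. A self-contained reference implementation:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: Levenshtein distance over words / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    d = list(range(len(hyp) + 1))  # rolling row of the edit-distance table
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            cur = d[j]
            d[j] = min(d[j] + 1,            # deletion
                       d[j - 1] + 1,        # insertion
                       prev + (r != h))     # substitution (or match)
            prev = cur
    return d[-1] / max(len(ref), 1)
```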