Now Live: Afristereo LLM Bias Leaderboard

Ground Your AI in
African Reality

We Provide Large Scale LLM testing and High Quality Dataset that ensure your models reflect African realities while empowering local Communities.

collaborating with

Urban tech hub participants
Urban Hubs
Tech-literate annotators & experts
Rural community data collection
Remote Access
Deep community-led collection

Why Kitala?

Standard crowdsourcing fails in Africa. We combine deep offline reach with academic rigor to provide ground truth data.

Offline & Online Communities

We don't just access internet users. We deploy physical surveyors to capture data from remote, offline communities, ensuring a truly representative dataset.

Experts & Precise Demographics

Beyond basic labeling. We recruit specific demographic profiles and deploy domain experts (doctors, bankers) to validate high-stakes data.

Focused on African Culture & Social Norms

Our methodology is designed to capture nuance. We align models with local traditions, social norms, and linguistic subtleties that outsiders miss.

Our Services

Tailored for Generative AI evaluation and high-stakes sectors. Our expert-led services ensure your AI speaks the local language—culturally and linguistically.

Stereotype & Bias Evaluation

Auditing LLM to detect locally grounded and harmful bias across African demographics and ethnic groups. We go beyond standard benchmarks.

  • Local Stereotype Detection
  • Political Sensitivity Checks
  • Cultural Norm Verification
Auditing Dashboard
Core Service

Live LLM Testing (RLHF)

Real-time human feedback (Text or Voice) to rate model responses for safety, helpfulness, and cultural accuracy in live scenarios.

Text
Fine-tuning & ranking
Voice
Accent & verification
Contact Sales Team
RLHF Interface

Culturally Grounded Q&A

High-quality, expert-curated Q&A datasets in specific domains (Health, Finance) reflecting local realities for authentic fine-tuning.

  • Domain-Specific Expertise
  • Local Language Nuances
Q&A Dataset

Voice Datasets for ASR

Diverse voice data capturing varied accents, dialects, and real-world background noise for robust speech recognition evaluation.

  • Mobile-First Collection
  • Natural Environment Audio
Audio Collection
Our Expertise

Sectors Where
Nuance Matters

We specialize in high-stakes domains where "good enough" translation isn't safe. We ensure accuracy in critical fields like healthcare and finance.

Maternal & Public Health

Evaluating medical advice for local relevance, terminology, and safety guidelines. We verify that AI health assistants provide accurate, culturally safe information for expecting mothers and rural communities.

Inclusive Finance

Reviewing financial literacy content and support bots for unbanked populations. We adapt complex financial terms (interest rates, loans, mobile money) to local languages and mental models.

Cultural Alignment

Identifying western bias in history, social norms, and naming conventions. We align models with the lived realities, traditions, and sensitive historical contexts of African communities.

The Kitala Data Store

Download ethically sourced, expert-labeled datasets today.

afristereo-benchmark.jpg
FEATURED BENCHMARK

Afristereo Benchmark

A culturally grounded dataset for evaluating stereotypical bias in Large Language Models across Senegal, Kenya, and Nigeria. This repo includes the raw and cleaned Datasets, the complete pipeline (manual + automated) to generate the dataset, and the code used to perform the various LLM evaluations.

5k rows
JSON/CSV
CC-BY License
View Benchmark
Verified by Doctors

Health Q&A (Senegal)

Wolof & French medical Q&A pairs grounded in local practices.

View Details ->
SME Banking

Finance QA (Ghana)

Financial literacy content adapted for unbanked populations.

Coming soon ->
Noisy Env

Hausa ASR (Speech)

Diverse voice data capturing varied accents and backgrounds.

View Details ->
Evidence Based

Research & Benchmarks

Afristereo Leaderboard

Updated: Dec 2025

Evaluating Stereotypical Bias (BPR Score).
*BPR: Bias Probability Ratio. Higher scores indicate stronger stereotypical associations.

ModelBPR Score
Modern (2023-2024)
Llama 3.2 3B
0.78
Mistral 7B
0.75
Qwen 2.5 7B
0.71
Gemma 2 2B
0.71
Phi-3 Mini
0.70
Baseline (2019-2022)
GPT-Neo
0.71
GPT-2 Large
0.69
FinBERT
0.50
Explore full rankings
Access Script & Dataset

Our research partners

Recruiting Now

Join as a Research Participant

Are you a linguist, health expert, or community leader? Join our network of experts to evaluate AI models and ensure they reflect African reality.

Earn Reliable Income

Get paid fairly for every mission you complete. Earn competitive rates for your local expertise, paid directly to your mobile wallet.

Flexible Contribution

Work from anywhere using our mobile app. Whether you have 15 minutes or 2 hours, there are tasks available for you.

Protect Your Culture

Help stop AI hallucinations. Ensure your language and cultural norms are accurately represented in global technology.

Priority Recruitment Regions

Kenya Kenya
Ghana Ghana
Nigeria Nigeria
Senegal Senegal
Rwanda Rwanda
Participant
Participant
Participant
Participant
Participant
Participant
Participant
New Mission2m ago

Evaluate Medical Advice

Check if this advice aligns with local health guidelines in Nairobi.

$12.50
Payment Received
+ 15,000 CFA
Kitala /ki-ta-la/ • noun (Lingala)

A Reflection of Culture,
Identity & Diversity.

In Lingala, "Kitala" means reflection. We chose this name because we believe Artificial Intelligence in Africa must be a true reflection of the people it serves—grounded in our diverse cultures, languages, and realities.

We are building the infrastructure to ensure no community is left behind in the AI revolution.

Powered by YUX Design

The team behind Kitala is YUX Design, a leading social research and technology design company in Africa.

Team