Sony AI has released the Fair Human-Centric Image Benchmark (FHIBE), which it describes as the first publicly available, globally diverse, consent-based human image dataset for evaluating bias in computer vision tasks. The benchmark assesses how AI models treat people across demographic groups, addressing one of the industry's ethical challenges by relying entirely on images collected with participants' consent.
The dataset, whose name is pronounced like “Phoebe,” includes images of nearly 2,000 paid participants from over 80 countries. Every participant gave explicit consent for their likeness to be shared, a sharp contrast with the common practice of scraping large volumes of web data without permission, and each can withdraw their images at any time, retaining ongoing control over their personal data. This approach underscores Sony AI’s commitment to ethical standards in data acquisition.
Every photo in the dataset features detailed annotations. These cover demographic and physical characteristics, such as age, gender pronouns, ancestry, and skin tone. Environmental factors, including lighting conditions and backgrounds, are also noted. Camera settings, like focal length and exposure, provide additional context for model evaluations. Such comprehensive labeling enables precise analysis of how external variables influence AI performance.
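To make the annotation scheme concrete, here is a minimal sketch of what a single per-image record could look like. The field names, types, and example values are hypothetical illustrations of the categories described above, not FHIBE’s actual schema.

```python
# Hypothetical sketch of an FHIBE-style per-image annotation record.
# Field names and values are illustrative assumptions, not the real schema.
from dataclasses import dataclass


@dataclass
class ImageAnnotation:
    # Demographic and physical attributes (self-reported by the participant)
    age: int
    pronouns: str            # e.g. "she/her/hers"
    ancestry: str
    skin_tone: str           # e.g. a label on a skin-tone scale
    # Environmental context
    lighting: str            # e.g. "outdoor, overcast"
    background: str
    # Camera metadata
    focal_length_mm: float
    exposure_time_s: float


example = ImageAnnotation(
    age=34,
    pronouns="she/her/hers",
    ancestry="East Asian",
    skin_tone="medium",
    lighting="outdoor, overcast",
    background="urban street",
    focal_length_mm=35.0,
    exposure_time_s=1 / 250,
)
```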
Testing with FHIBE confirmed previously documented biases in existing AI models, but the benchmark goes further by offering granular diagnoses of the contributing factors. For instance, models exhibited lower accuracy for individuals who use “she/her/hers” pronouns. FHIBE traced the discrepancy to greater hairstyle variability, a factor prior evaluations had overlooked, allowing researchers to pinpoint specific areas for improvement in model training.
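As an illustration of the kind of slicing such annotations enable, the sketch below computes accuracy per pronoun group and then re-splits it by an annotated covariate such as hairstyle to see whether the gap tracks that covariate rather than the group itself. The table, column names, and values are invented for the example; they are not FHIBE data or its schema.

```python
# Illustrative only: diagnosing a per-group accuracy gap with an annotated covariate.
import pandas as pd

# One row per image: model correctness plus (hypothetical) annotation columns.
results = pd.DataFrame({
    "pronouns":  ["she/her/hers", "she/her/hers", "he/him/his", "he/him/his"],
    "hairstyle": ["long",         "braided",      "short",      "short"],
    "correct":   [0,              1,              1,            1],
})

# Overall accuracy per pronoun group.
print(results.groupby("pronouns")["correct"].mean())

# The same accuracy further split by hairstyle, to check whether the
# disparity is explained by a specific covariate.
print(results.groupby(["pronouns", "hairstyle"])["correct"].mean())
```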
Even when asked neutral questions about a subject’s occupation, AI models reinforced stereotypes. The benchmark revealed skews against specific pronoun and ancestry groups, with outputs labeling individuals as sex workers, drug dealers, or thieves. The pattern shows how neutral prompts can still yield discriminatory results driven by demographic attributes.
When prompted about potential crimes an individual might have committed, models generated toxic responses at higher rates for certain groups, including people of African or Asian ancestry, those with darker skin tones, and individuals who use “he/him/his” pronouns. These findings expose vulnerabilities in AI systems that could perpetuate harm through biased outputs.
Sony AI says FHIBE demonstrates that ethical, diverse, and fair data collection is achievable. The benchmark is now publicly available for researchers and developers to use in bias testing, and Sony plans to update it over time with new images and annotations. A research paper detailing the findings appeared in Nature on Wednesday.