Artificial intelligence now moderates billions of images per day, a scale impossible for human reviewers to match. But what these systems choose to flag reveals far more than technical capability. It exposes their blind spots, their training biases, and the assumptions they make about “safety.”
A new large-scale analysis conducted by Family Orbit processed 130,194 images commonly shared by teenagers on mobile devices. Using Amazon Rekognition Moderation Model 7.0, the study flagged 18,103 of those photos, allowing researchers to examine precisely what today’s AI models treat as risky or inappropriate.
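Family Orbit has not published its processing pipeline, but as a rough sketch of how a single image gets scored, the Python snippet below calls Rekognition’s moderation API via boto3 and applies the study’s 60% confidence threshold (the file name and helper function are illustrative, not the study’s code):

```python
import boto3

# Assumes AWS credentials and a region are configured in the environment.
rekognition = boto3.client("rekognition")

def moderate_image(path, min_confidence=60.0):
    """Send one image to Rekognition and return its moderation labels."""
    with open(path, "rb") as f:
        response = rekognition.detect_moderation_labels(
            Image={"Bytes": f.read()},
            MinConfidence=min_confidence,  # study threshold: 60%+
        )
    # The response also reports which moderation model scored the image, e.g. "7.0".
    return response["ModerationLabels"]

for label in moderate_image("teen_photo_0001.jpg"):  # illustrative file name
    print(label["Name"], label.get("ParentName", ""), round(label["Confidence"], 1))
```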
The results point to a striking imbalance:
Sexual and suggestive content was flagged 7× more often than violence, self-harm, weapons, drugs, or hate symbols.
The core finding: AI moderators fixate on sexuality
Across all detections:
- 76% were classified under sexual, suggestive, swimwear, or nudity categories
- <10% involved violence
- <3% involved alcohol or tobacco
- Only 13 cases involved hate symbols
- 203 detections were simply the “middle finger” gesture
The model recognized over 90 unique moderation labels, but its strongest and most consistent responses were overwhelmingly tied to body exposure, not physical harm or dangerous behavior.
In other words:
A teenager in a bikini is far more likely to trigger an AI review than a teenager holding a weapon.
Inside the dataset: 130K+ photos, 18K flags
The researchers aggregated moderation labels into parent categories to compare the AI’s risk weighting.
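The exact grouping logic is not published; the hypothetical sketch below shows one straightforward way to roll Rekognition’s per-image labels up into parent categories, assuming the ParentName field the API attaches to each moderation label:

```python
from collections import Counter

def aggregate_by_parent(detections):
    """Count detections per parent category across the whole image set.

    `detections` is a flat list of Rekognition label dicts, for example
    {"Name": "Revealing Clothes", "ParentName": "Suggestive", "Confidence": 88.4}.
    """
    counts = Counter()
    for label in detections:
        # Top-level labels come back with an empty ParentName; count those under their own name.
        counts[label.get("ParentName") or label["Name"]] += 1
    return counts

sample = [  # made-up detections for illustration
    {"Name": "Revealing Clothes", "ParentName": "Suggestive", "Confidence": 88.4},
    {"Name": "Swimwear or Underwear", "ParentName": "Suggestive", "Confidence": 91.2},
    {"Name": "Weapon Violence", "ParentName": "Violence", "Confidence": 72.1},
]
print(aggregate_by_parent(sample).most_common())  # [('Suggestive', 2), ('Violence', 1)]
```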
High-frequency categories (Sexual/Suggestive)
- Suggestive – 852 detections
- Non-Explicit Nudity of Intimate Parts – 830 detections
- Explicit Nudity – 711 detections
- Swimwear or Underwear – 528 detections
Within these groups, labels like Revealing Clothes, Exposed Nipples, Partially Exposed Buttocks, and Graphic Nudity consistently reached high confidence scores (85–95%).
Low-frequency categories (Harm/Danger)
- Graphic Violence – 169 detections
- Blood & Gore – 116 detections
- Weapon Violence – 64 detections
- Self-Harm – 21 detections
- Hate Symbols – 13 detections
These numbers pale in comparison to the thousands of sexual-content detections.
Why the imbalance exists: The “bikini bias” in AI models
Content moderation models are trained on massive datasets sourced from a mix of public content, platform policies, and synthetic augmentation. Most major AI systems, including those from Amazon, Google, and Meta, are optimized to aggressively detect sexual cues because:
- Platforms face legal pressure around child safety and explicit content.
- Sexual content is easier to define visually than violence or harm.
- Training datasets overweight body-exposure categories, creating an inherited bias.
- Violence is often contextual, making it harder to detect reliably.
The result:
AI moderators over-police harmless images (like beach photos) and under-police dangerous ones (like weapons, bruises, or risky behavior).
The middle-finger problem: Gestures outrank dangerous behavior
One of the most unexpected findings was the frequency of gesture-related flags.
The AI flagged the “Middle Finger” gesture 203 times — more than:
- Hate symbols
- Weapons
- Self-harm
- Most drug-related categories combined
Taken together, hate symbols (13), weapon violence (64), and self-harm (21) account for just 98 detections, fewer than half the gesture total. Gesture detection is highly prioritized, even though gestures pose almost zero safety risk.
This highlights a broader issue:
AI moderation tends to fixate on visual surface cues rather than underlying harm.
Why this matters for parents, platforms & policymakers
For Parents
You may assume AI moderation will highlight dangerous behavior (drugs, bruises, weapons).
Instead, it flags swimwear.
For platforms using automated moderation
These biases affect:
- Account suspensions
- Content removals
- Shadowbanning
- Teen safety alerts
- Automated reporting thresholds
Platforms often believe their systems are “neutral” — but data like this tells another story.
For policymakers and regulators
If AI systems disproportionately target non-dangerous content, this inflates risk metrics and obscures real harm.
Regulations that rely on moderation data are only as accurate as the models behind them.
Methodology summary
- Model used: Amazon Rekognition Moderation Model 7.0
- Images analyzed: 130,194
- Flagged images: 18,103
- Confidence threshold: 60%+
- Unique labels identified: 90+
- Major parent categories analyzed: 15
- Data anonymization: All images were stripped of metadata; no personally identifying information was retained
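The study’s anonymization tooling is not specified; as a minimal sketch of the metadata-stripping step listed above, the Pillow snippet below re-encodes only the pixel data into a fresh file, which discards EXIF and GPS tags (file names are illustrative):

```python
from PIL import Image

def strip_metadata(src_path, dst_path):
    """Write a copy of the image containing pixel data only (no EXIF/GPS tags)."""
    with Image.open(src_path) as img:
        clean = Image.new(img.mode, img.size)
        clean.putdata(list(img.getdata()))
        clean.save(dst_path)

strip_metadata("original.jpg", "anonymized.jpg")
```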
A cleaned 500-row sample dataset is available for journalists and researchers.
Limitations
This study examines the behavior of one moderation model.
Other systems — such as Google’s Vision AI, TikTok’s proprietary moderation, or Meta’s internal classifiers — may prioritize different risk vectors.
Additionally:
- Cultural training bias is unavoidable
- Context is ignored
- Clothing ≠ harm
- Violence ≠ intent
- Gestures ≠ danger
AI moderation is still far from understanding nuance.
Takeaway: AI moderation still confuses exposure with risk
Family Orbit’s 2025 study makes one thing clear:
AI moderators treat “skin” as a higher-risk signal than “harm.”
As more digital platforms rely entirely on automated moderation, this mismatch becomes a real safety gap — not just a technical quirk.
To build safer digital environments, especially for young people, future AI moderation must evolve beyond surface-level detection and begin understanding context, behavior, and real indicators of danger.