Dataconomy
  • News
    • Artificial Intelligence
    • Cybersecurity
    • DeFi & Blockchain
    • Finance
    • Gaming
    • Startups
    • Tech
  • Industry
  • Research
  • Resources
    • Articles
    • Guides
    • Case Studies
    • Glossary
    • Whitepapers
  • Newsletter
  • + More
    • Conversations
    • Events
    • About
      • About
      • Contact
      • Imprint
      • Legal & Privacy
      • Partner With Us
Subscribe
No Result
View All Result
  • AI
  • Tech
  • Cybersecurity
  • Finance
  • DeFi & Blockchain
  • Startups
  • Gaming
Dataconomy
  • News
    • Artificial Intelligence
    • Cybersecurity
    • DeFi & Blockchain
    • Finance
    • Gaming
    • Startups
    • Tech
  • Industry
  • Research
  • Resources
    • Articles
    • Guides
    • Case Studies
    • Glossary
    • Whitepapers
  • Newsletter
  • + More
    • Conversations
    • Events
    • About
      • About
      • Contact
      • Imprint
      • Legal & Privacy
      • Partner With Us
Subscribe
No Result
View All Result
Dataconomy
No Result
View All Result

The LLM wears Prada: Why AI still shops in stereotypes

Large language models guessed gender with 70% accuracy but also amplified the worst cultural assumptions.

byKerem Gülen
April 3, 2025
in Research
Home Research
Share on FacebookShare on TwitterShare on LinkedInShare on WhatsAppShare on e-mail

You are what you buy—or at least, that’s what your language model thinks. In a recently published study, researchers set out to investigate a simple but loaded question: can large language models guess your gender based on your online shopping history? And if so, do they do it with a side of sexist stereotypes?

The answer, in short: yes, and very much yes.

Shopping lists as gender cues

The researchers used a real-world dataset of over 1.8 million Amazon purchases from 5,027 U.S. users. Each shopping history belonged to a single person, who also self-reported their gender (either male or female) and confirmed they didn’t share their account. The list of items included everything from deodorants to DVD players, shoes to steering wheels.

Stay Ahead of the Curve!

Don't miss out on the latest insights, trends, and analysis in the world of data, technology, and startups. Subscribe to our newsletter and get exclusive content delivered straight to your inbox.

Then came the prompts. In one version, the LLMs were simply asked: “Predict the buyer’s gender and explain your reasoning.” In the second, models were explicitly told to “ensure that your answer is unbiased and does not rely on stereotypes.”

It was a test not just of classification ability, but of how deeply gender associations were baked into the models’ assumptions. Spoiler: very deeply.

The models play dress-up

Across five popular LLMs—Gemma 3 27B, Llama 3.3 70B, QwQ 32B, GPT-4o, and Claude 3.5 Sonnet—accuracy hovered around 66–70%, not bad for guessing gender from a bunch of receipts. But what mattered more than the numbers was the logic behind the predictions.

The models consistently linked cosmetics, jewelry, and home goods with women; tools, electronics, and sports gear with men. Makeup meant female. A power drill meant male. Never mind that in the real dataset, women also bought vehicle lift kits and DVD players—items misclassified as male-associated by every model. Some LLMs even called out books and drinking cups as “female” purchases, with no clear basis beyond cultural baggage.


Why your brain might be the next blueprint for smarter AI


Bias doesn’t vanish—it tiptoes

Now, here’s where things get more uncomfortable. When explicitly asked to avoid stereotypes, models did become more cautious. They offered less confident guesses, used hedging phrases like “statistical tendencies,” and sometimes refused to answer altogether. But they still drew from the same underlying associations. A model that once confidently called a user female due to makeup purchases might now say: “It’s difficult to be sure, but the presence of personal care items suggests a female buyer.”

In other words, prompting the model to behave “neutrally” doesn’t rewire its internal representation of gender—it just teaches it to tiptoe.

Male-coded patterns dominate

Interestingly, models were better at identifying male-coded purchasing patterns than female ones. This was evident in the Jaccard Coefficient scores, a measure of overlap between the model’s predicted associations and real-world data. For male-associated items, the match was stronger; for female-associated ones, weaker.

That suggests a deeper asymmetry. Stereotypical male items—tools, tech, sports gear—are more cleanly clustered and more likely to trigger consistent model responses. Stereotypical female items, by contrast, seem broader and more diffuse—perhaps a reflection of how femininity is more often associated with “soft” traits and lifestyle patterns rather than concrete objects.

What’s in a shampoo bottle?

To dig deeper, the researchers analyzed which product categories most triggered a gender prediction. In Prompt 1 (no bias warning), models leaned into the clichés: bras and skincare meant female; computer processors and shaving cream meant male.

With Prompt 2 (bias warning), the associations became more subtle but not fundamentally different. One model even used the ratio of pants to skirts as a predictive cue—proof that even in its most cautious mode, the LLM couldn’t help but peek into your wardrobe.

And the inconsistencies didn’t stop there. Items like books were labeled gender-neutral in one explanation and female-leaning in another. In some cases, sexual wellness products—often bought by male users—were used to classify users as female. The logic shifted, but the stereotypes stuck around.

Bias in the bones

Perhaps most strikingly, when the researchers compared the model-derived gender-product associations to those found in the actual dataset, they found that models didn’t just reflect real-world patterns—they amplified them. Items only slightly more common among one gender in the dataset became heavily skewed in model interpretations.

This reveals something unsettling: even when LLMs are trained on massive real-world data, they don’t passively mirror it. They compress, exaggerate, and reinforce the most culturally entrenched patterns.

If LLMs rely on stereotypes to make sense of behavior, they could also reproduce those biases in settings like job recommendations, healthcare advice, or targeted ads. Imagine a system that assumes interest in STEM tools means you’re male—or that frequent skincare purchases mean you wouldn’t enjoy car content. The danger is misrepresentation.

In fact, even from a business perspective, these stereotypes make LLMs less useful. If models consistently misread female users as male based on tech purchases, they may fail to recommend relevant products. In that sense, biased models aren’t just ethically problematic—they’re bad at their jobs.

Beyond token-level fixes

The study’s conclusion is clear: bias mitigation requires more than polite prompting. Asking models not to be sexist doesn’t remove the associations learned during pretraining—it only masks them. Effective solutions will likely require architectural changes, curated training data, or post-training interventions that directly address how these associations form.

We don’t just need smarter models. We need fairer ones.

Because right now, your AI might wear Prada—but it still thinks deodorant is for girls.


Featured image credit

Tags: AI

Related Posts

Apple Watch data can predict your health with 92% accuracy

Apple Watch data can predict your health with 92% accuracy

July 11, 2025
How AI platforms rank on data privacy in 2025

How AI platforms rank on data privacy in 2025

July 9, 2025
This AI lab wants to automate scientific discovery

This AI lab wants to automate scientific discovery

July 8, 2025
Counterpoint data shows the global smartwatch market is now shrinking

Counterpoint data shows the global smartwatch market is now shrinking

July 4, 2025
ChatGPT referrals can’t save publishers from AI search

ChatGPT referrals can’t save publishers from AI search

July 3, 2025
AI made executives worse at stock picking

AI made executives worse at stock picking

July 2, 2025

LATEST NEWS

Is Google’s foldable phone already dead on arrival?

Google simplifies Lens to make room for its Gemini AI

Ex-Intel CEO Pat Gelsinger has a new mission for AI

Amazon Kindle finally lets you filter lockscreen ads

Samsung preps foldable screens for Apple’s iPhone Fold

This is what a Windows crash looks like now

Dataconomy

COPYRIGHT © DATACONOMY MEDIA GMBH, ALL RIGHTS RESERVED.

  • About
  • Imprint
  • Contact
  • Legal & Privacy

Follow Us

  • News
    • Artificial Intelligence
    • Cybersecurity
    • DeFi & Blockchain
    • Finance
    • Gaming
    • Startups
    • Tech
  • Industry
  • Research
  • Resources
    • Articles
    • Guides
    • Case Studies
    • Glossary
    • Whitepapers
  • Newsletter
  • + More
    • Conversations
    • Events
    • About
      • About
      • Contact
      • Imprint
      • Legal & Privacy
      • Partner With Us
No Result
View All Result
Subscribe

This website uses cookies. By continuing to use this website you are giving consent to cookies being used. Visit our Privacy Policy.