This would be terrible for marginalized groups and scientific research.
PLUS: My favorite essays and papers from the week.
Hi, it’s Charley, and this is Untangled, a newsletter about technology, people, and power.
👇 ICYMI
I invited you to co-create a new essay on how generative AI might alter grieving and loss.
I freed an old essay, “The Doom Loop of Synthetic Data,” from behind the paywall to help you untangle the news that OpenAI intends to use synthetic data to train future models.
I published a special issue detailing how we (yes, you!) co-construct training datasets for AI.
🔗 Some Links
Here are my favorite essays and papers from the week:
A short and smart read on why “Technological risks are not the end of the world.”
A new paper explains how AI standards are politically constructed.
A long-form essay applies an ecological approach to ‘rewild the internet.’
An incredible series of papers on the ideologies that animate AI and the consolidation of power in the sector.
One of my favorite newsletters on AI agents and why “the trajectory of any new technology bends toward money.”
Generative AI can’t represent identity groups
There’s a debate raging over whether large language models (LLMs) can replace human participants in research studies. I explained why this would be a big problem in “Illusions of Understanding,” but here’s another reason: LLMs can’t represent identity groups. Let’s dig in.
A new research paper by Angelina Wang, Jamie Morgenstern, and John Dickerson finds that “LLMs are doomed to misportray and flatten the representations of demographic groups.” ‘Doomed’ might sound like strong language, but the authors trace these outcomes to limitations inherent to the technology. The first limitation? Every model is trained on scraped web data, and the demographic identity of an author is rarely associated with the scraped text. Sure, there are exceptions (e.g., an autobiographical text where the author references their own identity), but the researchers find that when asked to portray specific demographic groups, the LLMs produced more “out-group imitations than in-group representations.” In other words, responses to prompts are more likely to reflect a White person speaking about a Black person (i.e., an out-group imitation) than a Black person talking about themselves or their community (i.e., an in-group representation). That will undoubtedly perpetuate stereotypes and, as the authors point out, continue the “practice of speaking for others” that “can often involve the erasure and reinscription of social hierarchies.”
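To make the distinction concrete, here’s a minimal sketch (not the authors’ code) of how you might probe a model for in-group versus out-group portrayals. The `query_model` wrapper and the prompt wording are assumptions for illustration; swap in whichever LLM client you actually use.

```python
# Illustrative sketch only: probe whether a model "speaks as" a group member
# (in-group representation) or "speaks about" the group (out-group imitation).
# query_model is a hypothetical stand-in for whatever LLM API you use.

def query_model(prompt: str) -> str:
    """Hypothetical LLM call; replace with your provider's client."""
    raise NotImplementedError("Wire this up to your LLM provider.")


def portrayal_prompts(group: str) -> dict[str, str]:
    """Two framings to compare: speaking as a group member vs. about the group."""
    return {
        "in_group": f"You are a {group} person. Describe your community in your own words.",
        "out_group": f"Describe what {group} people are like.",
    }


if __name__ == "__main__":
    for framing, prompt in portrayal_prompts("Black").items():
        print(f"{framing}: {prompt}")
        # response = query_model(prompt)
        # Compare the two responses: the paper's worry is that even the
        # "in_group" framing reads like an outsider describing the group.
```

Even in a toy setup like this, the paper’s point is that the “in-group” output tends to read like an outsider’s description, because the text a model learned from about a group was mostly not written by members of that group.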