Will ChatGPT replace search?
How the limits of chatbots can point us toward a better search system.
Hi there, and welcome back to Untangled, a newsletter and podcast about technology, people, and power. This is the free issue of Untangled, which arrives in your inbox once a month. If you want more sweet, sweet content that helps you (as product managers, technologists, grant-makers, civil society leaders, and concerned citizens alike) analyze the big sociotechnical problems of the day, make better decisions at work, and take daily actions that align with your values, sign up for the paid edition.
Before we get into it, two pieces of exciting news:
Nick Martin, CEO of TechChange, named Untangled his favorite newsletter.
Untangled just eclipsed 2.2K subscribers. That's a lot more than the 200 subscribers I had just one year ago. That's really exciting - thanks for being part of this!
Now, on to the show.
I'm a real curious person. I enjoy seeking out information, analyzing it, and synthesizing it. Search feels like a fundamental human pursuit to me. Naturally, then, I don't get the appeal of using ChatGPT as a search system. So when Microsoft recently announced that it is working to integrate ChatGPT into its Bing search engine, a lil' part of me died inside. I kid, but the point is, I don't want a direct answer to my query; I want to search for it.
But also, I get the desire for an alternative to Google. It's a private company that monetizes our clicks without our consent and has a market cap that is 60% of the total GDP of all of Sub-Saharan Africa. Google's search system is dedicated to its own financial interests, and the experience of using it is getting worse and worse, in part because of its business model. Not great. What even constitutes a good search system? In this essay, I start to answer this question by evaluating the limitations of ChatGPT and Google. Let's dig in.
There was a time when "search" meant having a conversation with a librarian, which veered in various directions and then, eventually, pointed you to a book or two. With the rise of the web, "search" morphed into a process whereby we input a query and get myriad web pages to choose from. Social media has led to more social searching: sites like Quora and Reddit turn the query-and-response process into a community Q&A of sorts.
The point is, how we understand search is intertwined (some might say "entangled") with the technologies of the day and the social behaviors they encourage. One big issue with chatbots like ChatGPT is that they encourage the acceptance of a direct answer. Whereas Google might offer a mix of high-quality and clickbaity information sources from which to choose, ChatGPT generates a single plausible response with an air of authority. As I wrote before:
"What's striking to me about all of the ChatGPT3 examples going around the internet isn't that many are wrong, but how confidently wrong they are. The answers sound authoritative, even if the output is only ever probabilistic. In short, with ChatGPT3, entertainment and confidence substitute for truth and meaning. If that's not Trumpy, I don't know what is."
Indeed, the authoritativeness of a direct answer poses an unavoidable dilemma. In "The Dilemma of the Direct Answer," Martin Potthast, Matthias Hagen, and Benno Stein explain that "the dilemma of the direct answer is a user's choice between convenience and diligence when using an information retrieval system." I don't know about you, but I'm concerned that "more convenience" wins in a fight against "more research" most of the time.
This is a problem because the answers to most questions and queries aren't obvious. If you were talking to a librarian, you might go back and forth. As they probe, you might have an "uh, I don't know" moment, or think "it totally depends," or get a lil' testy and say "hold up, I disagree with the presupposition of that question." It takes work and iteration to cut through the ambiguity implicit in a given query. But chatbots just remove that uncertainty and equivocation altogether, leaving no trace that it was ever there in the first place. As the authors put it, "We don't know what a good answer is because the world is complex, but we stop thinking that when we see these direct answers."
This directness is also at odds with flexibility and exploration. Search systems need to support a diverse range of search activities because we're complex lil' humans with lots of different needs! In "Situating Search," University of Washington professors Emily Bender and Chirag Shah put it this way:
"Information sources as well as people's information seeking behavior have become more diverse, which in turn increases the need for flexible tools that can support diverse modes of usage."
Gary Marchionini, a professor at the University of North Carolina at Chapel Hill, offers a nice framework that simplifies what amount to twenty different "information searching strategies" into three modes (sketched in code after the list):
Lookup: search as a fact-finding mission. You know what you're looking for, and your query yields precise results.
Learn: search as knowledge acquisition. Your search involves multiple iterations, followed by interpretation and analysis. You're trying to learn or interpret something!
Investigate: search as analysis and evaluation. You're searching with a specific intention, e.g. to support "planning, forecasting, or to transform existing data into new data or knowledge," as Marchionini writes. He also includes "serendipitous browsing that is done to stimulate analogical thinking" in this category.
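To make the distinction concrete, here is a minimal sketch of what treating these modes as a routing decision might look like. The `SearchMode` enum and the keyword heuristics are my own illustrative assumptions, not anything from Marchionini's paper:

```python
# A toy sketch of Marchionini's three search modes as a routing decision.
from enum import Enum, auto

class SearchMode(Enum):
    LOOKUP = auto()       # fact-finding: a single precise answer may suffice
    LEARN = auto()        # iterative: needs multiple sources to interpret
    INVESTIGATE = auto()  # open-ended: needs exploration and comparison

def classify_query(query: str) -> SearchMode:
    """Crude keyword heuristic; a real system would model intent."""
    q = query.lower()
    if q.startswith(("who ", "when ", "how many", "what is the number")):
        return SearchMode.LOOKUP
    if any(word in q for word in ("how do", "why", "explain")):
        return SearchMode.LEARN
    return SearchMode.INVESTIGATE

print(classify_query("When was the Eiffel Tower built?"))       # SearchMode.LOOKUP
print(classify_query("Explain how mRNA vaccines work"))         # SearchMode.LEARN
print(classify_query("resources for tenants facing eviction"))  # SearchMode.INVESTIGATE
```

The trouble with a direct-answer chatbot, in these terms, is that it behaves as if `classify_query` always returned `LOOKUP`.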
Chatbots like ChatGPT are useful tools for "lookup" queries. They can help you find a known item. But they struggle to support search strategies in the latter two buckets. Bender and Shah use the example of someone concerned about being evicted from their home. They imagine a user who enters the query "Who can help me avoid being evicted?" The user isn't on a fact-finding mission; they want to explore the different resources available to people at risk of losing their homes. But in an attempt to offer a direct answer, ChatGPT limits the process of sense-making. As Bender and Shah write, in this scenario, a chatbot "prevents the user from being able to build their own model of the space of possibilities available."
Or take another example from Bender and Shah: "What is the number of a 24-hour advice nurse?" As they explain, a human might know important background context, like the fact that the service one can call depends on the user's healthcare provider and/or insurance plan, which in turn depends on where they live. The user might be able to find this relevant information in the metadata of a typical set of search results, but the chatbot? Here's Bender and Shah:
"A language-model-based dialogue agent, on the other hand, would likely synthesize a string with a phone-number shaped string of digits (possibly not even an actual phone number from a relevant source document) and might link to one or another of the web pages with text about advice nurses (not necessarily the same one with the phone number), but is unlikely to know to foreground the information about which patients the number is available to, nor to provide multiple options differentiated by healthcare provider/insurance plan."
For a number of search types, context is king, but with chatbots, that context is often erased as they synthesize strings of text.
Last, we know Google and search engines like it have exacerbated social biases. In Algorithms of Oppression, Safiya Noble reminds us that search results aren't a reflection of truth or popularity; rather, "search results reflect the values and norms of the search company's commercial partners and advertisers and often reflect our lowest and most demeaning beliefs." Indeed, Noble goes on to show that search algorithms "privilege whiteness and discriminate against people of color, specifically women of color." But ChatGPT might make this even worse. With Google or Bing, the searcher sees racist or sexist results next to others, so they're nudged to ask, "um, where do these come from?" Now imagine the racist result coming from an authoritative-seeming voice that we anthropomorphize as "intelligent." As Bender and Shah write,
"Where are the toe-holds that would allow a user to start to understand where the results are coming from, what biases the source data might contain, how those data were collected, and how modelling decisions might have amplified biases?"
This isn't a hypothetical concern. Microsoft once released a chatbot called Tay that was then quickly removed from the internet for responding to prompts with racist, xenophobic, and otherwise hateful language. Just months ago, Meta stopped developing a chatbot for similar reasons. This is, at least in part, why Google hasn't yet released its competitor into the wild.
The other reason Google might be hesitant to double down on chatbots? Its business model depends on you clicking ads next to its search results. Amr Awadallah, who worked for Yahoo and Google and now runs Vectara, put it this way: "Google has a business model issue. If Google gives you the perfect answer to each query, you won't click on any ads." This is a spot-on example of the innovator's dilemma.
OpenAI is trying to solve this problem on the cheap (it outsourced the job of labeling violent, sexist, and racist text to Kenyan laborers), revealing yet another way in which emerging technologies are entangled with social systems: often-hidden global labor!
So what to do about the future of search?
The very first thing OpenAI could do is be more transparent: it could share the corpus of data it's using to train ChatGPT. But loyal Untangled readers will know that's not enough. Applying the framework of transparency scholar Jonathan Fox, we might also demand information about OpenAI and how internal decisions and practices shaped the development of ChatGPT. As I've written before, "algorithms are shaped by people, practices, and beliefs," and therefore "algorithmic assessments should be ongoing, interrogate the decisions and assumptions that led to its design, and be able to demand changes to that design." Another way to minimize harm might be annotating responses with a disclaimer. Potthast, Hagen, and Stein proposed that systems like ChatGPT include the statement "This answer is not necessarily true. It just fits well to your question."
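As a rough sketch of how that annotation could work in practice: the `generate_answer` callable below is a hypothetical stand-in for whatever model produces the response, not a real API.

```python
# A minimal sketch of the disclaimer Potthast, Hagen, and Stein propose:
# every generated answer ships with a caveat about its probabilistic nature.
from typing import Callable

DISCLAIMER = "This answer is not necessarily true. It just fits well to your question."

def answer_with_disclaimer(query: str, generate_answer: Callable[[str], str]) -> str:
    """Wrap any query-to-text generator so its output carries the caveat."""
    return f"{generate_answer(query)}\n\n[{DISCLAIMER}]"

# Usage with a dummy generator standing in for the model:
print(answer_with_disclaimer(
    "What is the number of a 24-hour advice nurse?",
    lambda q: "Call 555-0100.",  # placeholder text, not a real number or model output
))
```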
However, we should really take this as an opportunity to go beyond simply minimizing harm and begin imagining alternative search systems. The limits of Google and ChatGPT (of which there are many) should offer strong starting points: we need systems that support exploration and our varied search intentions; systems that preserve context rather than flattening it; and systems that are transparent enough that we can understand how they produced the results they did.
But above all, we need systems that are public, whose search results aren't unduly influenced by financial incentives. At the moment, we seem to be stuck in a place where we prioritize convenience over the goals of learning, exploration, truth, and equity.
What ChatGPT has done for us is make it clear that search systems can indeed be different, which is why it's vitally important that we don't go down the same road as before. Now is the time to strive for search systems that work for the public as a whole, ones that deliver the highest-quality information to the most people.