Companies have long poured time and money into surveying customers. Now, with new research showing that artificial intelligence can generate rich data about shopper preferences, could customer surveys become obsolete?
Companies turn to people for honest feedback about what they will and won’t buy, but large language models like generative pre-trained transformers (GPTs) may allow companies to rely on AI to uncover consumers’ tastes, according to new research from Harvard Business School and Microsoft. Ayelet Israeli, an associate professor at HBS, and her fellow researchers queried a commercially available version of GPT-3 to elicit thousands of simulated customer responses and found that AI can produce demand patterns that resemble those of human studies.
While the recent emergence of ChatGPT has reignited fears that machines may replace humans in the workplace, the results of this study don’t necessarily mean that AI is going to gut marketing departments, the researchers say. Instead, the findings show the potential value of AI as an important tool for increasing productivity, reducing costs, and improving the quality of survey designs and insights generated within the fast-growing, $80 billion market research industry.
“We’re not saying everyone should now use this instead of talking to consumers, but we are saying that utilizing this tool, which is in some ways a consumer simulator, actually gives you useful and meaningful information, as if it came from a sample of customers,” says Israeli, the Marvin Bower Associate Professor at HBS.
Companies all over the world routinely spend heavily on time-consuming market research in hopes of uncovering new insights about their target customers. But even as market research tools have rapidly evolved, the results of such studies still offer only a snapshot of customer sentiment, and survey data is often flawed, the research team says.
“Humans tend to tell you they would pay more than they’re actually willing to pay. They say they would choose something that they don’t actually choose in practice,” says James Brand, an economist for Microsoft, who cowrote the working paper with Israeli and Donald Ngwe, a former HBS faculty member who is now an economist at Microsoft.
How well did AI do?
The researchers’ first step was to determine whether market research results elicited from GPT were consistent with expectations based on established economic theory. To do this, they set the large language model to produce responses with the highest possible degree of randomness. They then crafted prompts—the questions users ask an AI tool—about specific products like toothpaste and laptops, seeking hundreds of responses about whether the “customer” would choose to purchase each product at various price points.
“This allows us, for each price, to figure out the mean and the distribution around that, and then look at the overall shape of what we get, and determine whether we are actually getting something that looks like a realistic demand curve or not,” Israeli explains.
When the GPT prompt included information about the simulated customer’s income, varying between $50,000 and $120,000 per year, the responses indicated that higher income was correlated with higher price tolerance. This was in keeping with the pattern that researchers expected to see based on past research on the relationship between customers’ income and their willingness to pay.
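In rough outline, this elicitation loop is simple to script. Below is a minimal sketch, assuming the legacy (pre-1.0) openai Python client and a GPT-3-era completion model; the prompt wording, model name, income level, and price grid are illustrative stand-ins, not the exact choices in the working paper.

```python
# Sketch: eliciting a demand curve from a GPT-3-style model.
# Assumes the legacy (pre-1.0) openai Python client; prompt wording
# is illustrative, not the exact text from the working paper.
import openai

openai.api_key = "YOUR_API_KEY"  # assumption: you have an OpenAI key

PROMPT = (
    "A customer with an annual income of ${income:,} is shopping for "
    "toothpaste. A tube costs ${price:.2f}. "
    "Does the customer buy it? Answer Yes or No.\nAnswer:"
)

def simulated_purchase(price, income, n=100):
    """Query the model n times and return the share of 'Yes' answers."""
    yes = 0
    for _ in range(n):
        resp = openai.Completion.create(
            model="text-davinci-003",   # a GPT-3-era completion model
            prompt=PROMPT.format(price=price, income=income),
            max_tokens=3,
            temperature=1.0,            # high randomness, in the spirit of the study
        )
        if "yes" in resp["choices"][0]["text"].lower():
            yes += 1
    return yes / n

# Sweep prices to trace out a demand curve for one income level.
for price in [2.00, 3.00, 4.00, 5.00, 6.00]:
    share = simulated_purchase(price, income=70_000)
    print(f"price=${price:.2f}  purchase share={share:.2f}")
```

Repeating the sweep at different income levels is what lets the researchers check whether higher simulated incomes shift the whole curve toward higher willingness to pay.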
The team then introduced two brands of toothpaste, Crest and Colgate, and set Colgate as the preferred brand. By altering the price of Colgate, they could see at what point “customers,” on average, would switch to the less preferred but cheaper brand.
“Substitution patterns that you expect to find in observational data, we were able to find by collecting GPT’s responses,” says Israeli. “That was pretty incredible to us, that you’re able to identify these patterns even with this simulated data.”
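The same loop extends naturally to brand choice. The sketch below, under the same assumptions as the earlier one (legacy openai client, illustrative prompt wording), varies Colgate’s price while holding Crest’s fixed to find the point where simulated customers switch.

```python
# Sketch: measuring brand substitution with the same legacy openai client.
# Colgate is framed as the preferred brand; its price varies while
# Crest's stays fixed. Wording and prices are illustrative.
import openai

openai.api_key = "YOUR_API_KEY"

CHOICE_PROMPT = (
    "A customer who prefers Colgate is shopping for toothpaste. "
    "Colgate costs ${colgate:.2f} and Crest costs ${crest:.2f}. "
    "Which brand does the customer buy? Answer Colgate or Crest.\nAnswer:"
)

def colgate_share(colgate_price, crest_price=3.00, n=100):
    """Share of simulated customers who stick with the preferred brand."""
    picks = 0
    for _ in range(n):
        resp = openai.Completion.create(
            model="text-davinci-003",
            prompt=CHOICE_PROMPT.format(colgate=colgate_price, crest=crest_price),
            max_tokens=3,
            temperature=1.0,
        )
        if "colgate" in resp["choices"][0]["text"].lower():
            picks += 1
    return picks / n

# Raise Colgate's price until "customers" defect to the cheaper Crest.
for p in [3.00, 3.50, 4.00, 4.50, 5.00]:
    print(f"Colgate at ${p:.2f}: share choosing Colgate = {colgate_share(p):.2f}")
```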
The researchers also found that telling GPT it had purchased a product before, such as yogurt, and how much of the product the “customer” already had at home, affected purchasing decisions in predictable ways: the more yogurt they had at home, the lower the price they were willing to pay for one additional unit, though the effect was weaker than the researchers expected. Likewise, when asked to behave like a “random restaurant-goer” who had already consumed a few glasses of wine, GPT was still willing, on average, to pay the same price for subsequent glasses. This contradicts the standard prediction of diminishing marginal utility: the more of a good someone consumes, the less they should be willing to pay for an additional unit of that good.
“In this case, prompting that a customer has consumed wine may not only tell GPT about the customer’s prior consumption but also that the customer really likes wine,” Ngwe explains.
Might GPT also consider shifts in the decision-making ability of a restaurant-goer who had consumed a few glasses of wine? Maybe. The black-box nature of AI makes it impossible to know exactly what factors are used to generate responses, the researchers say.
Comparing AI results with customer surveys
In the second part of the study, the research team compared GPT results with a recent study involving actual people to assess the value customers assigned to specific product attributes.
For example, a recent study of human consumers found that shoppers were willing to pay $3.27 for fluoride in their toothpaste, and the GPT study results were “quite similar,” with one estimate coming in at $3.40, according to the working paper.
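The valuation step behind numbers like these is conjoint-style: fit a choice model to the Yes/No responses and read willingness to pay off the ratio of the attribute and price coefficients. The sketch below illustrates the idea on synthetic data standing in for GPT answers; the paper’s actual estimator and figures may differ, and every number here is made up for illustration.

```python
# Sketch: recovering a willingness-to-pay estimate for one attribute
# (fluoride) from simulated purchase responses, conjoint-style.
# The data are synthetic stand-ins for GPT answers; numbers are illustrative.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 2_000
price = rng.uniform(1.0, 8.0, n)       # randomized price per query
fluoride = rng.integers(0, 2, n)       # attribute on/off per query

# Pretend the simulated consumer values fluoride at $3.00 (true WTP = 1.5 / 0.5).
utility = 1.5 * fluoride - 0.5 * price + rng.logistic(size=n)
buy = (utility > 0).astype(int)        # stand-in for the model's Yes/No answers

X = sm.add_constant(np.column_stack([fluoride, price]))
fit = sm.Logit(buy, X).fit(disp=False)
b_fluoride, b_price = fit.params[1], fit.params[2]

# WTP = attribute coefficient divided by the (negative) price coefficient.
print(f"estimated WTP for fluoride: ${b_fluoride / -b_price:.2f}")
```

Because the synthetic data are generated with a true willingness to pay of $3.00, the estimate should land close to that value; with real GPT responses, the same ratio-of-coefficients logic is what produces figures like the $3.40 estimate above.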
Consumer studies like these generally cost upwards of $20,000 and take researchers between three and six months to complete, says Brand. With AI, by contrast, “we can get those answers in under 15 minutes,” he says.
Using AI to run this type of analysis prior to embarking on a human study could dramatically increase both the efficiency of testing and the quality of the results, adds Israeli.
“Because I’m not restricted by attributes or human time or human understanding of complexity, I can identify the things that GPT suggests actually matter, and then iterate on those with real data and real consumers to get human-based results,” she says.
Developing the right prompt
The products included in the experiments were commonly purchased items, like toothpaste, yogurt, and wine, as well as laptops. More research is needed to understand how GPT might provide useful insight for market researchers working with unique or multifaceted products or services, since language models may require more training and context to produce accurate results.
Additional study on the phrasing, or engineering, of AI prompts could also help improve results, says Brand, who along with Ngwe is exploring uses for GPT in market research for business units within Microsoft. The researchers included the prompts used in the experiments in the appendix of the working paper, and Israeli says the team has already received positive feedback from market researchers in the field who are beginning to apply the approach to some of their methods.
“There’s a really special type of skill that comes up when you think about interacting with large language models, which is prompt engineering,” Brand says. “I think there’s a lot that future researchers could do to iterate and get us a little bit closer to real-world studies.”