Whether it’s booking a hotel, renting a movie, or buying a car, many of us consult multiple reviews before deciding. It’s called aggregating opinions, and we do it without even thinking about it.
Crowdsourcing works so well, in fact, says Harvard Business School visiting associate professor Yael Grushka-Cockayne, that executives should adopt a similar approach when it comes to using probability forecasts of business-critical issues; for example, the likelihood that product demand will increase by a given percentage next quarter.
“The whole notion of using crowds is very popular in many different fields,” says Grushka-Cockayne, whose research is on data science, forecasting, project management, and behavioral decision-making. “Our work is focused on using crowds for prediction and for forecasting something that is unknown.”
The idea is explored in a working paper published in October 2018, Averaging Probability Forecasts: Back to the Future. Grushka-Cockayne, on loan from the University of Virginia’s Darden School of Business, joined with Darden colleague Casey Lichtendahl; Bob Winkler of Duke University’s Fuqua School of Business; and Victor Jose of Georgetown University’s McDonough School of Business to lay out best practices as well as challenges. They also highlight three domains already using probability forecasts successfully: meteorology, economics, and political science.
Probability forecasting differs from simple point forecasts by producing a range of possible outcomes and how likely each outcome is, conveying richer information related to business decisions. Its use is on the rise, but businesses are still learning how best to leverage it.
Examples of this technique are everywhere. When meteorologists track a hurricane path, the “cones of uncertainty” they refer to are probability forecasts, outlining the likelihood of the storm going in one direction versus another. In economics, the Federal Reserve Bank of Philadelphia coordinates probability and point forecasts for predicted growth in gross domestic product, unemployment rate, and inflation rates. Political science has seen a rise in probability forecasts for geopolitical events, with forecasting competitions and blogs like Nate Silver’s fivethirtyeight.com offering probabilities of election results, outcomes of sporting events, the Oscars, and other topics.
The rise of big data and machine learning offers far more fuel for producing probability forecasts, which can serve as an entry point for businesses looking to harness their data to make better decisions.
“Predictions might be coming from individuals, and they might be coming from models or machines,” Grushka-Cockayne says. “They might rely on a lot of actual data, or they might be subjective in nature, and it doesn’t matter. The idea is that we want to follow certain principles that we believe are useful.”
Estimates should come from diverse sources
Chief among the recommendations from Grushka-Cockayne and her colleagues is aggregating several probability forecasts, typically from five to 10 experts. Ideally, those experts should be diverse, with some predicting a narrower range of outcomes and others a broader one. When Grushka-Cockayne refers to averaging the forecasts, she does not necessarily mean a mathematical average but any of a number of ways to combine them. Methods can include weighting certain forecasts more heavily or trimming extreme ones to get a more precise range.
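To make the combining options concrete, here is a minimal Python sketch of three of them: a simple average, a weighted average, and a trimmed average. It is an illustration under stated assumptions, not the method from the working paper. The function names, the five expert forecasts, and the weights are all hypothetical, and each forecast is simplified to a single probability for one event rather than a full range of outcomes.

```python
from statistics import mean


def simple_average(probs):
    """Equal-weight average of the experts' probabilities."""
    return mean(probs)


def weighted_average(probs, weights):
    """Average that counts some experts more heavily than others."""
    total = sum(weights)
    return sum(p * w for p, w in zip(probs, weights)) / total


def trimmed_average(probs, trim=1):
    """Drop the `trim` lowest and `trim` highest forecasts, then average the rest."""
    if 2 * trim >= len(probs):
        raise ValueError("trim would remove every forecast")
    kept = sorted(probs)[trim:len(probs) - trim]
    return mean(kept)


# Hypothetical forecasts from five experts for "demand rises next quarter".
forecasts = [0.55, 0.60, 0.65, 0.70, 0.95]
# Hypothetical weights, e.g. reflecting each expert's track record.
weights = [1, 1, 2, 2, 1]

print(simple_average(forecasts))
print(weighted_average(forecasts, weights))
print(trimmed_average(forecasts, trim=1))
```

In this made-up data, trimming removes the one very confident expert, so the trimmed result sits below the simple average. Which combination is appropriate depends on how much the individual forecasters are trusted.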
Common mistakes include “overfitting” a statistical model: tuning it to hit every historical data point, which makes it overly specific and leaves little room for future variation. Another problem is miscalibration, which arises when users fail to consider whether individual forecasts are overconfident (in this case, too narrow) or underconfident (too wide). When combining probability forecasts, users need to adjust for those individual tendencies.
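One simple way to check for the overconfidence described above is to count how often actual outcomes landed inside a forecaster's stated ranges. The sketch below assumes a forecaster who issued 80 percent intervals; the function name and all of the numbers are hypothetical.

```python
def interval_coverage(intervals, outcomes):
    """Share of outcomes that landed inside their forecast interval."""
    hits = sum(low <= actual <= high
               for (low, high), actual in zip(intervals, outcomes))
    return hits / len(outcomes)


# Hypothetical 80 percent intervals for weekly demand, and what actually happened.
intervals = [(90, 110), (95, 105), (100, 120), (80, 100), (98, 102)]
outcomes = [112, 101, 118, 79, 107]

coverage = interval_coverage(intervals, outcomes)
print(coverage)
```

If intervals labeled 80 percent capture the outcome far less often than 80 percent of the time, the forecaster's ranges are too narrow; coverage well above 80 percent suggests they are too wide.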
Measuring accuracy and tracking performance are crucial to improving forecasts. Just as there are different methods to combine forecasts, there are many scoring rules to choose from when measuring accuracy. Scoring rules can nudge forecasters in one direction by giving a penalty for overestimating or underestimating. Whatever rule is accepted should align with overall goals.
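As an illustration, the sketch below shows two widely used scoring rules; the article does not say which rules the researchers favor, so both are assumptions chosen for the example. The Brier score measures the accuracy of probabilities assigned to yes-or-no events. The pinball loss scores a quantile forecast and penalizes overestimates and underestimates unequally, depending on the quantile chosen. All numbers are hypothetical.

```python
def brier_score(probs, outcomes):
    """Mean squared gap between forecast probabilities and 0/1 outcomes.

    Lower is better; 0 means every forecast was certain and correct.
    """
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def pinball_loss(forecast, actual, quantile):
    """Asymmetric penalty for a single quantile forecast (lower is better).

    With quantile=0.9, forecasting too low costs 0.9 per unit of error,
    while forecasting too high costs only 0.1 per unit.
    """
    if actual >= forecast:
        return quantile * (actual - forecast)
    return (1 - quantile) * (forecast - actual)


# Hypothetical: two event forecasts, where the first event happened and the second did not.
print(brier_score([0.9, 0.2], [1, 0]))

# Hypothetical: the same 20-unit miss, scored in each direction at the 0.9 quantile.
print(pinball_loss(forecast=100, actual=120, quantile=0.9))  # forecast too low
print(pinball_loss(forecast=120, actual=100, quantile=0.9))  # forecast too high
```

Choosing the quantile is how the scoring rule gets aligned with goals: a high quantile rewards forecasters who err on the high side, and a low quantile does the opposite.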
In the hurricane example, forecasters need to decide if it’s better to be narrower due to the high cost of evacuations or be wider to ensure every possible scenario is taken into account. In business, that could mean weighing the cost of not having enough stock to meet demand versus carrying excess stock.
“That’s a tradeoff, and the context of the forecast determines that tradeoff because it is linked to the specifics of the downsides of getting it wrong,” Grushka-Cockayne says.
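The stock example can be made concrete with the textbook “newsvendor” calculation, which is a standard inventory model and not something drawn from the working paper. Under the assumption that demand follows a normal distribution, the sketch below turns a probability forecast and the two costs of being wrong into a stocking level. The demand figures, the costs, and the function names are hypothetical.

```python
from statistics import NormalDist


def critical_ratio(cost_short, cost_excess):
    """Probability level to stock up to, given the per-unit cost of each mistake."""
    return cost_short / (cost_short + cost_excess)


def stock_level(demand_forecast, cost_short, cost_excess):
    """Demand quantile at the critical ratio, read off the probability forecast."""
    return demand_forecast.inv_cdf(critical_ratio(cost_short, cost_excess))


# Hypothetical probability forecast: demand centered on 1,000 units, standard deviation 100.
demand = NormalDist(mu=1000, sigma=100)

# Hypothetical costs: each unit of unmet demand loses 8, each unsold unit loses 2.
level = stock_level(demand, cost_short=8, cost_excess=2)
print(round(level))
```

Because running short is assumed to cost four times as much as overstocking, the calculation stocks above the expected demand. A point forecast alone could not support this step, since it says nothing about how likely demand is to exceed 1,000 units.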
Use history for confirmation
Organizations serious about improving their forecasting abilities need to do better at tracking past results. While it’s easy to find sources that issue predictions, it’s much harder to find ones that track their predictions and measure how good they were.
“The only way that we get better is by tracking,” Grushka-Cockayne says. She points to weather forecasters, often maligned but pretty reliable. “Those guys see realizations every single day, and so do we, so we all hold them accountable. That’s the best test for them, and that’s why they’re so good because people care, people track it, people hold them accountable, and people measure how good they were. That’s what we should be doing with our firms.”
Probabilities are hard for people to understand, Grushka-Cockayne says. She credits the work of the National Hurricane Center, Nate Silver, and others for helping educate the public on how to interpret probability forecasts, but more work is needed to improve visualizations.
“Even if you say there’s a 10 percent chance that, say, Trump will win, it’s still a chance,” Grushka-Cockayne says. “It’s still a numbers game. It’s a challenging thing to convey, so practicing with good visualizations and conveying the visualizations is something we think is key.”
In practice at Heathrow
Grushka-Cockayne has another paper out, Forecasting Airport Transfer Passenger Flow Using Real-Time Data and Machine Learning, that puts many of these principles into practice.
That paper, coauthored with Xiaojia Guo and Bert De Reyck of University College London’s School of Management, details a project with Heathrow Airport using probabilities to forecast flows of international connecting passengers.
The system allows airport and airline officials to address likely bottlenecks before the security or immigration queues back up, and identify passengers at risk of missing connecting flights. Using real-time data and easily understood visuals, the system produces a range of possibilities, not a point forecast, and its accuracy is measured.
The system is operating at Heathrow, with Paris airports in advanced discussions about implementing it there.
“There’s much appreciation in the domain for improved forecasting because they feel there is a lot of data that they own [but] they're underutilizing what they get,” Grushka-Cockayne says. “Players in the air traffic space have to deal with typical challenges that firms have to deal with today; that having many different stakeholders implies different systems, different data collection habits, different quality.”
The first step is cleaning up the data so the different systems can be leveraged together. She says companies like Anheuser-Busch, which frequently acquire smaller breweries, could use the same technique. The medical field is another domain with a lot to gain from being able to predict patient flow better.
“At the end of the day it’s all about decision-making,” Grushka-Cockayne says. “Why we care about a better prediction is to make better and more informed decisions.”