The problem with big data is that there is, well, so much of it. Analyzing it is like trying to sip from a firehose.
Just in time, a new book on the art and science of quantitative analysis arrives this week. In Keeping Up with the Quants: Your Guide to Understanding and Using Analytics, authors Thomas H. Davenport and Jinho Kim offer an introduction to the best methods for consuming data.
In this excerpt, Davenport and Kim discuss multiple ways to tell a story with data, "the best way to communicate results to nonanalytical people." The authors recommend six formats: The CSI Story, the Eureka Story, the Mad Scientist Story, the Survey Story, the Prediction Story, and the "Here's What Happened" Story. The excerpt focuses on the first two.
—Sean Silverthorne
book excerpt
Framing The Problem
From: Keeping Up with the Quants
THE CSI STORY
Some quantitative analyses are like police procedural television programs; they attempt to solve a business problem with quantitative analysis. Some operational problem crops up, and data are used to confirm the nature of the issue and find the solution. This situation often does not require deep statistical analysis, just good data and reporting approaches. It is often encountered in online businesses, where customer clickstreams provide plenty of data (often too much) for analysis.
One expert practitioner of the CSI story approach is Joe Megibow, vice president and general manager of online travel company Expedia's US business. Joe was previously a Web analytics maven (and he still is), but his data-based problem-solving approaches have led to a variety of impressive promotions.
Many of the Expedia investigations involve understanding the reasons behind lost online sales. One particular CSI story involved lost revenue on hotel payment transactions. Analysis of data suggested that after a customer had selected a hotel, filled in the travel and billing information, then clicked the "Buy Now" button, a percentage of the sales transactions were not being completed successfully.
Megibow's team investigated the reason for the failures, again using Web metrics data and server log files throughout the process. Apparently, the "Company" field under the customer's name was causing a problem. Some customers interpreted it as the name of the bank that supplied their credit card, and then they also supplied the bank's address in the billing address fields. This caused the transaction to fail with the credit card processor. Simply removing the "Company" field immediately raised profits for Expedia by $12 million. Megibow says that Expedia has explored many of these CSI-like stories, and they almost always yield substantial financial or operational benefits.
Sometimes the CSI stories do involve deeper quantitative and statistical analysis. One member of Megibow's team was investigating which customer touchpoints were driving online sales transactions. The analyst used the Cox regression model of "survival analysis," an approach originally used to determine which patients would die and which would live over certain time periods. The analysis discovered that the simpler prior models were not at all correct about what marketing approaches were really leading to a sale. Megibow commented, "We didn't know we were leaving money on the table."
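To make the idea of Cox regression concrete, here is a minimal sketch in plain Python. It is not Expedia's model. The ten customer records are invented for illustration, the single covariate (whether a customer saw a given touchpoint) is a hypothetical flag, and the function name is made up for this sketch. Each record gives the days until purchase (or until observation stopped), whether a purchase was observed, and whether the customer saw the touchpoint. A fitted hazard ratio above 1 would suggest that customers who saw the touchpoint tended to buy sooner.

```python
import math

# Invented records: (days_observed, purchased, saw_touchpoint).
# purchased = 0 means the customer had not bought when observation ended.
RECORDS = [
    (1, 1, 1), (2, 1, 1), (3, 1, 0), (4, 1, 1), (5, 1, 0),
    (6, 0, 1), (7, 1, 0), (8, 1, 0), (9, 0, 0), (10, 1, 1),
]


def fit_cox_one_covariate(records, max_iter=25, tol=1e-10):
    """Estimate beta in h(t | x) = h0(t) * exp(beta * x).

    Maximizes the Cox partial likelihood with Newton's method.
    Assumes one covariate and no tied event times, so no tie
    correction is applied.
    """
    beta = 0.0
    for _ in range(max_iter):
        grad = 0.0
        hess = 0.0
        for t_i, purchased_i, x_i in records:
            if not purchased_i:
                continue  # censored records only appear in risk sets
            # Risk set: everyone still under observation at time t_i.
            risk_x = [x for t, _, x in records if t >= t_i]
            weights = [math.exp(beta * x) for x in risk_x]
            s0 = sum(weights)
            s1 = sum(w * x for w, x in zip(weights, risk_x))
            s2 = sum(w * x * x for w, x in zip(weights, risk_x))
            grad += x_i - s1 / s0
            hess -= s2 / s0 - (s1 / s0) ** 2
        if hess == 0.0:
            break
        step = grad / hess
        beta -= step
        if abs(step) < tol:
            break
    return beta


beta_hat = fit_cox_one_covariate(RECORDS)
hazard_ratio = math.exp(beta_hat)
print(f"beta = {beta_hat:.3f}, hazard ratio = {hazard_ratio:.3f}")
```

In practice an analyst would more likely use an established statistics package, which handles tied times, several covariates at once, and standard errors.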
THE EUREKA STORY
The Eureka story is similar to the CSI story, except that it typically involves a purposeful approach to a particular problem (as opposed to stumbling over the problem) to examine a major change in an organization's strategy or business model. It tends to be a longer story with a greater degree of analysis over time. Sometimes Eureka stories also involve other analytical story types, just because the results are so important to the organizations pursuing them.
At Expedia again, for example, one Eureka story involved eliminating change/cancel fees from online hotel, cruise, and car rental reservations. Until 2009, Expedia and its competitors all charged up to $30 for a change or cancellation, above and beyond the penalties the hotel imposed. At the time, rates at Expedia and other online bookers were typically much lower than those for booking directly with a hotel, and customers were willing to tolerate change/cancel fees.
However, by 2009 it had become apparent that the fees had become a liability. Expedia's rates were closer to the hotels' own rates, so the primary appeal of Expedia had become convenience, and change/cancel fees were not convenient. Analysts looked at customer satisfaction rates, and they were particularly low for customers who had to pay the fees. Expedia's call center representatives were authorized to waive the change/cancel fees for only one reason: a death in the customer's family. A look at the number of waivers showed double-digit growth for the past three years. Either there was a death epidemic, or customers had figured out they could get their money back this way.
Expedia executives realized the market had changed, but change/cancel fees represented a substantial source of revenue. They wondered: if the fees were eliminated, would conversion (completed sale) rates go up? In April of 2009, they announced a temporary waiver of fees for the month (a bit of a mad scientist testing story, described below). Conversion rates immediately rose substantially. Executives felt that they had enough evidence to discontinue the fees, and the rest of the industry followed suit.
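One common way to check whether a jump in conversion is more than noise is a two-proportion z-test. The sketch below is an illustration under stated assumptions, not Expedia's actual analysis: the visit and sale counts are invented, and the function name is hypothetical. Note, too, that comparing a fee-waiver month with an earlier period is a before/after comparison, so seasonality or other changes could also move the numbers.

```python
import math


def two_proportion_z_test(sales_a, visits_a, sales_b, visits_b):
    """Compare two conversion rates with a pooled two-proportion z-test.

    Returns (lift, z, p_value), where lift is rate_b - rate_a and
    p_value is two-sided under a normal approximation.
    """
    rate_a = sales_a / visits_a
    rate_b = sales_b / visits_b
    pooled = (sales_a + sales_b) / (visits_a + visits_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / visits_a + 1 / visits_b))
    if std_err == 0.0:
        # No variation at all (everyone or no one converted): no evidence.
        return rate_b - rate_a, 0.0, 1.0
    z = (rate_b - rate_a) / std_err
    # Two-sided tail area of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return rate_b - rate_a, z, p_value


# Invented counts: period A charges the fee, period B waives it.
lift, z, p_value = two_proportion_z_test(
    sales_a=500, visits_a=10_000,
    sales_b=600, visits_b=10_000,
)
print(f"lift = {lift:.4f}, z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value would suggest the difference is unlikely to be chance alone, though it cannot by itself rule out other explanations for the change.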
Across town in Seattle lies Zillow, a company that distributes information about residential real estate. Zillow is perhaps best known to quant jocks for its "Zestimates," a proprietary algorithm that generates estimates of home values. But, like Expedia, Zillow's entire culture is based on data and analysis—not surprisingly, since the company was founded by Rich Barton, who also founded Expedia.
One of Zillow's Eureka stories involved a big decision to change how it made its money from relationships with real estate agents. Zillow began to work with agents in 2008, having previously been focused on consumers. One aspect of its agent-related business model was selling advertising by agents and delivering leads to them. Zillow charged the agents for the leads, but the value per lead was not enough in the view of executives. Chloe Harford, a Zillow executive who heads product management and strategy, was particularly focused on figuring out the right model for increasing lead value and optimizing the pricing of leads.
Harford, who has a PhD in volcanology, or the study of volcanoes, is capable of some pretty sophisticated mathematical analysis. However, she and her colleagues initially relied on what she calls "napkin math" to explore other ways to generate more leads and price them fairly to agents. In April 2010, Zillow created a new feature, immediately copied by competitors, involving selling advertising to agents. It created many more customer contacts than before, and allowed the consumer to contact the agent directly. Zillow also introduced a sophisticated algorithm for pricing leads to agents that attempts to calculate the economic value of the lead, with an estimate of conversion rates. Competitors also do this to some degree, but probably not to the level of sophistication that Zillow does. The leads and their pricing are so important that Harford and her colleagues frequently test different approaches to them with some of the Mad Scientist testing approaches described below. In short, Zillow's Eureka stories are intimately tied into its business model and its business success.
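The excerpt does not disclose Zillow's algorithm, but the basic idea of valuing a lead by its chance of converting can be sketched in a few lines of "napkin math." Everything here is an assumption for illustration: the function names, the commission rate, the share of value charged to the agent, and all of the numbers.

```python
def expected_lead_value(conversion_rate, home_price, commission_rate):
    """Expected commission an agent earns from one lead.

    conversion_rate: estimated probability the lead becomes a closed sale.
    home_price and commission_rate: hypothetical inputs for the sketch.
    """
    if not 0.0 <= conversion_rate <= 1.0:
        raise ValueError("conversion_rate must be between 0 and 1")
    return conversion_rate * home_price * commission_rate


def lead_price(lead_value, share_charged):
    """Price of a lead as a fixed share of its expected value (assumed rule)."""
    return lead_value * share_charged


# Invented example: 2% of leads close, $300,000 home, 3% commission.
value = expected_lead_value(0.02, 300_000, 0.03)
price = lead_price(value, share_charged=0.25)
print(f"expected value per lead: ${value:,.2f}; price charged: ${price:,.2f}")
```

The point of the sketch is only that a better conversion estimate changes the price directly, which is one reason such estimates would be worth testing repeatedly.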