Behind IBM's High-Risk Decision to Put Watson on 'Jeopardy'

The gamble looked like an epic fail when supercomputer Watson flubbed the Final Jeopardy! question in Game 1. But the failure didn't last long. Willy Shih describes IBM's marketing of Watson in a recent case study.

Podcast Transcript

Brian Kenny: And now for the daily double. This contestant holds the highest two-day score ever recorded on the quiz show “Jeopardy!” The answer is Watson. Oh, and by the way, he’s not human. Watson is a deep question-answering machine created by IBM and introduced to the world in a nationally televised “Jeopardy!” exhibition, which he won handily. But he wasn’t always so smart. Today we’ll hear from Professor Willy Shih about his case study, Building Watson: Not So Elementary, My Dear. I’m your host Brian Kenny, and you’re listening to Cold Call.

Professor Shih, an expert in manufacturing and product development, has written numerous articles, cases, and papers on the topic. He also spent 28 years in industry at some of the largest technology firms in the world, including of course IBM, which brings us to the case at hand. Willy, thanks for joining us.

Willy Shih: My pleasure.

Kenny: Can you start by telling us how this case opens? What’s the opening scenario here?

Shih: Well, the opening of the case is Dave Ferrucci, who is the protagonist in the case, sitting in the audience at that “Jeopardy!” contest where Watson was pitted against the all-time human champions in the game. So, the case opens with the Final Jeopardy! question for Game 1, in the category U.S. Cities. The question was, “Its largest airport is named for a World War II hero; the second largest for a World War II battle.” Now, Dave Ferrucci is in the audience watching his machine answer, “What is Toronto?” which of course is the wrong answer. And Ferrucci looks to Eric Brown, who’s another one of the protagonists in the case, and he immediately realizes he’s going to have a lot of questions to answer to management as to how the machine could come up with such an obviously wrong answer in this high-stakes contest.

Kenny: This is an unusual kind of case for HBS. There’s a lot of technology elements to it, and I’m wondering what prompted you to write it? What interested you in it?

Shih: Well, the case is fundamentally about product development, and how do you do product development in some of these very technology-intensive areas where you have very ambitious goals. What prompted us to write the case was, looking at the choice of “Jeopardy!” as a target for this design—was it a good choice or was it a bad choice? And what might that tell us about complex product development processes?

Kenny: You spent a lot of time at IBM—I’d like to know how does an idea like Watson ever get started there? And what’s in it for them?

Shih: Well, I think that’s one of the real questions in the case, because on the one hand, you look at Watson, and you say, “Well, is this really a directly commercializable product?” You know, why would somebody in the research division pick something like this as a target? Isn’t it kind of a corner case? Because in playing “Jeopardy!,” you have a lot of specialized rules. You have a lot of terminology, you have puns, you have plays on words, and things like that. And you say, “Well, you know, is it generally applicable?” But on the other hand, what the “Jeopardy!” goal—winning “Jeopardy!” with a machine—does is synthesize a lot of goals into a very visible and easily understood target for both the development team and for IBM management. If you talk to a Dave Ferrucci or Eric Brown, who are protagonists in the case, they would say “Jeopardy!” is this kind of broad, open-domain type of question answering, right? And picking that as the target would force you into the development of very generalizable methods. And “Jeopardy!” uses very complex language; it requires high contextual understanding, and it has to rely on often imprecise constructs. So in order to be able to accurately assess confidence in the answers, you’d have to have a pretty capable system. So this is kind of a harder target than just doing something that was very orderly. You know, you had to be fast; it was a very competitive situation.

So as a product development target, it had a certain appeal. There were some other aspects of it as well. First of all, committing to play the best humans in a live contest on television puts a lot of pressure on the development team. It’s a very easily understood target. It defines the schedule for you, it’s kind of a grand challenge problem, and it makes it very easy for the whole team to rally around. It had the additional appeal of committing management, because once you publicly state you’re going to do something like that, management is committed, and it really rallies the team around making that target. It synthesizes a lot of things into an easily understood goal. So from a product development strategy standpoint, it was an interesting approach.

Kenny: Can we dig into Watson a little bit? How would you define what Watson is?

Shih: Watson is a question-answering system that is made up of many different subsystems, which will evaluate many different possible answers to the questions, and then determine its confidence level in proposing the right answer.

Kenny: And they ran into some interesting challenges with Watson—you know, the Toronto response being one. But the underlying problems there had to do with the way humans process this kind of information and helping Watson to process it in a similar way.

Shih: Well, yeah, and I think that’s one of the challenges because IBM’s interest here was really in natural language processing. And natural language as you and I use it is filled with a lot of ambiguities. In fact in the case of “Jeopardy!” it is probably one of the more difficult cases of having to sort out ambiguities because the game features such heavy uses of plays on words and puns and so on. They had to design a system that could handle this very sophisticated interpretation of what the question was, and then figure out through the use of a lot of data what the possible answers were, and then propose the best answer.

Kenny: And it turns out, I’m going to oversimplify this, but junk in, junk out, right? I mean, the quality of the content turned out to be critically important.

Shih: When Watson was going to be playing live contestants, it could not be connected to the Internet. Therefore, the IBM team had to really have a very broad search on their data sources, and then be able to assess the different probabilities for the different answers and create a confidence score that would tell them how much confidence Watson had in the answer.

Kenny: And the notion of confidence in a machine is an interesting thing. I don’t know that many people would think about whether your computer is confident in the information it’s providing you.

Shih: And I think that was one of the very interesting innovations in this program as they developed it, which was the whole notion of rendering explicit the level of confidence that the machine had in the answer. You and I, when we answer in class, do the same thing subconsciously. We have some notion of how confident we are in the answer, and we will make that judgment. In the case of Watson, it actually rendered that explicit.

Kenny: Tell us a little bit about Dave and his team. What kinds of challenges did he encounter managing a team of scientists and engineers?

Shih: Well, I think Dave had a very interesting situation because he had been with this team at IBM Research for a long time, and IBM Research has a very strong heritage in doing the basic science work, if you will. So they had been going along for many years on natural language processing. He came to the realization that the team had kind of leveled out in the progress that they were making. And I think the key insight for Dave Ferrucci was recognizing that their progress had stalled, and how to break out to the next level of innovation. And a lot of that revolved around picking this visible public target, and then changing the organization’s way of working so that they could kind of break out of that stall.

Kenny: There’s a lesson there for people who are leading creative teams or teams that are charged with R&D in any industry. This applies broadly.

Shih: Right, and I think it’s really about how do you generate breakthroughs? How do you recognize when the established way of working has taken you as far as you can, and you need to do something different?

Kenny: There’s a great scene in the case where you describe Dave bringing the team into his office one-by-one. Can you walk us through that?

Shih: One of the things Ferrucci realized was that people were working in a very incremental way, and they were working in their own silos. In the case, Ferrucci talks about people coming into his office, each explaining, “Well, I talked to somebody else, and they told me this,” and Ferrucci would pick up the phone, call that person, and say, “No, wait a minute—can you come into my office?” And the next thing you knew, everybody was in Ferrucci’s office.

When he came to the realization that maybe the way they were organized was inhibiting communications across the team, and what in fact they really needed to do was facilitate that communication—what they then did was move everybody into the same room. It changed the organizational structure, and changed the pattern of communication among people to speed that up. And that turned out to be a step function change in their progress, because after they did that, they really started bringing up a lot of fresh ideas. And it goes to this notion of innovation being driven by cross-boundary interactions. So once he broke down all those walls, it really helped their communication a lot.

Kenny: It seems so logical, and yet for so many organizations—and I can speak from first-hand experience—breaking down silos is a difficult thing to do.

Shih: Well, many people who study innovation will say the best innovations are always cross-boundary. We kind of all know that, but it’s so easy to fall into your established ways of working, and it’s so difficult to make those changes. Because most people are relatively conservative about changing their way of working.

Kenny: Has this had any impact more broadly at IBM and how they think about structuring teams that do this kind of work?

Shih: I think it has, because it was a good example of kind of a breakout strategy, both the way of working, the kind of cross-boundary interaction, and kind of this moonshot approach of picking big, ambitious targets, publicly committing to them, and then going for them.

Kenny: We had the great opportunity to have Watson come here to HBS a couple of years ago and take on some students from Harvard Business School and from the Sloan School at MIT. That was an exciting moment. You must have enjoyed that.

Shih: Yes, and actually the original genesis for this case was my friend at IBM, Bernie Meyerson, who is now the Chief Technical Officer of the company. When Watson first came out, he called me up and said, “Willy, we’d like to offer a Watson contest at the School.” And my reaction was, “Well, Bernie, that would be fun, but we’ll actually learn a lot more if we first look at the product development case, have the students go through that, and then have the contest.” When the time came for the contest, I was already familiar with how well Watson was doing. So I negotiated a deal with the IBM team. I said, “You want to have Watson pitted against an HBS student and a Sloan student; let’s make it a little more of an even match. How about if you allow us to have teams?” So we had a team of three HBS students, and MIT Sloan offered up a team of three students. So it was three versus three versus Watson, the machine. And the HBS students did a very good job.

Kenny: They had a lead for a fleeting moment there, didn’t they?

Shih: The HBS team, to their credit, two of them had actually been on “Jeopardy!” So starting about a month before the actual contest, they got together and started practicing and rehearsing, and because they did that preparation, they had a strategy for playing this game, and they understood the game well enough that they realized the criticality of buzzing in quickly. So there was a brief moment where the HBS team had a lead. I was sitting next to a whole rank of IBM executives who had come in for the contest, and I can assure you, they were sweating for a moment, thinking, “Oh my goodness; the HBS team could beat Watson.” But then when it came to the final betting, Watson did pass them a little bit. I’m proud to say the HBS team represented the School very well.

Kenny: That’s good. The house was rocking, so to speak, when Watson was there. Does this have implications for other kinds of machines that we interact with?

Shih: Well, I think Watson is emblematic of a revolution that we’re seeing in terms of the use of big data and machine learning as decision support aids for humans. Where Watson is going is actually of great interest to many of us here at HBS because we’re looking at the applications of big data and this cognitive computing and machine learning in other fields, particularly areas where you have a lot of unstructured data like healthcare. These machine learning techniques are going to have very important implications for helping humans understand areas where there are vast amounts of data that have to be mined with some kind of cognitive processing and some kind of machine learning approach.

Kenny: But Watson will never take over the Earth, right? You know, take over all of our machines? People might be afraid of this kind of intelligence that’s amassed in Watson.

Shih: Well, I think in the end, these things will be assistants to people. These machines will be tools that will allow us to exploit the knowledge that is contained within large amounts of data. I’m actually not worried about them taking over.

Kenny: You heard it from Willy Shih. Thank you for joining us today.

Shih: Thank you.

Kenny: You can find this case, along with thousands of others in the Harvard Business School Case Collection at HBR.org. I’m Brian Kenny. Thanks for listening to Cold Call, the official podcast of Harvard Business School.
