Developing Theory Using Machine Learning Methods

by Prithwiraj Choudhury, Ryan Allen, and Michael G. Endres

Overview — This paper provides a step-by-step roadmap for using machine learning (ML) techniques to explore novel and robust patterns in data. It introduces management researchers to a new use case for ML tools: building new theory from quantitative observational data.

Author Abstract

We describe how to employ machine learning methods in theory development. Compared to traditional causal inference methods, ML methods make far fewer a priori assumptions about the functional form of the underlying model that best represents the data. Given this, researchers could use such methods to explore novel and robust patterns in the data that could lead to inductive theory building. ML strengths include replicable identification of novel patterns in the data. Additionally, ML methods address several concerns (such as “p-hacking” and confounding local effects for global effects) raised by scholars relative to the norms of empirical research in the fields of strategy and management. We develop a step-by-step roadmap that illustrates how to use four ML methods (decision trees, random forests, K-nearest neighbors, and neural networks) to reveal patterns in data that could be used for theory building. We also illustrate how ML methods could better illuminate interactions and non-linear effects, relative to traditional methods. In summary, ML methods could act as a complementary tool to both existing inductive theory-creating methods such as multiple case inductive studies and traditional methods of causal inference.
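The abstract itself contains no code. The sketch below is a minimal, hypothetical illustration, not the authors' implementation, of how a decision tree (the first of the four methods named above) can surface an interaction that a linear model with only main effects of x1 and x2 cannot represent. The toy data, the helper names `sse`, `best_split`, `build_tree` and `show`, and the depth limit of 2 are all assumptions made for illustration. In practice a library implementation such as scikit-learn's `DecisionTreeRegressor` would typically be used, and any discovered pattern would be checked on held-out data.

```python
# Hypothetical toy data: a 10x10 grid of (x1, x2) values in [0, 0.9].
# The outcome is 1 only when BOTH x1 >= 0.5 and x2 >= 0.5 (a pure interaction).
X = [(i / 10, j / 10) for i in range(10) for j in range(10)]
y = [1.0 if x1 >= 0.5 and x2 >= 0.5 else 0.0 for x1, x2 in X]


def sse(values):
    """Sum of squared errors around the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(rows, targets):
    """Exhaustive greedy search over every feature and every midpoint between
    adjacent distinct values. Returns (feature, threshold, cost) with the lowest
    total SSE; ties keep the earlier feature. Returns None if no split exists."""
    best = None
    for f in range(len(rows[0])):
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [t for r, t in zip(rows, targets) if r[f] < threshold]
            right = [t for r, t in zip(rows, targets) if r[f] >= threshold]
            cost = sse(left) + sse(right)
            if best is None or cost < best[2] - 1e-12:
                best = (f, threshold, cost)
    return best


def build_tree(rows, targets, depth=0, max_depth=2):
    """Recursively grow a regression tree; leaves predict the mean outcome."""
    leaf = {"leaf": sum(targets) / len(targets), "n": len(targets)}
    if depth >= max_depth or sse(targets) == 0.0:
        return leaf  # stop at the depth limit or when the node is pure
    split = best_split(rows, targets)
    if split is None:
        return leaf  # every feature is constant in this node
    f, threshold, _ = split
    left_idx = [i for i, r in enumerate(rows) if r[f] < threshold]
    right_idx = [i for i, r in enumerate(rows) if r[f] >= threshold]
    return {
        "feature": f,
        "threshold": threshold,
        "left": build_tree([rows[i] for i in left_idx],
                           [targets[i] for i in left_idx], depth + 1, max_depth),
        "right": build_tree([rows[i] for i in right_idx],
                            [targets[i] for i in right_idx], depth + 1, max_depth),
    }


def show(node, indent=""):
    """Print the fitted tree as nested if/else rules."""
    if "leaf" in node:
        print(f"{indent}predict {node['leaf']:.2f} (n={node['n']})")
        return
    name = f"x{node['feature'] + 1}"
    print(f"{indent}if {name} < {node['threshold']:.2f}:")
    show(node["left"], indent + "    ")
    print(f"{indent}else:  # {name} >= {node['threshold']:.2f}")
    show(node["right"], indent + "    ")


tree = build_tree(X, y)
show(tree)
```

In the printed rules, x2 enters only inside the x1 >= 0.45 branch. This conditional structure is the kind of pattern a researcher could then probe for theory building, without having specified an x1-by-x2 interaction term in advance. Because the split search is exhaustive and deterministic, rerunning it on the same data reproduces the same tree. A fuller analysis would also check that the splits recur on a held-out sample.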

Paper Information