Better-reply Dynamics in Deferred Acceptance Games
Executive Summary: Mechanism design, the theory behind market design, faces an inherent problem: the players in a market may not understand the mechanism, and so may make poor choices until they learn how to use it well. This paper explores how players learn a mechanism. It focuses on one mechanism in particular, the Deferred Acceptance algorithm for two-sided matching markets, which is used in many real-life markets. Research was conducted by Guillaume Haeringer of Universitat Autònoma de Barcelona and Hanna Halaburda of Harvard Business School. Key concepts include:
- In the Deferred Acceptance (DA) algorithm, matches are built up over a series of rounds: offers are held tentatively and become final only when no further offers are made. The matching achieved through DA has a special property called "stability." In a stable matching, an individual who tries for a better partner than the one assigned by the matching finds that the preferred partner is already matched to someone that partner likes more.
- The researchers consider several forms of "better-reply dynamics," in which players revise the preference lists they submit, period after period, in search of a better match.
- They find that even in simple two-sided matching models, learning is difficult and takes a long time.
In this paper we address the question of learning in a two-sided matching mechanism that utilizes the deferred acceptance algorithm. We consider a repeated matching game where at each period agents observe their match and have the opportunity to revise their strategy (i.e., the preference list they will submit to the mechanism). We focus in this paper on better-reply dynamics. To this end, we first provide a characterization of better-replies and a comprehensive description of the dominance relation between strategies. Better-replies are shown to have a simple structure and can be decomposed into four types of changes. We then present a simple better-reply dynamics with myopic and boundedly rational agents and identify conditions that ensure that limit outcomes are outcome equivalent to the outcome obtained when agents play their dominant strategies. Better-reply dynamics may not converge, but if they do converge, then the limit strategy profiles constitute a subset of the Nash equilibria of the stage game.
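To make the stage game concrete, here is a minimal Python sketch, not the authors' code, of proposer-proposing deferred acceptance run on submitted preference lists, together with a check of whether a revised list is a better reply for an agent on the receiving side. The function names (`deferred_acceptance`, `is_better_reply`), the two-by-two example market, and the reading of "better reply" as a strictly preferred partner under the agent's true preferences are all assumptions made for illustration.

```python
from typing import Dict, List, Optional

Prefs = Dict[str, List[str]]  # agent -> submitted list, most preferred first


def deferred_acceptance(proposer_lists: Prefs, receiver_lists: Prefs) -> Dict[str, Optional[str]]:
    """Proposer-proposing DA on submitted lists; returns proposer -> receiver (or None)."""
    # rank[r][p] = position of proposer p on receiver r's list (lower is better)
    rank = {r: {p: i for i, p in enumerate(lst)} for r, lst in receiver_lists.items()}
    next_choice = {p: 0 for p in proposer_lists}
    held: Dict[str, str] = {}  # receiver -> proposer tentatively held
    free = list(proposer_lists)
    while free:
        p = free.pop()
        lst = proposer_lists[p]
        if next_choice[p] >= len(lst):
            continue  # list exhausted: p stays unmatched
        r = lst[next_choice[p]]
        next_choice[p] += 1
        if r not in rank or p not in rank[r]:
            free.append(p)  # r did not list p, so the offer is rejected
        elif r not in held:
            held[r] = p  # first acceptable offer is held tentatively
        elif rank[r][p] < rank[r][held[r]]:
            free.append(held[r])  # r trades up and releases the earlier offer
            held[r] = p
        else:
            free.append(p)  # r keeps the offer already held
    match: Dict[str, Optional[str]] = {p: None for p in proposer_lists}
    for r, p in held.items():
        match[p] = r
    return match


def partner_of(match: Dict[str, Optional[str]], receiver: str) -> Optional[str]:
    """Proposer matched to `receiver`, or None if the receiver is unmatched."""
    return next((p for p, r in match.items() if r == receiver), None)


def true_rank(true_list: List[str], partner: Optional[str]) -> int:
    """Lower is better; unmatched beats any partner missing from the true list."""
    if partner is None:
        return len(true_list)
    if partner in true_list:
        return true_list.index(partner)
    return len(true_list) + 1


def is_better_reply(receiver: str, candidate: List[str], true_list: List[str],
                    proposer_lists: Prefs, receiver_lists: Prefs) -> bool:
    """True if submitting `candidate` gives `receiver` a strictly better partner,
    judged by `true_list`, with every other submitted list held fixed."""
    current = partner_of(deferred_acceptance(proposer_lists, receiver_lists), receiver)
    deviated = {**receiver_lists, receiver: candidate}
    revised = partner_of(deferred_acceptance(proposer_lists, deviated), receiver)
    return true_rank(true_list, revised) < true_rank(true_list, current)


# Hypothetical market: each side's first choices conflict with the other's.
proposers = {"m1": ["w1", "w2"], "m2": ["w2", "w1"]}
receivers = {"w1": ["m2", "m1"], "w2": ["m1", "m2"]}

print(deferred_acceptance(proposers, receivers))
print(is_better_reply("w1", ["m2"], receivers["w1"], proposers, receivers))
```

In this hypothetical market, truthful lists give every proposer a first choice. If w1 shortens her list to m2 alone, m1 is rejected and turns to w2, which frees m2 to reach w1, so the shortened list is a better reply for w1. The deviation is tested on the receiving side because truthful reporting is a dominant strategy for the proposing side in this version of the mechanism.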