Hidden Structure: Using Network Methods to Map System Architecture

by Carliss Y. Baldwin, Alan MacCormack & John Rusnak

Overview — All complex systems can be described in terms of their architecture, that is, as a nested hierarchy of subsystems. Despite a wealth of research highlighting the importance of understanding system architecture, however, there is little empirical evidence on the actual architectural patterns observed across large numbers of real world systems. In this paper, the authors developed robust and reliable methods to detect the core components in a complex system, to establish whether these systems possess a core-periphery structure, and to measure important elements of these structures. Overall, the findings represent a first step in establishing some stylized facts about the structure of real-world systems. Key concepts include:

  • The majority of systems analyzed in this non-random sample—67 percent to 76 percent—possess a core-periphery structure. Another 20 percent are considered borderline core-periphery. However, a significant number of systems lack such a structure. This implies a considerable amount of managerial discretion exists when choosing the "best" architecture for a system.
  • There are major differences in the number of core components across a range of systems of similar size and function, indicating that differences in design are not driven solely by system requirements.
  • Instead, these differences appear to be driven, in part, by the characteristics of the organization in which development occurs. Open, distributed organizations tend to develop modular designs with smaller "Cores"; whereas closed, collocated organizations tend to develop tightly-coupled designs with larger Cores.
  • The authors find that core components are often distributed throughout a system, rather than being concentrated in one place. Hence it is important not to assume that all key relationships in a system are located in a few subsystems. These issues are pertinent in software, given that legacy code is rarely re-written, but instead forms a platform upon which new systems are built.
  • The authors find no discernible pattern of direct dependencies between components that can reliably predict the number and location of core components. The results highlight the critical importance of indirect dependencies, which generate multiple paths along which changes and problems can propagate. These findings highlight the difficulties facing a system architect.
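The point about indirect dependencies can be illustrated with a minimal sketch. It assumes a system is represented as a dictionary that maps each component to the components it depends on directly. The component names and the `reachable` helper are hypothetical and are not taken from the paper.

```python
from collections import deque

def reachable(deps, start):
    """Return every component reachable from `start` in one or more dependency steps."""
    seen = set()
    queue = deque(deps.get(start, ()))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        queue.extend(deps.get(node, ()))
    return seen

# Hypothetical toy system: an entry "ui": ["api"] means ui depends on api.
deps = {
    "ui": ["api"],
    "api": ["db", "auth"],
    "auth": ["db", "log"],
    "db": ["log"],
    "log": [],
}

# Direct dependencies are just the listed targets.
direct = {name: set(targets) for name, targets in deps.items()}
# Indirect dependencies also include everything reached through intermediate components.
indirect = {name: reachable(deps, name) for name in deps}

for name in deps:
    print(name, len(direct[name]), len(indirect[name]))
```

In this toy graph `ui` has one direct dependency but can be affected by changes in four components, which is the kind of propagation path that a count of direct dependencies alone would miss.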

Author Abstract

In this paper, we describe an operational methodology for characterizing the architecture of technical systems and demonstrate its application to a large sample of software releases. Our methodology is based upon network graphs and allows us to identify and define three fundamental architectural patterns, which we call core-periphery, multi-core, and hierarchical. We apply our methodology to a sample of 1,286 software releases from 17 applications and find that 70% to 80% of these systems possess a core-periphery architecture under our classification scheme. This type of architecture is characterized by having a single dominant cyclic group (the Core) that is large relative to other cyclic groups and above a threshold with respect to system size. We find that the size of the Core varies widely, even for systems that perform the same function. These differences appear to be associated with different models of development: open, distributed organizations tend to develop systems with smaller Cores, while closed, collocated organizations tend to develop systems with larger Cores. Our findings represent a first step in establishing some "stylized facts" about the fine-grained structure of large, real-world technical systems.
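The core-periphery test described in the abstract can be sketched roughly as follows. The sketch treats a cyclic group as a strongly connected component with more than one member and finds those components with Kosaraju's algorithm. The function names, the toy graph, and the two thresholds (`min_share`, `dominance`) are illustrative assumptions, not values or code from the paper.

```python
def cyclic_groups(deps):
    """Return (cyclic groups sorted largest first, total component count)."""
    nodes = set(deps) | {t for targets in deps.values() for t in targets}
    rev = {n: [] for n in nodes}
    for src, targets in deps.items():
        for t in targets:
            rev[t].append(src)

    # Pass 1: record nodes in order of depth-first completion on the forward graph.
    order, seen = [], set()
    for root in sorted(nodes):
        if root in seen:
            continue
        seen.add(root)
        stack = [(root, iter(deps.get(root, ())))]
        while stack:
            node, it = stack[-1]
            for nxt in it:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(deps.get(nxt, ()))))
                    break
            else:
                order.append(node)
                stack.pop()

    # Pass 2: sweep the reversed graph in reverse completion order.
    groups, assigned = [], set()
    for root in reversed(order):
        if root in assigned:
            continue
        assigned.add(root)
        group, stack = set(), [root]
        while stack:
            node = stack.pop()
            group.add(node)
            for prev in rev[node]:
                if prev not in assigned:
                    assigned.add(prev)
                    stack.append(prev)
        if len(group) > 1:  # a lone component is not treated as a cyclic group here
            groups.append(group)
    return sorted(groups, key=len, reverse=True), len(nodes)


def is_core_periphery(deps, min_share=0.05, dominance=1.5):
    """Assumed rule: the largest cyclic group must hold at least `min_share`
    of all components and be at least `dominance` times the next largest."""
    groups, n = cyclic_groups(deps)
    if not groups:
        return False, 0.0
    largest = len(groups[0])
    second = len(groups[1]) if len(groups) > 1 else 0
    share = largest / n
    return share >= min_share and largest >= dominance * second, share


# Hypothetical toy system: a, b, c form one cycle; d and e form a smaller one.
deps = {"a": ["b"], "b": ["c"], "c": ["a", "d"], "d": ["e"],
        "e": ["d"], "f": ["a"], "g": []}
groups, n = cyclic_groups(deps)
flag, share = is_core_periphery(deps)
print(groups, n, flag, round(share, 2))
```

Both thresholds are exposed as parameters because the abstract states only that the Core must be large relative to other cyclic groups and above a threshold relative to system size, without giving the cutoffs.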

Paper Information