Jeffrey K. Hollingsworth,
Finding Bottlenecks in Large-scale Parallel Programs,
Note: this paper contains several color pages.
This thesis addresses the problem of trying to locate the source of performance bottlenecks in large-scale parallel and distributed applications. Performance monitoring creates a dilemma: identifying a bottleneck necessitates collecting detailed information, yet collecting all this data can introduce serious data collection bottlenecks. At the same time, users are being inundated with volumes of complex graphs and tables that require a performance expert to interpret. I have developed a new approach that addresses both these problems by combining dynamic on-the-fly selection of what performance data to collect with decision support to assist users with the selection and presentation of performance data. The approach is called the W3 Search Model (pronounced W-cubed). To make it possible to implement the W3 Search Model, I have developed a new monitoring technique for parallel programs called Dynamic Instrumentation. The premise of my work is that not only is it possible to do on-line performance debugging, but for large scale parallelism it is mandatory.
The W3 Search Model closes the loop between data collection and analysis. Searching for a performance problem is an iterative process of refining the answers to three questions: why is the application performing poorly, where is the bottleneck, and when does the problem occur. To answer the why question, tests are conducted to identify the type of bottleneck (e.g., synchronization, I/O, computation). Answering the where question isolates a performance bottleneck to a specific resource used by the program (e.g., a disk system, a synchronization variable, or a procedure). Answering when a problem occurs, tries to isolate a bottleneck to a specific phase of the program's execution.
Dynamic Instrumentation differs from traditional data collection because it defers selecting what data to collect until the program is running. This permits insertion and alteration of the instrumentation during program execution. It also features a new type of data collection that combines the low data volume of sampling with the accuracy of tracing. Instrumentation to precisely count and time events is inserted by dynamically modifying the binary program. These counters and timers are then periodically sampled to provide intermediate values to the W3 Search Model. Based on this intermediate data, changes are made in the instrumentation to collect more information to further isolate the bottleneck.
I have built a prototype implementation of the W3 Search Model and of Dynamic Instrumentation. The prototype runs on Thinking Machine's CM-5 and a network of workstations running PVM. In one study, the tools identified the bottlenecks in several real programs using two to three orders of magnitude less data than traditional techniques. In another study, Dynamic Instrumentation was used to monitor long running programs and introduced less than 10% perturbation. While the W3 Search Model and Dynamic Instrumentation complement each other, they are also useful individually. The W3 Search Model can be applied to existing post-mortem performance tools or even to simulated machines and environments. Dynamic Instrumentation has be used to collect performance data for other uses including visualization. The W3 Search Model and Dynamic Instrumentation are being incorporated into the Paradyn Performance Tools.