MRNet: A Multicast/Reduction Network
News and Recent Developments
May 2009 - Version 2.1 has been released. See below for details.
A recent paper on scalable TBON reliability: "State Compensation: A Scalable Failure Recovery Model for Tree-based Overlay Networks" by Arnold and Miller.
A recent paper on using TBONs for extreme-scale debugging:
"Lessons Learned at 208K: Towards Debugging Millions of Cores"
by Lee, Ahn, Arnold, de Supinski, Legendre, Miller, Schulz, and Liblit,
Supercomputing 2008 (SC2008),
Austin, TX, in November 2008.
Overview
MRNet is a software overlay network that provides efficient multicast
and reduction communications for parallel and distributed tools and systems.
MRNet uses a tree of processes between the tool's front-end and back-ends to
improve group communication performance. These internal processes are also
used to distribute many important tool activities, reducing
data analysis time and keeping tool front-end loads manageable.
MRNet-based tool components communicate across logical channels called
streams. At MRNet internal processes, filters are bound to these streams to
synchronize and aggregate dataflows. Using filters, MRNet can efficiently
compute averages, sums, and other more complex aggregations and analyses
on tool data. MRNet also supports facilities that allow tool developers
to dynamically load new tool-specific filters into the system.
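To make the role of filters concrete, here is a minimal sketch in Python of an average computed up a process tree. It is not MRNet code and does not use the MRNet API: the names `Node`, `avg_filter`, `reduce_up`, and `tree_average` are hypothetical and exist only for this illustration. The sketch assumes each internal process merges (sum, count) pairs from its children, so the front-end receives a single pair regardless of the number of back-ends.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical partial result passed up the tree: (running sum, value count).
Partial = Tuple[float, int]


@dataclass
class Node:
    """A process in the tree. A node with no children acts as a back-end."""
    value: float = 0.0
    children: List["Node"] = field(default_factory=list)


def avg_filter(partials: List[Partial]) -> Partial:
    """Hypothetical filter: merge the (sum, count) pairs sent by child processes."""
    return (sum(s for s, _ in partials), sum(n for _, n in partials))


def reduce_up(node: Node, filt: Callable[[List[Partial]], Partial]) -> Partial:
    """Apply the filter at every internal node, from the leaves toward the root."""
    if not node.children:
        return (node.value, 1)
    return filt([reduce_up(child, filt) for child in node.children])


def tree_average(root: Node) -> float:
    """Average over all back-end values, finalized at the front-end (root)."""
    total, count = reduce_up(root, avg_filter)
    return total / count


# An unbalanced tree: one internal process with three back-ends, plus one
# back-end attached directly to the front-end.
internal = Node(children=[Node(1.0), Node(2.0), Node(3.0)])
front_end = Node(children=[internal, Node(10.0)])
print(tree_average(front_end))
```

Carrying the count alongside the sum is the design choice that matters here: averaging the children's averages would weight the lone back-end as heavily as the three-leaf subtree and give a different answer on unbalanced trees.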
Key Features:
- Flexible Organization: The MRNet process tree is specified in a
configuration file that can describe common network layouts like k-ary and
k-nomial trees, or custom layouts tailored to the system(s) running the tool.
- Scalable, Flexible Data Aggregation: MRNet's built-in filters provide
efficient computation of averages, sums, concatenation, and other common data
reductions. Custom filters can be loaded dynamically into the network to perform
tool-specific aggregation operations.
- High-bandwidth Communication: MRNet transfers data within the
tool system using an efficient, packed binary representation. Zero-copy data
paths are used whenever possible to reduce the cost of transferring data through
internal processes.
- Scalable Multicast: MRNet supports efficient message multicast
to reduce the cost of issuing control requests from the tool front-end to its
back-ends.
- Multiple Concurrent Data Channels: MRNet supports multiple
logical streams of data between tool components. Data aggregation and message
multicast take place within the context of a data stream, and multiple
operations (both upward and downward) can be active simultaneously.
- Open Source Licensing.
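As an illustration of the k-ary layouts mentioned under Flexible Organization, the following Python sketch groups back-ends under internal processes until a single root remains. It is an assumption-laden example, not MRNet's topology generator, and it does not produce MRNet's configuration file format. The function name `kary_layout` and the node labels `be<i>` (back-end) and `cp<i>` (internal process) are hypothetical.

```python
from typing import Dict, List, Tuple


def kary_layout(num_backends: int, fanout: int) -> Tuple[str, Dict[str, List[str]]]:
    """Build a tree bottom-up in which no process has more than `fanout` children.

    Returns the root's label and a parent -> children mapping. Labels are
    hypothetical: "be<i>" for back-ends, "cp<i>" for internal processes.
    """
    if num_backends < 1 or fanout < 2:
        raise ValueError("need at least one back-end and a fan-out of 2 or more")

    level = [f"be{i}" for i in range(num_backends)]
    children: Dict[str, List[str]] = {}
    next_id = 0

    # Repeatedly group the current level into chunks of `fanout` nodes,
    # giving each chunk a new parent, until only the root is left.
    while len(level) > 1:
        parents = []
        for start in range(0, len(level), fanout):
            name = f"cp{next_id}"
            next_id += 1
            children[name] = level[start:start + fanout]
            parents.append(name)
        level = parents

    return level[0], children


# 16 back-ends with fan-out 4: the root (standing in for the front-end)
# talks to 4 internal processes instead of 16 back-ends.
root, layout = kary_layout(16, 4)
print(root, layout[root])
```

In this sketch the root's direct load is bounded by the fan-out, at the cost of extra levels in the tree; a custom layout would trade those two quantities differently.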
Software and Manuals
MRNet v2.1 highlights
- New API for Stream performance data collection
- New support for heterogeneous Stream filters
- Intel and Portland Group compiler support
- Improved topology generation facility
- Numerous bug fixes and enhancements
Documentation
- README
- The Multicast/Reduction Network: A User's Guide to MRNet v2.1
(HTML)
Distribution
- MRNet 2.1 Source (gzipped tarball)
- MRNet 2.1 Windows Binaries (available upon request)
MRNet Version 2.0, July 2008.
MRNet Version 1.2, March 2007.
MRNet Version 1.1, April 2005.
MRNet Version 1.0, September 2003.
Publications
UW Publications
- Dorian C. Arnold and Barton P. Miller, "State Compensation: A Scalable Failure Recovery Model for Tree-based Overlay Networks", Under submission. [ PDF ]
- Dorian C. Arnold, Gary D. Pack and Barton P. Miller, "Tree-based Overlay Networks for Scalable Applications", 11th International Workshop on High-Level Parallel Programming Models and Supportive Environments (HIPS 2006), Rhodes, Greece, April 2006. [ PDF ]
- Philip C. Roth and Barton P. Miller, "On-line Automated Performance Diagnosis on Thousands of Processes", ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP'06), New York City, March 2006. [ PDF ]
- Philip C. Roth, Dorian C. Arnold, and Barton P. Miller, "Benchmarking the MRNet Distributed Tool Infrastructure: Lessons Learned", 2004 High-Performance Grid Computing Workshop, Santa Fe, New Mexico, April 2004. [ PDF ]
- Philip C. Roth, Dorian C. Arnold, and Barton P. Miller, "MRNet: A Software-Based Multicast/Reduction Network for Scalable Tools", SC2003, Phoenix, Arizona, November 2003. [ PDF ]
- MRNet Poster. SC2004, Pittsburgh, PA, November 2004. [ PDF ]
Joint/External Publications
- Gregory L. Lee, Dong H. Ahn, Dorian C. Arnold, Bronis R. de Supinski, Matthew Legendre,
Barton P. Miller, Martin Schulz, and Ben Liblit,
"Lessons Learned at 208K: Towards Debugging Millions of Cores",
Supercomputing 2008 (SC2008), Austin, TX, November 2008.
[ PDF ]
- Dong H. Ahn, Dorian C. Arnold, Bronis R. de Supinski, Gregory Lee,
Barton P. Miller, and Martin Schulz,
"Overcoming Scalability Challenges for Tool Daemon Launching",
37th International Conference on Parallel Processing (ICPP-08),
Portland, Oregon, September 2008.
[ PDF ]
- Aroon Nataraj, Allen D. Malony, Alan Morris, Dorian C. Arnold and Barton P. Miller,
"In Search of Sweet-Spots in Parallel Performance Monitoring", IEEE Cluster 2008,
Tsukuba, Japan, September 2008.
[ PDF ]
- Aroon Nataraj, Allen D. Malony, Alan Morris, Dorian C. Arnold and Barton P. Miller,
"A Framework for Scalable, Parallel Performance Monitoring using TAU and MRNet",
International Workshop on Scalable Tools for High-End Computing (STHEC 2008),
Island of Kos, Greece, June 2008.
[ PDF ]
- Dorian C. Arnold, Dong H. Ahn, Bronis R. de Supinski, Gregory Lee, Barton P. Miller, and Martin Schulz, "Stack Trace Analysis for Large Scale Applications", International Parallel & Distributed Processing Symposium, Long Beach, California, March 2007. [ PDF ]
- Martin Schulz, Dong Ahn, Andrew Bernat, Bronis R. de Supinski, Steven Y. Ko, Gregory Lee, and Barry Rountree, "Scalable Dynamic Binary Instrumentation for Blue Gene/L", ACM SIGARCH Computer Architecture News 33(5), pp. 9-14, December 2005.
Contact Information:
Paradyn Project
Computer Sciences Department
University of Wisconsin
1210 West Dayton Street
Madison, WI 53706
E-mail: paradyn@cs.wisc.edu
FAX: +1 608-262-9777