MRNet: A Multicast/Reduction Network
News and Recent Developments
July 2008 - Version 2.0 has been released. See below for details.
A recent paper on scalable TBON reliability: "State Compensation: A Scalable Failure Recovery Model for Tree-based Overlay Networks" by Arnold and Miller.
A recent paper on using TBONs for extreme-scale debugging: "Stack Trace Analysis for Large Scale Debugging" by Arnold, Ahn, de Supinski, Lee, Miller and Schulz will appear in IPDPS '07 in Long Beach, California in March, 2007.
Paradyn/MRNet Integration: As of Paradyn v5.0, MRNet has been integrated as the infrastructure for scalable front-end/tool-daemon communication.
Overview
MRNet is a software overlay network that provides efficient multicast
and reduction communications for parallel and distributed tools and systems.
MRNet uses a tree of processes between the tool's front-end and back-ends to
improve group communication performance. These internal processes are also
used to distribute many important tool activities, reducing
data analysis time and keeping tool front-end loads manageable.
MRNet-based tool components communicate across logical channels called
streams. At MRNet internal processes, filters are bound to these streams to
synchronize and aggregate dataflows. Using filters, MRNet can efficiently
compute averages, sums, and other more complex aggregations and analyses
on tool data. MRNet also supports facilities that allow tool developers
to dynamically load new tool-specific filters into the system.
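The idea of aggregating data at internal processes can be illustrated with a small, self-contained sketch. It does not use the MRNet API: the names `Node`, `sum_count_filter`, `reduce_up`, and `tree_average` are hypothetical and exist only for this illustration. The sketch assumes each back-end holds one number and that every internal process applies the same filter to what its children send upstream.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical model, not the MRNet API: a "packet" travelling upstream
# is a (sum, count) pair so that an average can be finished at the root.
Packet = Tuple[float, int]


@dataclass
class Node:
    """One process in the tree: a back-end (leaf) or an internal process."""
    value: float = 0.0  # only meaningful for leaves (back-ends)
    children: List["Node"] = field(default_factory=list)


def sum_count_filter(packets: List[Packet]) -> Packet:
    """Combine the children's (sum, count) pairs into a single pair."""
    total = sum(p[0] for p in packets)
    count = sum(p[1] for p in packets)
    return (total, count)


def reduce_up(node: Node) -> Packet:
    """Aggregate from the leaves toward the root, filtering at each level."""
    if not node.children:
        return (node.value, 1)
    return sum_count_filter([reduce_up(child) for child in node.children])


def tree_average(root: Node) -> float:
    """The front-end only divides once; the summing was done in the tree."""
    total, count = reduce_up(root)
    return total / count


# Four back-ends under two internal processes under one front-end.
front_end = Node(children=[
    Node(children=[Node(value=1.0), Node(value=2.0)]),
    Node(children=[Node(value=3.0), Node(value=6.0)]),
])
print(tree_average(front_end))  # prints 3.0
```

In this sketch the front-end receives two partial results instead of four raw values. Carrying a (sum, count) pair rather than a partial average is a design choice that keeps the result correct when subtrees have different numbers of back-ends.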
Key Features:
- Flexible Organization: MRNet process tree organization is specified in a
configuration file that can describe common network layouts like k-ary and
k-nomial trees, or custom layouts tailored to the system(s) running the tool.
- Scalable, Flexible Data Aggregation: MRNet's built-in filters provide
efficient computation of averages, sums, concatenation, and other common data
reductions. Custom filters can be loaded dynamically into the network to perform
tool-specific aggregation operations.
- High-bandwidth Communication: MRNet transfers data within the
tool system using an efficient, packed binary representation. Zero-copy data
paths are used whenever possible to reduce the cost of transferring data through
internal processes.
- Scalable Multicast: MRNet supports efficient message multicast
to reduce the cost of issuing control requests from the tool front-end to its
back-ends.
- Multiple Concurrent Data Channels: MRNet supports multiple
logical streams of data between tool components. Data aggregation and message
multicast take place within the context of a data stream, and multiple
operations (both upward and downward) can be active simultaneously.
- Open Source Licensing.
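As an illustration of the k-ary layouts mentioned under Flexible Organization, the following sketch builds a parent-to-children map for a tree with a given fan-out. It is not MRNet's configuration file format, which is described in the User's Guide; the function `kary_layout` and the node names (`fe`, `int<level>_<i>`, `be<i>`) are hypothetical and chosen only for this example.

```python
from typing import Dict, List


def kary_layout(num_backends: int, fanout: int) -> Dict[str, List[str]]:
    """Return {parent: [children]} for a k-ary tree over the back-ends.

    Hypothetical naming: "fe" is the front-end, "be<i>" are back-ends,
    and "int<level>_<i>" are internal processes, with level 1 closest
    to the back-ends.
    """
    if num_backends < 1 or fanout < 2:
        raise ValueError("need at least one back-end and a fan-out of 2 or more")

    level = [f"be{i}" for i in range(num_backends)]
    children: Dict[str, List[str]] = {}
    depth = 0

    # Build bottom-up: group the current level into chunks of `fanout`
    # and give each chunk a new parent, until the front-end can hold
    # whatever is left.
    while len(level) > fanout:
        depth += 1
        parents = []
        for start in range(0, len(level), fanout):
            parent = f"int{depth}_{start // fanout}"
            children[parent] = level[start:start + fanout]
            parents.append(parent)
        level = parents

    children["fe"] = level
    return children


layout = kary_layout(num_backends=16, fanout=4)
print(len(layout["fe"]))  # prints 4
```

With 16 back-ends and a fan-out of 4, the front-end in this sketch has 4 direct children instead of 16, which is the load reduction the tree organization is meant to provide.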
Software and Manuals
MRNet v2.0 highlights
- Fault tolerance and recovery for internal MRNet node failures
- Improved API for examining MRNet topology
- New filter capabilities such as dynamic configuration
- Vastly improved memory management
- Improved support for multi-threaded front-ends and back-ends
- Updated examples, including a sample Makefile
- Numerous bug fixes and enhancements
Documentation
- README
- The Multicast/Reduction Network: A User's Guide to MRNet v2.0 (HTML)
Distribution
MRNet Version 1.2, March 2007.
MRNet Version 1.1, April 2005.
MRNet Version 1.0, September 2003.
Publications
UW Publications
- Dorian C. Arnold and Barton P. Miller, "State Compensation: A Scalable Failure Recovery Model for Tree-based Overlay Networks", Under submission. [ PDF ]
- Dorian C. Arnold, Gary D. Pack and Barton P. Miller, "Tree-based Overlay Networks for Scalable Applications", 11th International Workshop on High-Level Parallel Programming Models and Supportive Environments (HIPS 2006), Rhodes, Greece, April 2006. [ PDF ]
- Philip C. Roth and Barton P. Miller, "On-line Automated Performance Diagnosis on Thousands of Processes", ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP'06), New York City, March 2006. [ PDF ]
- Philip C. Roth, Dorian C. Arnold, and Barton P. Miller, "Benchmarking the MRNet Distributed Tool Infrastructure: Lessons Learned", 2004 High-Performance Grid Computing Workshop, Santa Fe, New Mexico, April 2004. [ PDF ]
- Philip C. Roth, Dorian C. Arnold, and Barton P. Miller, "MRNet: A Software-Based Multicast/Reduction Network for Scalable Tools", SC2003, Phoenix, Arizona, November 2003. [ PDF ]
- MRNet Poster, SC2004, Pittsburgh, Pennsylvania, November 2004. [ PowerPoint | PDF ]
Joint/External Publications
- Dorian C. Arnold, Dong H. Ahn, Bronis R. de Supinski, Gregory Lee, Barton P. Miller, and Martin Schulz, "Stack Trace Analysis for Large Scale Debugging", International Parallel & Distributed Processing Symposium, Long Beach, California, March 2007. To appear. [ PDF ]
- Martin Schulz, Dong Ahn, Andrew Bernat, Bronis R. de Supinski, Steven Y. Ko, Gregory Lee, and Barry Rountree, "Scalable Dynamic Binary Instrumentation for Blue Gene/L", ACM SIGARCH Computer Architecture News 33(5), pp. 9-14, December 2005.
Contact Information:
Paradyn Project
Computer Sciences Department
University of Wisconsin
1210 West Dayton Street
Madison, WI 53706

Phone: (608) 262-6227
FAX: (608) 262-9777