Date: Tue, 7 Apr 1998 11:29:59 -0400 (EDT)
From: hollings@cs.umd.edu (Jeff Hollingsworth)
Subject: [DynInst_API:] Meeting notes

--------------------------------------------------
DYNINST/DAIS STANDARDS JOURNAL #1
--------------------------------------------------

DYNINST API/DAIS STANDARDS: ORGANIZATIONAL MEETING
Wednesday, March 18, 1998

Attendees:
Antonio Espinosa (Autonomous U. of Barcelona)
Arndt Bode (TU Munich)
Brond Larson (SGI/Cray)
Shirley Browne (U. Tennessee)
Jeff Brown (LANL)
Bryan Buck (U. Maryland)
Erica Dorenkamp (Sun Microsystems)
Carlos Figueira (U. of Simon Bolivar and U. Wisconsin)
Noam Freedman (Argonne NL)
Jeff Hollingsworth (U. Maryland)
Robert Hood (NASA Ames)
Chris Kerr (Geophysical Fluid Dynamics Lab)
Emilio Luque (Autonomous U. of Barcelona)
Tomas Margalef (Autonomous U. of Barcelona)
Bart Miller (U. Wisconsin)
Philip Mucci (U. Tennessee/Sandia NL)
Oscar Naim (U. Wisconsin)
Doug Pase (IBM SP-2 Group)
John Ranelletti (LLNL)
Aaron Sawdey (SGI/Cray)
Sunlung Suen (U. Wisconsin)
Joerg Trintis (TU Munich)
Brian Toonen (Argonne NL)
Jeff Vetter (UIUC)
Brian Wylie (U. Wisconsin)
Mary Zosel (LLNL)

I. Process Issues

Issue: What is the form of participation?

Proposal: Structure the process like the X Consortium. Anyone can attend, but voting will be limited to organizations that provide resources, either money (e.g., $30K to Wisconsin or Maryland) or 1/4 FTE. Participation is encouraged regardless.

Comments:

Doug P: Building the low-level implementation is not the only way to participate. It is just as important to have people working at the tools level; the whole intention is to make it possible to create real tools. We don't want to give the impression that we're trying to exclude anybody, but we do want to avoid someone coming in and "scuttling the ship" without being part of the crew.

Bart M: The meetings will remain open, but you only get to vote if you are providing resources.
IBM has committed resources in the form of several FTEs, and Wisconsin and Maryland have both committed resources. It is important to make sure that the infrastructure actually gets done; otherwise the high-level tools are of no use.

Jeff B: LANL would like to see SGI adopt this effort and is willing to help get that process going. LANL is willing to have 1 FTE working on this as long as Doug P. (IBM) remains interested.

Robert H: Looking for tool infrastructure; would expect to be an active participant, though there is some uncertainty (can't officially speak for NASA).

SGI: Here to observe, hoping to leverage or contribute if appropriate, but they don't yet understand the effort well enough.

Mary Z: As long as IBM remains interested in this, LLNL will be active.

Shirley B: UT is involved with the DoD Modernization program and is interested in portable tools. UT can get resources to work on projects with specific deliverables that benefit PET/DoD users; anything UT provides resources for needs to benefit application developers. For PET to put resources into this, we would have to show specific deliverables, namely tools; PET is not interested in infrastructure as a deliverable. The main focus is not creating new tools, but rather porting, surveying, and robustifying existing ones.

Arndt B: The market of SC users is so small that there is no need for disjoint tools; we need to make tools more portable. We have our own approach and are interested in seeing how dyninst and DAIS evolve. We could contribute to the effort, but we are a long distance away (Munich), so we might not have regular contact.

Brian T: Argonne's interest is currently passive. We are interested in adaptivity, so we might be able to incorporate this into some agent.

Question: What is the timetable for this effort?

Answer (Bart M): We don't have a good estimate right now. It depends on the resources available and the scope.
Hopefully in three months we will have a good idea of where we are going.

Question: As an end user, is it obvious what you want as a minimal set in the API?

Answer (Doug P): We will use the high-level tools (e.g., Paradyn) as drivers; tool requirements will drive API-level development. Infrastructure in isolation is not very interesting.

Question: How often will the meetings be held?

Answer (Jeff H): We anticipate meeting about every 3-6 months, with teleconferences more frequently. A good time for the next meeting is after the SPDT conference in Oregon during the first week of August.

Comment (Mary Z): Regular teleconferences worked well for the OpenMP effort (LLNL). However, using email alone doesn't work out very well.

Comment (Bart M): If you know of other organizations that are not at this meeting, please feel free to contact them, or let us know and we will contact them.

Question (Jeff B): Is this effort like the National Compiler Infrastructure?

Answer (Jeff H): That effort has a funding source (DARPA) for that purpose. However, if there is sufficient interest and demand, we might consider submitting a proposal to fund development of the reference implementation of the API.

Question (Chris K): How will tool developers know what others are doing?

Answer (Doug P): The mail reflector is an appropriate forum for sharing experiences in tool development. Otherwise, there is no specific plan for how to deal with this.

II. Scope

Issue: What are we trying to standardize?

Proposal: A multi-layered model, with well-defined interfaces at the dyninst and DAIS levels. Dyninst covers what has to be done on a single node; DAIS covers what glues multiple nodes together.

Dyninst:
* Platform-independent process instrumentation on a single node.
* Platform-independent process control functions on a single node.

DAIS:
* Platform-independent extraction of data from processes.
* Multi-node, multi-tool support/RPC architecture, security.
* Scalability (by offloading work to the nodes).

New features and pieces outside dyninstAPI/DAIS:
* Source browser
* Expression parser
* Name demangler
* Clock sync package (distributed clocks)

Comment (Bart M): When it comes to scalability and performance, the lesson from Paradyn is that you cannot make things asynchronous enough. Synchronous (blocking) behavior was never the right answer. It is important to design a highly asynchronous model in from the start.

Question (Arndt B): What are the target architectures? Where do heterogeneous systems fit?

Answer (Bart M): If it is designed properly into the RPC layer, heterogeneity is almost free.

Answer (Jeff H): We believe that we will be able to continue to hide architecture issues under the dyninst API. Because dyninst has machine-independent abstractions, DAIS could work with multiple different platforms simultaneously.

Comment (Mary Z): Heterogeneity is important for projects like the Computational Plant (a cluster system that is constantly evolving, with nodes being added and removed).

Issue: What are the uses of the API?

Suggestions were placed on the whiteboard. The list included:
- debuggers
- performance steering
- performance tools (code, communication, and I/O)
- visualization
- load balancing
- Reliability/Availability/Serviceability (RAS)
- test coverage
- future systems design/simulation
- Condor-like systems: running jobs on idle workstations. This currently requires linking with a special C library. Instead, we could use dyninst to "hijack" the job, replace the C library with the Condor version, and send it off to the Condor queue.
- memory tools (performance, array bounds, pointer checks)
- checkpointing
- relative debugging: comparing the output of two different runs/versions of a program

Comment (Doug P): Let me explain RAS applications. For example, an application may try to save relevant data when it realizes it is going down or that there is a problem.
Within the system, the RAS code can trigger a client application to help handle this situation.

Question: Will the API support debuggers and static analysis tools?

Answer (Doug P): IBM is looking at putting debuggers on top of this API. We are not restricting DAIS to performance analysis tools. Issues related to source code fall into the fuzzy area between the client and the infrastructure level. DAIS is not a debugger, but we want to provide infrastructure that can support a debugger, application steering, etc.

Question: Why is application steering so interesting? Isn't this really a relatively small issue, related to moving large volumes of data out of the process?

Answer:

Comment (Jeff B): We should use the performance tool issues as the driver; this seems to give us about 70% of the functionality required for all of the proposed applications.

Comment (Mary Z): Source browsers are an important component. Should they be part of the DAIS standard, or are hooks that permit building source browsers sufficient?

Comment (Bart M): There is a need for a library of useful functionality for tools. For example, name demanglers are an important part of the picture: we need to be able to translate from internal to external names (and back) for different languages, compilers, and platforms.

Comment (Doug P): Expression parsers are another useful common feature.

Question (Jeff B): Can we use KernInst to help with checkpoint/restore?

Answer (Bart M): KernInst is not ready for prime time yet.

A poll was taken of which applications of the API the group felt were important. Everyone got up to three votes. The results were: performance tools (14), debuggers (8), memory tools (3), visualization (4), relative debugging (2), load balancing (1), future systems (1), RAS (1).

Comment: This is a biased group. Many (maybe most) people here are performance tool builders or debugger writers, and some of the other applications are probably subsets of these.

III. Status of dyninstAPI

A copy of the current draft dyninstAPI document was distributed.

Comment (Jeff H): Some features are described in the document but missing from the current reference implementation. The two most significant are block- and loop-level instrumentation and thread support.

Question: How soon will the instruction-level instrumentation described by Ari on Tuesday be available?

Comment (Bart M): The prototype for fine-grained instrumentation is at a very early stage, and it will take quite a bit of work to get it ready for distribution.

Question: How do you start up an application and take control of it?

Answer (Jeff H): The API provides attach and process-create methods.

Question: How do you access a variable that is in memory on another node (perhaps in a software DSM, or as part of an HPF distributed array)?

Answer (Doug P): We don't plan to handle these language-specific issues in the API. Instead, we will provide enough to read/write the local memory on a node, and mapping and access functions will need to be built on top of that.

A list of useful features that are not in the current API document was put on the whiteboard. It contained:
- support for distributed environments
- register state (which registers? performance counters, timing, PC, SP, frame pointer, etc.)
- stack trace
- some notion of breakpoints, and of step and single-step
- symbol table/source mapping information (anything you can get out of the symbol table without parsing the source)
- compiler language and vendor (a string representation of which compiler, etc.)
- signal catching
- floating-point expressions
- 32/64-bit, for both ints and floats
- basic structures/arrays
- extraction of machine-specific info (e.g., effective addresses)
- address as a base type for snippet expressions
- bulk data transfer, perhaps with a filter function that returns all values meeting a simple test (e.g., not zero, < 0.0001, etc.)
- loading code (e.g., a dynamically linked library); will be implemented soon
- dumping what you think the state of the world is now (tools for debugging tool building)
- a simple string-to-AST expression tree conversion routine

Question: Do dyninst and DAIS need to be merely thread-aware, or aware of specific thread packages?

Answer (Jeff H): To allow snippets to be active for only a subset of the threads in an address space, thread-package-specific instrumentation is required in the thread context-switch code.

Question: How are signals handled within this interface?

Answer (Jeff H): A mutator process can select whether a specific signal will stop the application process and notify the mutator. If a mutator wishes to change the signal handling behavior within the application, it can use the oneShot interface to install a new signal handler.

Question: How can conditional breakpoints containing arbitrary code be inserted and used?

Answer (Jeff H): For simple expressions, an "inline" snippet can be generated. For more complex code, it might be possible to invoke the native compiler to produce a predicate function, have the dynamic linker load it into the program, and install a snippet that calls it.

Comment: That approach assumes that a compiler will be available on the nodes. On many systems, the compiler is only installed on the front-end node.

Question: How will instrumentation of individual instructions be handled?

Comment: ATOM has good support for instrumenting individual instructions. It also has a nice abstraction for instrumenting instructions and computing the effective address of a load or store instruction.

Question: How do rewriting and dynamic instrumentation fit together?

Answer:

Question: What is the status of the source code for the reference implementation?

Answer (dyninst): Currently we make the source code freely available for non-profit uses, which include internal use by companies. Redistribution is the only thing that has a substantial restriction.
So far we have also avoided using GNU General Public License code, although there are hooks in the code that can plug into some GNU functionality, such as the name demangler.

Answer (DAIS): We intend to make the code available to partners. Some parts use IBM proprietary code, but that code is used for AIX-specific functionality.

Comment (Jeff H): Many of the features on the list of possible additions to the dyninstAPI will require a substantial amount of work. Perhaps we are better off starting by defining the interfaces.

IV. Status of DAIS

Doug is still working on the first public draft of the DAIS document (it should be ready in about 2-3 weeks).

A list of possibly useful features was placed on the whiteboard. A * means that Doug felt the issue was already addressed in the current DAIS effort.
- security *
- process/thread sub-grouping (and named groups); maybe hooks for MPI communicators to register?
- scaling to thousands of nodes
- help with clock synchronization (external libraries)
- Can the RPC mechanism be abstracted so that different ones can be used?
- language consistency between DAIS and dyninstAPI
- application language expression {language and mechanism: compiled, interpreted, run-time compiled}
- moving data out of the application (DAIS vs. dyninstAPI)
- communication between daemons (OMIS does this)
- communication between clients/peers: applications, DAIS servers, and DAIS clients
- multiple simultaneous client tools
- interface for serial tools (yes, as a degenerate case)
- a dyninst-only tool coexisting with a DAIS-based tool * (this may not be possible, since both can't attach to the same application at the same time)
- question about whether a dyninst tool would coexist with a DAIS-dyninst tool (DP: no)
- NT interface (dyninstAPI has one, DAIS doesn't)
- dumping what you think the state of the world is now (tools for debugging tool building)
- language for the API (interface and implementation) {how many languages are involved here?}
- (discussion of the implications of exceptions)
- work in batch/queued mode
- connecting to a job with or without stopping {DAIS has both attach and connect}
- dynamic process/thread spawning (might just provide a registration hook)
- eventually, third-party data transfers (e.g., shipping a block of data to a third process)

Question: What language is DAIS written in?

Answer (Doug P): C++ with no templates, but it does use data polymorphism and exceptions.

Comment (Jeff H): The dyninstAPI uses a constructor only for the top-level object, then uses member functions to build up other objects, thus avoiding having to deal with exceptions.

Question: Is it possible to abstract out authentication so that a different module can be plugged in to provide different authentication? One suggestion is to use the GSS API for security.

Answer (Doug P): I am not familiar with the GSS API, but we might be able to create a layer that lets users select a security interface.

Comment (Bart M): Doug, if you could write a thin layer to adapt DCE to conform to GSS (I don't know exactly what it looks like, so I don't know how hard this would be), could you then support the GSS API and still supply the DCE security you had planned?

Answer (Doug P): I don't know; I will have to look at GSS, but it might be possible.

Question: How does it work when multiple tools try to use DAIS at once on the same application? This is an important feature if DAIS is used for load leveling, RAS, or Condor, and someone then wants to do visualization or debugging.

Answer (Doug P): Two modes are possible:
attach: exclusive access to the process; can change control flow.
connect: access to the process; can insert probes (attach without stopping, or "asynchronous attach").

Question (Jeff H): What issues does dynamic process spawning raise for DAIS?

Answer (Doug P): DAIS doesn't deal with this currently.

V. Summary and Action

Main goal, by teleconference: make the requirements more concrete and document them.
Action Items:

Suggested by Doug P: We need to identify tool developers' must-have and like-to-have API features. We want to try to avoid a laundry list!

Robert Hood volunteered to look through the features used by p2d2 to identify items missing from the dyninstAPI.

Jeff H will document the interfaces for some of the "easy" extensions and add them to the API document. This will include simple expression-string-to-AST translation, breakpoints, and dynamic loading of code.

Doug will give us a new DAIS document in 2-4 weeks. After Doug's document has circulated, we will have a teleconference.

We will try to have the next meeting after SPDT'98 in Oregon on Aug. 3.

Send Doug email if you want to join in on the DAIS end, so he can get the legal paperwork filled out.

------------------------------------------------------------------------------
A special thanks to Mary Zosel and Aaron Sawdey for taking notes during the meeting. Credit for capturing what happened goes to them; blame for inaccuracies should be directed to me.

- Jeff