/*
 * Copyright 2008 Sun Microsystems, Inc.  All rights reserved.
 * Use is subject to license terms.
 */

# ident	"%Z%%M%	%E% %I% SMI"

1. Introduction

1.1 Portfolio Name

    Libtopo enumeration of fans and power supplies via IPMI

1.2 Portfolio Authors

    Rob Johnston
    Eric Schrock

1.3 Submission Date

    01/26/2008

1.4 Project Team Aliases:

    robert.johnston@sun.com
    eric.schrock@sun.com

1.5 Interest List

    fma-core@sun.com

1.6 List of Reviewers

    Reviewer   Group   Version    Date      Comments
                       of         Reviewed  (Approved/Rejected/Others)
    --------   -----   --------   --------  --------------------------

2. Portfolio Description

The Solaris FMA framework is designed to diagnose failures in system
components.  Currently these components are discovered by probing the
hardware visible to Solaris via standard OS paths (I/O, CPU, DIMMs, etc.).
However, there exists a set of components that are crucial to the ongoing
health of the system but have no connection visible to Solaris.  The most
common such components, and the most likely to encounter failures, are
power supplies and fans.

On low-end hardware, these components are often not observable, and it is
the responsibility of the user to manually detect component failure, or to
run custom (Windows) software to observe the system.  Higher-end systems
(such as the x4000 series shipped by Sun) have a service processor that
manages the physical components and sensors in the system.  Some systems
(such as SPARC) have a custom communications mechanism between the OS and
the SP, but the industry standard is IPMI (Intelligent Platform Management
Interface).  Solaris already has the ability to communicate with the SP
over the baseboard management controller (/dev/bmc), and a basic library
(libipmi) already exists.

Integrating support for power supplies and fans within FMA is an important
step in bringing all hardware topology enumeration and diagnosis under a
single infrastructure.
Without this ability, users must manage a separate OS instance (on the SP)
with different configuration, separate management, and separate
notification mechanisms.

This proposal adds basic enumeration support for power supplies and fans
on platforms supporting IPMI.  It does not include the ability to diagnose
PSU or fan failures, nor does it provide a way to read environmental
sensors (fan speed, etc.) for these components.  This functionality will
be provided by a future project.

3. Fault Boundary Analysis (FBA)

On x86 systems, the root of the hc topology tree is hc:///motherboard=0
(though bay nodes can exist at the root level as well).  It doesn't make
sense to have physical components like fans underneath the motherboard,
nor does it make sense to have them directly at the root level.  Future
projects will add sensors that monitor the chassis itself, and the
components are contained within the chassis, so a new root hc node is
created:

    hc:///chassis=0

The current target systems (the x4000 series from Sun) only have a single
chassis, but the topology allows for multiple chassis.  Such systems
typically run multiple OS instances, and coordinating diagnosis across
disparate domains is likely better left to service processors.  Standalone
systems with a single OS instance stand to gain the most from a unified
host diagnosis strategy.  Future work may explore how IPMI can be
leveraged for more complicated systems.

Within IPMI, fans and PSUs can be grouped together into domains that
represent a logical unit (typically a FRU).  While uncommon for power
supplies, this is quite common for fan modules or fan trays that contain
multiple fans.
Therefore, a multi-level topology will be created of the form:

    hc:///chassis=0/psu=0
    hc:///chassis=0/psu=1
    hc:///chassis=0/powermodule=0
    hc:///chassis=0/powermodule=0/psu=0
    hc:///chassis=0/powermodule=0/psu=1
    hc:///chassis=0/fan=0
    hc:///chassis=0/fan=1
    hc:///chassis=0/fanmodule=0
    hc:///chassis=0/fanmodule=0/fan=0
    hc:///chassis=0/fanmodule=0/fan=1

The IPMI components are technically 'cooling' elements, not fans.  For the
systems which currently support Solaris and IPMI, only fans are supported.
In the future, we may be able to detect non-fan cooling elements by
examining the set of associated sensors (such as a tachometer) and
inferring the type of cooling element.

With IPMI, we know all components, even if a component is not currently
present.  To allow management software to detect empty component slots,
the FMRIs will always be enumerated, but the is_present method will return
false if the component is not currently present.

As part of this work, the existing 'bay' nodes on the X4500 and X4540 will
be moved under this new chassis node.  These FMRIs are used by the disk-*
fmd modules to diagnose predictive failure for disks.  The diagnosis is
simply a 1:1 mapping from ereport to fault, so no open cases will be
impacted by this change.  Existing faults will no longer show up in
'fmadm faulty' after upgrade because the resource will have changed, but
the disk-transport module will immediately re-diagnose the same failure
using the new FMRIs, assuming the fault still exists.
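To make the multi-level naming above concrete, here is a small Python sketch that builds hc-scheme FMRI path strings of the form shown. This is illustrative only: the real enumerator is C code that constructs nodes through libtopo's binding interfaces rather than by string assembly, and the helper name is hypothetical.

```python
def hc_fmri(*pairs):
    """Build an hc-scheme FMRI path from (name, instance) pairs.

    Hypothetical helper for illustration; the real enumerator uses
    libtopo node-binding interfaces, not string pasting.
    """
    return "hc://" + "".join("/%s=%d" % (name, inst) for name, inst in pairs)

# A fan directly under the chassis vs. one grouped under a fan module:
print(hc_fmri(("chassis", 0), ("fan", 1)))
print(hc_fmri(("chassis", 0), ("fanmodule", 0), ("fan", 1)))
```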
Below is the actual topology (from the chassis down) from a Galaxy 2
server:

hc://:product-id=Sun-Fire-X4200-Server:chassis-id=0000000000:server-id=vcr/chassis=0
  group: protocol                       version: 1   stability: Private/Private
    resource          fmri      hc://:product-id=Sun-Fire-X4200-Server:chassis-id=0000000000:server-id=vcr/chassis=0

hc://:product-id=Sun-Fire-X4200-Server:chassis-id=0000000000:server-id=vcr/chassis=0/psu=0
  group: protocol                       version: 1   stability: Private/Private
    resource          fmri      hc://:product-id=Sun-Fire-X4200-Server:chassis-id=0000000000:server-id=vcr/chassis=0/psu=0
    FRU               fmri      hc://:product-id=Sun-Fire-X4200-Server:chassis-id=0000000000:server-id=vcr/chassis=0/psu=0

hc://:product-id=Sun-Fire-X4200-Server:chassis-id=0000000000:server-id=vcr/chassis=0/psu=1
  group: protocol                       version: 1   stability: Private/Private
    resource          fmri      hc://:product-id=Sun-Fire-X4200-Server:chassis-id=0000000000:server-id=vcr/chassis=0/psu=1
    FRU               fmri      hc://:product-id=Sun-Fire-X4200-Server:chassis-id=0000000000:server-id=vcr/chassis=0/psu=1

hc://:product-id=Sun-Fire-X4200-Server:chassis-id=0000000000:server-id=vcr/chassis=0/fanmodule=0
  group: protocol                       version: 1   stability: Private/Private
    resource          fmri      hc://:product-id=Sun-Fire-X4200-Server:chassis-id=0000000000:server-id=vcr/chassis=0/fanmodule=0
    FRU               fmri      hc://:product-id=Sun-Fire-X4200-Server:chassis-id=0000000000:server-id=vcr/chassis=0/fanmodule=0
    label             string    FT 0

hc://:product-id=Sun-Fire-X4200-Server:chassis-id=0000000000:server-id=vcr/chassis=0/fanmodule=0/fan=0
  group: protocol                       version: 1   stability: Private/Private
    resource          fmri      hc://:product-id=Sun-Fire-X4200-Server:chassis-id=0000000000:server-id=vcr/chassis=0/fanmodule=0/fan=0
    FRU               fmri      hc://:product-id=Sun-Fire-X4200-Server:chassis-id=0000000000:server-id=vcr/chassis=0/fanmodule=0/fan=0
    label             string    FT0 FM0

hc://:product-id=Sun-Fire-X4200-Server:chassis-id=0000000000:server-id=vcr/chassis=0/fanmodule=0/fan=1
  group: protocol                       version: 1   stability: Private/Private
    resource          fmri      hc://:product-id=Sun-Fire-X4200-Server:chassis-id=0000000000:server-id=vcr/chassis=0/fanmodule=0/fan=1
    FRU               fmri      hc://:product-id=Sun-Fire-X4200-Server:chassis-id=0000000000:server-id=vcr/chassis=0/fanmodule=0/fan=1
    label             string    FT0 FM1

hc://:product-id=Sun-Fire-X4200-Server:chassis-id=0000000000:server-id=vcr/chassis=0/fanmodule=0/fan=2
  group: protocol                       version: 1   stability: Private/Private
    resource          fmri      hc://:product-id=Sun-Fire-X4200-Server:chassis-id=0000000000:server-id=vcr/chassis=0/fanmodule=0/fan=2
    FRU               fmri      hc://:product-id=Sun-Fire-X4200-Server:chassis-id=0000000000:server-id=vcr/chassis=0/fanmodule=0/fan=2
    label             string    FT0 FM2

hc://:product-id=Sun-Fire-X4200-Server:chassis-id=0000000000:server-id=vcr/chassis=0/fanmodule=1
  group: protocol                       version: 1   stability: Private/Private
    resource          fmri      hc://:product-id=Sun-Fire-X4200-Server:chassis-id=0000000000:server-id=vcr/chassis=0/fanmodule=1
    FRU               fmri      hc://:product-id=Sun-Fire-X4200-Server:chassis-id=0000000000:server-id=vcr/chassis=0/fanmodule=1
    label             string    FT 1

hc://:product-id=Sun-Fire-X4200-Server:chassis-id=0000000000:server-id=vcr/chassis=0/fanmodule=1/fan=0
  group: protocol                       version: 1   stability: Private/Private
    resource          fmri      hc://:product-id=Sun-Fire-X4200-Server:chassis-id=0000000000:server-id=vcr/chassis=0/fanmodule=1/fan=0
    FRU               fmri      hc://:product-id=Sun-Fire-X4200-Server:chassis-id=0000000000:server-id=vcr/chassis=0/fanmodule=1/fan=0
    label             string    FT1 FM0

hc://:product-id=Sun-Fire-X4200-Server:chassis-id=0000000000:server-id=vcr/chassis=0/fanmodule=1/fan=1
  group: protocol                       version: 1   stability: Private/Private
    resource          fmri      hc://:product-id=Sun-Fire-X4200-Server:chassis-id=0000000000:server-id=vcr/chassis=0/fanmodule=1/fan=1
    FRU               fmri      hc://:product-id=Sun-Fire-X4200-Server:chassis-id=0000000000:server-id=vcr/chassis=0/fanmodule=1/fan=1
    label             string    FT1 FM1

hc://:product-id=Sun-Fire-X4200-Server:chassis-id=0000000000:server-id=vcr/chassis=0/fanmodule=1/fan=2
  group: protocol                       version: 1   stability: Private/Private
    resource          fmri      hc://:product-id=Sun-Fire-X4200-Server:chassis-id=0000000000:server-id=vcr/chassis=0/fanmodule=1/fan=2
    FRU               fmri      hc://:product-id=Sun-Fire-X4200-Server:chassis-id=0000000000:server-id=vcr/chassis=0/fanmodule=1/fan=2
    label             string    FT1 FM2

DYNAMIC ENUMERATION
-------------------

A new common libtopo module, ipmi, will be provided that will do dynamic
enumeration of IPMI components.  While currently only supported on x86
systems, any system supporting IPMI should work, so the module will be
present on all architectures.  If future SPARC platforms support IPMI over
/dev/bmc, then everything should "just work".

IPMI has the unusual property that the world is defined solely by 'sensor
data records' (SDRs, which may describe sensors, FRUs, etc.).  Instead of
iterating over entities (the IPMI term for components), one instead
iterates over all SDR records and infers an entity's existence based on
the sensor records that refer to it.  The logic to handle this will be
kept within libipmi, and the ipmi enumerator will iterate over all
discovered entities, looking for any 'power domain', 'power supply',
'cooling domain', or 'cooling unit' entities.  Using IPMI entity
association records, libipmi will have already organized these into the
appropriate hierarchy.

The default label for each entity will be based on the hierarchy of
components, using a simple pattern of the form "(FM <m> )FAN <n>", where
the "FM <m> " prefix is present only when the fan is contained within a
fan module.  So a system with no fan modules would simply have "FAN 1",
while a system with fan modules would report "FM 0 FAN 1", etc.  These
labels may or may not correspond to the labels on the chassis, but under a
correct IPMI implementation they will be roughly correct, and there will
be a means to override them on a per-platform basis (see below).
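The default labeling scheme just described amounts to a small formatting rule. A Python sketch (the function name is hypothetical; the real code is C inside the ipmi enumerator):

```python
def default_label(fan, module=None):
    """Default entity label per the scheme described above:
    "FAN <n>" for a standalone fan, prefixed with "FM <m> " when
    the fan belongs to a fan module.  Hypothetical helper name."""
    prefix = "FM %d " % module if module is not None else ""
    return prefix + "FAN %d" % fan

print(default_label(1))            # a system with no fan modules
print(default_label(1, module=0))  # a fan inside fan module 0
```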
For components with a FRU locator record, it may be possible to assign a
label matching the FRU name, such as 'ft0.fm1.fru', though it's unclear
whether this is any better (the naming is entirely up to the SP, and the
'.fru' extension is just a convention used by the current SP firmware).

Each component that is directly under the chassis will be assigned a FRU
FMRI matching its resource.  Components within an association will default
to the FRU FMRI of their parent, unless they have associated FRU locator
records, in which case they will have a distinct FRU FMRI matching their
resource.

The sensors associated with the entity will be used to determine presence
as described in the IPMI specification.

STATIC ENUMERATION
------------------

It would be nice if dynamic enumeration were enough to model any system
supporting IPMI.  Unfortunately, as is the case with most platform
technologies (such as SMBIOS), complete support for enumeration is
hampered by limitations of both the specification and its implementations.
With a proper implementation of the IPMI spec, it is possible to enumerate
all the components, though attaching semantic meaning to them (labels,
failure sensors, etc.) is only possible in some cases.  On top of this,
most platforms have an IPMI implementation that leaves something to be
desired.  A common problem is the lack of entity association records, so
fans that should be part of a logical module (even if correctly
represented via SDR records) are not associated with one another.  Other
problems include presence sensors that reference incorrect entities,
missing or incorrect FRU locator records, etc.

To compensate for both of these problems, libtopo will support dynamic
enumeration, static enumeration, and static assignment of sensors and
properties to dynamically discovered entities.
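The FRU FMRI assignment rules described above reduce to a simple decision per entity. A hedged Python sketch (names and signature are hypothetical; the real logic lives in the C enumerator):

```python
def fru_fmri(resource, parent_fru=None, has_fru_locator=False):
    """Pick an entity's FRU FMRI per the rules described above:
    entities directly under the chassis (no parent FRU) and entities
    with their own FRU locator record use their resource FMRI;
    everything else inherits the parent's FRU FMRI."""
    if parent_fru is None or has_fru_locator:
        return resource
    return parent_fru

# A fan module is its own FRU; its fans inherit that FRU unless they
# carry their own FRU locator record.
fm = "hc:///chassis=0/fanmodule=0"
fan = "hc:///chassis=0/fanmodule=0/fan=1"
print(fru_fmri(fm))                  # module: FRU is its own resource
print(fru_fmri(fan, parent_fru=fm))  # fan: inherits the module's FRU
```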
LIBIPMI DETAILS
---------------

As part of this work, libipmi will be expanded in several different
capacities, mostly related to parsing SDR records and representing
entities.  The SDR infrastructure will be expanded to support all possible
SDR record types (compact sensors, full sensors, entity associations,
etc.).  The code will also be simplified to separate out the SDR name
(when available) from the record, since constructing this value is
non-trivial and should not be left to the consumer.

New interfaces for gathering sensor readings based on a compact or full
SDR record will be introduced.  This consists mainly of a large number of
#defines, code to transform readings based on the linearization function,
and code to parse the sensor units.  Some of this infrastructure will not
be fully used until future sensor work is complete, but enough of it is
needed at this point (namely, parsing sensor-specific state masks) to
warrant its inclusion as part of this project.

Based on this new infrastructure, libipmi will be enhanced to have a
native notion of entities, even though these do not exist as such in the
IPMI specification.  The library will scan the SDR records, detect
referenced entities, group sensors with associated entities, and parse
entity association records to create a hierarchy of entities.  This will
also include a function to detect entity presence.  This isolates the
details of IPMI entities (of which there are many) to within libipmi,
simplifying the topo enumerator and allowing other software to be
developed on top of it.

One of these pieces of software will be a private utility under
/usr/lib/fm, 'ipmitopo', which will display all IPMI entities (id, type,
presence) and the sensors associated with each entity (reading, state,
type, etc.).  This tool is not designed to replace the open source
'ipmitool' and exists solely to debug the IPMI topo implementation by
leveraging the same code used by libtopo.
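The reading transformation mentioned above follows the conversion formula in the IPMI specification, y = L[(M * raw + B * 10^Bexp) * 10^Rexp], where M, B, Bexp, and Rexp come from the full SDR record and L is the per-sensor linearization function. A Python sketch under those assumptions (the real code is C in libipmi; this ignores signed/unsigned raw formats for brevity):

```python
import math

# Subset of the linearization functions defined by the IPMI spec.
LINEARIZE = {
    "linear": lambda x: x,
    "ln":     math.log,
    "log10":  math.log10,
    "log2":   lambda x: math.log(x, 2),
    "exp10":  lambda x: 10.0 ** x,
    "exp2":   lambda x: 2.0 ** x,
}

def convert_reading(raw, m, b, b_exp, r_exp, linearization="linear"):
    """Convert a raw sensor count to engineering units:
       y = L[(M * raw + B * 10**b_exp) * 10**r_exp]"""
    return LINEARIZE[linearization]((m * raw + b * 10.0 ** b_exp) * 10.0 ** r_exp)

# e.g. a linear tachometer with M=100: raw count 42 -> 4200 RPM
print(convert_reading(42, m=100, b=0, b_exp=0, r_exp=0))
```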
LIBTOPO ENHANCEMENTS
--------------------

To make the implementation of this project possible, extensions and
modifications of the libtopo XML schema are necessary.

Currently it is not possible to register module methods on nodes that are
statically enumerated via XML map files.  Typically, node methods are
registered on a node by the enumerator module after the node is bound to
the topology.  However, since statically enumerated nodes aren't created
by an enumerator module, this registration doesn't occur.  While there
will be cases where we will be forced to statically define psu and fan
topologies via XML, these nodes still need to support the node methods
that are implemented by the ipmi enumerator module.  In order to allow
these methods to be registered on statically defined nodes, the XML syntax
will be extended to allow the <enum-method> element to be placed inside a
<range> element.  This will allow a module's enum entry point to be
invoked in order to perform any required post-processing on statically
defined nodes.  In the case of the ipmi enumerator module, we will use
this new capability to register its methods on statically defined nodes.
Below is an example usage:

    . . .

Additionally, we will be extending the topo DTD to allow for conditional
processing of elements based on the platform name.  There currently exists
an element that allows for conditional processing of property groups.
Rather than adding a brand-new element for ranges, we will eliminate that
element and create a new, more generic element which can be used to
specify conditional processing of either ranges or property groups.  We
will modify chip-hc-topology.xml (the only current consumer of the old
element) to use the new element.  Below are some example usages of the new
element:

    . . .

All of the above extensions will be backwards compatible with any existing
map files and enumerator modules.
FRU Labels and Identity
-----------------------

As we've done with CPUs and DIMMs on x86, the intent is to base the labels
for psu and fan/fanmodule nodes on whatever the service person would
expect to deal with.  In most cases this will be the actual silkscreened
label, or the label from the service sticker on the inside of the
removable chassis panel.  In the absence of both of those, we will use
whatever label is used in the schematics from the Sun System Handbook.

Serial number information is generally not available for fans.  However,
on some platforms (e.g. dorado, tucana), the FRU locator records for the
power supplies contain serial numbers.  For cases where serial number
information is available, the ipmi enumerator module will fetch it and
include it as part of the authority during enumeration.

4. Diagnosis Strategy

N/A

5. Error Handling Strategy

N/A

6. Recovery/Reaction

N/A

7. FRUID Implementation

The FMA core team will be investigating how to implement generic support
for updating FRUID records in response to resource state changes that are
initiated by the fault manager running in the domain.  The intent is to
leverage this support (when it becomes available) in a future project that
will include error handling and diagnosis for fans and power supplies.

8. Test

End-to-End Testing
------------------

As a basic end-to-end sanity test, we will run the sanity test set from
the Fault Harness on a variety of sparc and x86 systems.

Basic Regression Testing
------------------------

The FMA Functional Test Suite will be run on sparc and x86 platforms to
check for regressions.

Additionally, since we are making several changes to libtopo, care must be
taken to ensure that we don't introduce any unintentional changes to the
topology.
To verify this, we will compare topology snapshots from before and after
our changes as follows:

    step 1: Install base Nevada build
    step 2: Capture output of "fmtopo -V"
    step 3: BFU system to our project bits
    step 4: Capture output of "fmtopo -V"
    step 5: Manually inspect the diff between the pre and post snapshots
            and verify no unexpected changes occurred.

On x86 Sun platforms with service processors we expect to see the new
root-level chassis node and the appropriate set of fan/psu nodes.  On
sparc systems we expect no changes.

Stress Testing and Memory Leaks
-------------------------------

We will run the "fmstress" test tool on both sparc and x86 systems, both
as a robustness test and to verify that our changes have not introduced
any memory leaks in the FMA userland components.

9. Gaps

The initial consumer of this new topology data will be various components
of the software stack for the forthcoming Fishworks NAS products.
Additionally, this proposal lays the groundwork for a variety of future
work under the auspices of the FMA Sensor Framework.  The next step will
be to include fan and PSU diagnosis.  This requires representing failure
sensors within libtopo using the facility nodes proposed as part of the
sensor framework.  These sensors are then read by a sensor-transport
module that has a 1:1 correspondence between ereports and faults.  This
will serve as a proof of concept for facility nodes and prepare the way
for the larger sensor and alert framework, while providing the greatest
immediate benefit.  Future work will include representing analog sensors
in libtopo, developing an environmental monitor, detecting fan and PSU
hotplug, and creating a persistent alert framework.

10. Dependencies

11. References

"IPMI v2.0 rev. 1.0 specification markup for IPMI v2.0/v1.5 errata
revision 3"
http://www.intel.com/design/servers/ipmi/pdf/IPMIv2_0_rev1_0_E3_markup.pdf

Sensor Abstraction Layer OpenSolaris Project
http://www.opensolaris.org/os/project/sensors/

Libtopo documentation: FMD Programmer's Reference, Chapter 9
http://www.opensolaris.org/os/community/fm/FMDPRM.pdf