RETIRE AGENT FOR I/O DEVICES

I BACKGROUND
============

There are three major steps in the Solaris Fault Management process:

a) Detection and handling of an error
----------------------------------
The trigger for the I/O fault management process is the detection of a
device error in the kernel. The error may be detected by a device
different from the faulty device (for example, a CPU), or it may be
detected by a driver associated with the faulty device. In the former
case, the I/O fault handling framework works with drivers to localize
the error to a specific device. In the latter, the error has already
been localized. In both cases the affected driver is responsible for
handling the error and dispatching an ereport to userland.

b) Diagnosing the problem
----------------------
Diagnosis Engines (DEs) in userland consume the ereports and pinpoint
the faulty device and the type of fault. Once a device is diagnosed as
faulty, a fault event that contains the identity of the faulty device
is generated.

c) Retiring a faulty device
------------------------
A retire agent is responsible for reacting to a fault event and
offlining or disabling the faulty device. A retire agent for CPU and
memory is already in place. This document describes the design of a
generic Solaris retire agent for I/O devices.

II Hot device removal in Solaris
================================

Currently, hot device removal (i.e. removal of a device while Solaris
is running) falls into one of two categories:

a) Coordinated device removal
-----------------------------
In coordinated device removal, removal of a device is done in
cooperation with the Solaris operating system. This process involves
checking for existing users of the device, getting them to stop using
the device (via RCM), quiescing the device, removing Solaris data
structures for that device and rendering the device safe for removal.
The device is then physically removed.
In the event that one or more users are still using the device, the
device removal is aborted. Board DR and SCSI hotplug are two examples
of this approach.

b) Surprise device removal
--------------------------
In surprise device removal, the device removal is not coordinated with
Solaris and happens asynchronously. Typically a physical device removal
generates an interrupt and the interrupt handler initiates cleanup of
OS data structures. Any existing consumers of the device fail with EIO.
A good example of this approach is USB.

III The Problem
===============

Irrespective of the mechanism used to achieve I/O retire, there are a
number of problems that any approach to I/O retire must solve:

1. Retire policy
----------------
Any decision to retire a device must be in accord with policy (if any)
set by the system administrator. An administrator or Sun service
personnel may on occasion need to disable device retire temporarily.
Retire code should check policy settings before deciding to retire a
device.

2. Retire safety
------------------
In some cases, it may be safer not to retire a device. For example, a
device that contains the root filesystem should not be retired.
Similarly, certain platform specific drivers (for example environment
monitoring drivers) should not be retired as they are required for
proper functioning of the system. In such cases it would be best to
keep the device running in a degraded state.

3. Retiring a device in use
----------------------------
A device that is open cannot be detached (i.e. unconfigured). One
possible solution to this problem is to use RCM. DR uses RCM to inform
consumers that a device is going away. Consumers then stop using the
device, allowing DR to proceed. Unfortunately, RCM doesn't handle all
consumers (only those with RCM modules) and therefore is unsuitable as
a general mechanism for I/O retire. RCM is also a userland framework
and does not deal with kernel consumers.

4. Retire semantics
-------------------
There are several issues that need resolution regarding the behavior
of a retired device. The first issue is behavior with respect to
existing consumers. Secondly, we need to determine how such a device
appears to new consumers. Finally, we need to settle upon an acceptable
level of observability for the device. Clearly we cannot have the
device completely disappear from Solaris until it is physically
replaced or removed.

5. Retire persistence
---------------------
A device once retired may not be replaced/removed immediately. To
avoid having to retire the device all over again after the next boot,
we need to persist the retire across reboots. Any mechanism employed
to persist retires should be usable early in boot so that retired
devices are never configured into the system in the first place. This
is important, as a configured device may be opened by a consumer,
making it harder to retire the device (again).

6. Unretire
-----------
Once a device is replaced or fixed, we need some way to bring it back
into service (unretire). This unretire could be automatic or manual.
Manual unretire requires operator intervention after a device has been
replaced. A key requirement here is that after unretire (whether manual
or automatic), a system should not require a reboot to bring the
replaced device back into service.

IV Proposed solution
====================

1. Retire policy
-----------------
Since the FMA framework and Solaris software have better knowledge of
the fault and its implications, we propose that I/O retire be minimally
configurable. There are three primary reasons for configurability:

        a. To enable Sun service personnel to disable retire so that
           problems can be diagnosed.
        b. To allow sysadmins to disable retire for site-specific
           reasons.
        c. To disable I/O retire for certain rare fault types where
           the diagnosis is known to be incorrect.

We propose two properties that can be set in the retire agent's .conf
file.
The first ("global-disable") can be used by Sun service personnel to
temporarily block I/O retire so that they can diagnose problems. The
default value of this property will be false i.e. I/O retire is enabled
by default. The second property ("fault-exceptions") is a
colon-separated list of fault types for which I/O retire should be
disabled. This is expected to be used for the (rare) fault types which
are known to be diagnosed incorrectly. This property will be private
and will not be documented.

2. Event subscription
---------------------
The I/O retire agent will subscribe to all "fault.io.*" events and use
the ASRUs in such events to pinpoint the device to be retired. It
expects the ASRUs to be in the "dev" scheme. After a lot of discussion
by the FMA portfolio review committee, it was decided that due to the
coarse nature of current FMA diagnoses, automatic retire will only be
undertaken if a single device is pinpointed as faulty i.e. if the
"list.suspect" event (consumed by the I/O retire agent) either has a
single member or multiple members with the same ASRU.

In addition, the agent also subscribes to the "list.repaired" event to
detect device repair or replacement. The unretire process is triggered
by the receipt of a list.repaired event.

3. Retire constraints
----------------------
Certain devices cannot be safely retired without compromising the
stability of the system. This project will provide a mechanism which
will constrain retire to only those devices deemed non-critical to
system operation. Two types of entities may impose constraints on
retire: userland and kernel entities.

i) Userland constraints
========================
This project will enhance the contracts framework (PSARC/2003/193) to
create a new contract type (device contracts). A device contract is an
agreement or a contract between a process and the kernel regarding the
state of the device. A device contract may be created when a
relationship is formed between a device and a process i.e.
at open(2) time, or it may be created at some point after the device
has been opened.

A device contract once formed may be broken by either party. A device
contract can be broken by the process by an explicit abandon of the
contract or by an implicit abandon when the process exits. A device
contract can be broken by the kernel either asynchronously (without
negotiation) or synchronously (with negotiation). Exactly which happens
depends on the device state transition. The following state diagram
shows the transitions between device states. Note that the transitions
are "unconfiguration" transitions. Configuration transitions are
intentionally left out as they are not relevant for I/O retire. Future
projects which need configuration functionality can easily add them to
the device contract framework.

                          <-- A -->
                   /-----------------> DEGRADED
                   |                      |
                   |                      |
                   |                      | S
                   |                      | |
                   |                      | v
                   v       S -->          v
                ONLINE ------------> OFFLINE

In the figure above, the arrows indicate the direction of transition.
The letter S refers to transitions which are inherently synchronous
i.e. require negotiation and the letter A indicates transitions which
are asynchronous i.e. are done without contract negotiations. A good
example of a synchronous transition is the ONLINE -> OFFLINE
transition. This transition cannot happen as long as there are
consumers which have the device open. Thus some form of negotiation
needs to happen between the consumers and the kernel to ensure that
consumers either close devices or disallow the move to OFFLINE. Certain
other transitions, such as ONLINE -> DEGRADED, are inherently
asynchronous i.e. non-negotiable. A device that suffers a fault that
degrades its capabilities will become degraded irrespective of what
consumers it has, so a negotiation in this case is pointless.
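
The classification in the diagram above can be restated as a small
lookup. The following is only an illustrative sketch: the type and
function names (dev_state_t, trans_kind_t, dev_transition_kind) are
hypothetical and are not part of the proposed device contract
interfaces. It assumes, as the diagram suggests, that any move into
OFFLINE is synchronous and that moves between ONLINE and DEGRADED are
asynchronous. Configuration transitions out of OFFLINE are not modeled.

```c
/*
 * Illustrative only: hypothetical names, not the project's interfaces.
 */
typedef enum { DEV_ONLINE, DEV_DEGRADED, DEV_OFFLINE } dev_state_t;

typedef enum {
	TRANS_NONE,	/* no state change, or a transition not modeled here */
	TRANS_ASYNC,	/* "A" in the diagram: no negotiation */
	TRANS_SYNC	/* "S" in the diagram: negotiated with consumers */
} trans_kind_t;

/* Classify a device state transition per the state diagram. */
trans_kind_t
dev_transition_kind(dev_state_t from, dev_state_t to)
{
	if (from == to)
		return (TRANS_NONE);
	if (to == DEV_OFFLINE)
		return (TRANS_SYNC);	/* ONLINE or DEGRADED -> OFFLINE */
	if (from == DEV_OFFLINE)
		return (TRANS_NONE);	/* configuration: left out */
	return (TRANS_ASYNC);		/* ONLINE <-> DEGRADED */
}
```

For instance, dev_transition_kind(DEV_ONLINE, DEV_OFFLINE) yields
TRANS_SYNC, while dev_transition_kind(DEV_ONLINE, DEV_DEGRADED) yields
TRANS_ASYNC.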

The following device states are currently defined for device
contracts:

        CT_DEV_ST_ONLINE        The device is online and functioning
                                normally
        CT_DEV_ST_DEGRADED      The device is online but is functioning
                                in a degraded capacity
        CT_DEV_ST_OFFLINE       The device is offline and is no longer
                                configured

Refer to PSARC/2003/193 for background information on contracts.

A typical consumer of device contracts starts out with a contract
template and adds terms to that template. These include the
"acceptable set" (A-set) term, which is a bitset of device states which
are guaranteed by the contract. If the device moves out of a state in
the A-set, the contract is broken. The breaking of the contract can be
asynchronous, in which case a critical contract event is sent to the
contract holder but no negotiations take place. If the breaking of the
contract is synchronous, negotiations are opened between the affected
consumer and the kernel. The kernel does this by sending a critical
event to the consumer with the CTE_NEG flag set, indicating that this
is a negotiation event. The consumer can accept this change by sending
an ACK message to the kernel. Alternatively, if it has the necessary
privileges, it can send a NACK message to the kernel which will block
the device state change. To NACK a negotiable event, a process must
have the {PRIV_SYS_DEVICES} privilege asserted in its effective set.

Other terms include the "minor path" term, specified explicitly if the
contract is not being created at open(2) time or specified implicitly
if the contract is being created at open time via an activated
template.

A contract event is sent on any state change to which the contract
owner has subscribed via the informative or critical event sets. Only
critical events are guaranteed to be delivered.
Since all device state changes are controlled by the kernel and cannot
be arbitrarily generated by a non-privileged user, the
{PRIV_CONTRACT_EVENT} privilege does not need to be asserted in a
process's effective set to designate an event as critical. To ensure
privacy, a process must either have the same effective userid as the
contract holder or have the {PRIV_CONTRACT_OBSERVER} privilege asserted
in its effective set in order to observe device contract events off
the device contract type specific endpoint.

Yet another term available with device contracts is the
"non-negotiable" term. This term is used to pre-specify a NACK to any
contract negotiation. This term is ignored for asynchronous state
changes. For example, a process may have the A-set {ONLINE|DEGRADED}
and make the contract non-negotiable. In this case, the device contract
framework assumes a NACK for any transition to OFFLINE and blocks the
offline. If the A-set is {ONLINE} and the non-negotiable term is set,
transitions to OFFLINE are NACKed but transitions to DEGRADED succeed.

The OFFLINE negotiation (if the OFFLINE state is not in the A-set for a
contract) happens just before the I/O framework attempts to offline a
device (i.e. detach a device and set the offline flag so that it cannot
be reattached). This need not necessarily be the result of retire
activity. A device contract holder is expected to either NACK the
offline (if privileged) or release the device and allow the offline to
proceed.

The DEGRADED contract event (if DEGRADED is not in the A-set for a
contract) is generated just before the I/O framework transitions the
device state to "degraded" (i.e. DEVI_DEVICE_DEGRADED in I/O framework
terminology).
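
The interaction between the A-set, the non-negotiable term and the
synchronous/asynchronous nature of a transition can be summarized in a
few lines of C. This is a sketch under stated assumptions, not the
framework's implementation: the ST_* bit values, the ct_outcome_t type
and ct_evaluate() are hypothetical names invented for illustration, and
only the transition to OFFLINE is treated as negotiable.

```c
#include <stdbool.h>

/* Hypothetical bit values standing in for the CT_DEV_ST_* states. */
#define	ST_ONLINE	0x1u
#define	ST_DEGRADED	0x2u
#define	ST_OFFLINE	0x4u

typedef enum {
	CT_UNAFFECTED,		/* new state is in the A-set */
	CT_BROKEN_ASYNC,	/* critical event sent, no negotiation */
	CT_NEGOTIATE,		/* CTE_NEG event sent, await ACK or NACK */
	CT_NACK_ASSUMED		/* non-negotiable: state change is blocked */
} ct_outcome_t;

/*
 * Decide what happens to one contract when the device is about to
 * move to new_state. aset is the contract's acceptable set and
 * nonneg is its non-negotiable term.
 */
ct_outcome_t
ct_evaluate(unsigned int aset, bool nonneg, unsigned int new_state)
{
	if (aset & new_state)
		return (CT_UNAFFECTED);
	if (new_state != ST_OFFLINE)
		return (CT_BROKEN_ASYNC);	/* non-negotiable term ignored */
	return (nonneg ? CT_NACK_ASSUMED : CT_NEGOTIATE);
}
```

With an A-set of ST_ONLINE and the non-negotiable term set, a move to
ST_OFFLINE gives CT_NACK_ASSUMED while a move to ST_DEGRADED gives
CT_BROKEN_ASYNC, which matches the second example in the text above.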

As far as I/O retire is concerned, a device may become degraded at
three points during the fault management process:

        a) FMA I/O error handling by the driver may result in the
           device state being set to the DEGRADED state.
        b) An inability to immediately offline a device due to a
           userland or kernel consumer. The I/O retire code will in
           such cases move the device to the DEGRADED state until it
           can be offlined.
        c) An inability to retire the device (because the device
           provides a critical service) will result in the device
           moving to the DEGRADED state.

The contract holder is expected to ACK or NACK a negotiation event
within a certain period of time. If the ACK/NACK is not received within
the timeout period, the device contract framework will behave as if the
contract does not exist and will proceed with the event. In the I/O
retire case, I/O code will be aware that constraints have not been
applied and will behave accordingly.

The contracts framework provides an elegant mechanism that solves two
problems:

        a) A process can use it to reconfigure itself in the face of
           an impending device state change (and in the process release
           the device to allow the state change).
        b) It can also use it to impose constraints on the state
           change so that state changes that may cause problems are
           disallowed. Note that blocking a state change requires that
           the process be privileged.

It is expected that the device contract framework will be generally
useful for consumers other than I/O retire. We expect future projects
to include more device states for device contracts.

ii) Kernel constraints
======================
Certain resources may be in use solely by kernel consumers. Such
resources will not have a corresponding userland consumer and device
contracts cannot impose constraints on their retire. To allow such
consumers to have a say in the retire of I/O devices we will enhance
the event interfaces provided by the LDI framework (PSARC/2001/769).
The new LDI event interfaces will provide two services to I/O retire:

        1. They allow the imposition of kernel constraints by kernel
           consumers of devices.
        2. They allow layered drivers to generate device contract
           events on minors they export for events affecting minors
           they import. For example, if SVM were converted to use the
           LDI, events affecting disks opened by SVM could be
           propagated to contracts on device minors exported by SVM.

The new LDI event interfaces are expected to be generally useful i.e.
they are not meant solely for I/O retire. The current LDI event
interfaces have no consumers and this enhancement will not affect
anyone. For any consumers that want the old style LDI events,
equivalent functionality is available via the new LDI event interfaces.

Two primary interfaces are being defined by the project:

        a) A notification callback service which informs layered
           consumers of an impending state change, giving them an
           opportunity to either reconfigure themselves or to block
           the state change. The reconfiguration will allow resources
           to be released, allowing the state change to proceed. The
           notification and release will be synchronous i.e. the
           release will be carried out by the callback. This interface
           will also hook up with the device contract framework to
           provide state change events to userland contract holders of
           minors exported by the layered driver.
        b) A post event "finalize" callback service that indicates
           whether the state change succeeded. This allows consumers
           to finalize their reconfiguration. This offers
           functionality that is equivalent to the old LDI event
           interfaces. The interface will also generate "negotiation
           end" events for all applicable contracts.

This project defines two LDI events: LDI_EV_OFFLINE and LDI_EV_DEGRADE.
They have the same semantics as the corresponding device contract
events.

The notification interface is used to register a "notify" callback
with the LDI event framework.
The notify callback serves two main purposes:

        i)  It allows layered drivers to impose constraints on the
            retire of devices.
        ii) It allows layered consumers to reconfigure themselves and
            release devices, allowing the offline of such devices to
            succeed.

The finalize callback serves as an indicator of the final disposition
of a device state change i.e. it indicates whether the device state
change succeeded or not.

The finalize callbacks are always called at the end of all defined LDI
events. The notify callback is only called if the specified event can
be potentially blocked or vetoed by a consumer. So for the events
defined by this proposal, the offline event results in both a "notify"
and a "finalize" callback, but a degrade event (which cannot be
blocked) will only result in finalize callbacks being invoked. In
addition, it is guaranteed that if a layered driver receives a notify
event, it will receive a finalize event, unless the layered driver
itself rejected the state change.

See the man pages at the end of this document for details. For
examples of how a typical kernel consumer (ZFS) could use these
interfaces, see the example section at the end of this document.

iii) Legacy constraints
=======================
Certain kernel consumers like UFS and SVM have not been converted to
use the LDI (and likely never will be). To allow such consumers to
apply constraints on the retire process, we will use the RCM framework
to impose constraints. Note that this approach will be used only for
legacy consumers since the contracts framework is a cleaner and more
correct approach to managing device state changes. The contracts
framework is cleaner because it allows state changes to be managed in
a simpler fashion from within the application without requiring an
additional external piece of software i.e. an RCM module. It is a more
appropriate approach because RCM can only handle synchronous events
like userland initiated DR.
RCM cannot be used for asynchronous events such as an in-kernel device
state change. Device contracts can be used for both types of events.
Also, since RCM is a userland framework, it cannot usually handle
kernel consumers of devices.

4. Retiring a device in use
----------------------------
There are several different types of scenarios where I/O retire may be
invoked. In the following, a device is said to be in use if there are
existing opens of the device either by a userland or kernel consumer.

1. Device not in use: In this case, I/O retire is very simple. We can
detach the device and set an "offline" flag which will prevent the
device from being attached by subsequent configuration operations such
as open(2).

2. Device is in use: In this case, things are a little more
complicated. First we have to check if there are any retire constraints
for this device. If these constraints exist, the device cannot be
retired. If not, the device is retirable.

Retire can take two forms: fencing or offlining. Offlining is used when
the device is no longer in use and when the device node can be safely
offlined. A device that was originally in use may no longer be in use
because as part of the constraint checking process, consumers may
release devices.

Fencing is used when a retirable device is in use, since a device
cannot be offlined when there is an existing open of the device.
Fencing essentially consists of using the specfs filesystem to isolate
the device. All new opens of the device are failed with ENXIO. This
will require changes at the specfs layer for userland consumers and in
the LDI for kernel consumers. I/O operations and unconfiguration
operations (such as close) will continue to function normally so that
existing consumers can release the device. If a "real" detach i.e.
offline isn't possible immediately, we will schedule a deferred detach
at periodic intervals.
It is expected that as I/Os fail due to FMA error handling
(PSARC/2002/288) in the driver, the consumer will release the device,
allowing detach and offline to take place.

Another (optional) mechanism to speed up the transition from a fenced
state to a detached state is to have drivers of devices register an
interest in retire events targeting the device they drive by
registering a callback via NDI event services. They can then be
notified that the device they control is the target of a retire
operation. Such drivers can speed up the process of retire by returning
appropriate errors (such as EIO) for I/O to the affected device,
forcing consumers of such devices to stop using the device. Eventually,
the device will have no consumers, allowing the framework to detach and
offline the device.

3. Interaction with MPXIO: There are two possible errors for a device
under MPXIO control:

        a) Path fault: The fault may be in a path component along a
           path to the device. In this case the path component will be
           retired. The MPXIO framework will detect that the path is
           not available and will switch over to another path.
        b) Disk fault: In this case, since the disk itself is bad,
           switching the path will not help. I/O retire will retire
           the virtual disk node under the VHCI, making it unavailable
           along any path. Any attempt to open the device will return
           ENXIO.

4. Interaction with RCM: There are certain kernel consumers that cannot
impose constraints via contracts (a userland mechanism) or via LDI
(since they have not been converted to use the LDI). These include UFS
and SVM. To allow such entities to impose constraints we will use RCM.
The changes required are minimal - we will use the request_offline,
notify_online and notify_remove entry points of RCM modules. The three
entry points will be invoked with the RCM_RETIRE flag set to indicate
that the operation is in the context of a retire operation, to allow
for the slightly different behavior required relative to retire
operations.
No attempt will be made to enhance RCM to inform RCM clients of
asynchronous events such as the "degraded" event. The goal here is only
to check for retire constraints, not improve RCM.

5. Other consumers: There may be consumers of devices which do not use
either device contracts (userland consumers) or the LDI (kernel
consumers). In this case, if the consumer is the only consumer of the
device, automatic retire will be blocked. If there is another consumer
of the device and that consumer uses device contracts or the LDI to
allow a retire to occur, an automatic retire will be initiated. The
device will be fenced off from new consumers but existing consumers can
continue to use the device until the next reboot, at which point the
device will be offlined before its first attach.

5. Behavior and Observability of a fenced off/retired device
-------------------------------------------------------------
For a fenced off device, all configuration operations will fail with
ENXIO. Unconfiguration operations like close, etc., will however
succeed so that the device can be closed.

A device that has been offlined (i.e. is not merely fenced off) will be
detached from its driver and cannot be attached. All operations on the
device will fail with ENXIO.

An offlined or fenced off device is still DRable and can be DRed out
via standard DR tools like cfgadm. A retired device replaced via DR
will stay retired until the device is unretired.

For observability, the device will still be present in the kernel
device tree (albeit in the retired state) and so will be visible
through tools like prtconf and (k)mdb. The output of prtconf will
indicate that the device has been retired (see Example VIII.3).
Similarly, cfgadm output for a retired device will show the Condition
"failed", indicating that the device is no longer usable.

6. Persistence of retire:
-------------------------
The retired status of a device will be persisted in a file -
/etc/devices/retire_store.
This file will be read early in boot (on x86 systems, this file will
be included in the boot archive) and a list of devices that have been
retired will be created. If the device is not a self identifying node,
then it will have its devinfo node marked with the DEVI_DEVICE_OFFLINE
flag, which will prevent it from attaching. If the devinfo node is a
self identifying node, it will be fenced off rather than offlined,
since self identifying nexii typically remove devinfo nodes that fail
to attach. The end result is the same for both self identifying and
non-self identifying devices - the device will be unavailable to all
consumers. Any attempts to open such devices will fail with ENXIO.

A global integer variable ddi_retire_store_bypass will be made
available to disable this feature - this can be set via (k)mdb or
/etc/system. An alternative mechanism to bypass this persistent store
is to boot the system with the "ask" flag i.e. "boot -a" and specify
/dev/null as the retire store. This is useful for recovery if a
critical device makes it into the persistent store due to a software
error.

7. Unretire Devices
-------------------
The unretire of a device may be manual or automatic - in either case
it is the FMA framework that indicates to the retire agent that a
device has been repaired. In the former case, the FMA framework is
notified by a user via the command "fmadm repair". In the latter case
the FMA framework detects via some form of serial ID/GUID that a device
has been replaced. In both cases, the FMA framework generates a
"list.repaired" event indicating that the device has been repaired. The
retire agent uses this event to initiate the unretire process. Once
unretire is complete, no reboot will be required to configure and use
the unretired device.
Note that if a retired device is removed (while the system is down or
via DR) and replaced by a different device or by the same device, the
replacement will stay retired until the unretire process is triggered
(manually or automatically).

8. Driver modifications
------------------------
This I/O retire proposal does not directly require any modifications
to a device driver to allow the device to be retired. However, the I/O
retire agent is dependent on a correct diagnosis by a diagnosis engine,
which in turn relies on error telemetry from the kernel. While some
device faults can be diagnosed without modifying the device driver,
certain faults can only be diagnosed with proper error information from
a hardened i.e. modified driver. Hardening requires a driver to conform
to the FMA I/O fault services spec as outlined in the Writing Device
Drivers (WDD) guide. See the I/O fault services chapter in the WDD for
more details.

V Non goals
=============

1. Retire of devices other than disk and nexus devices
--------------------------------------------------------
The current phase of I/O retire will only cover the retire of nexus and
disk devices. Retire of other devices (such as NICs) requires domain
specific constraints imposed via the interfaces provided by this
project and is beyond the scope of this project. It is expected that
domain experts will develop the necessary constraints and use the
interfaces supplied by this project to impose them. Until then, such
devices will not be automatically retirable. However, if a device is
currently not in use, it will be automatically retired even if it is
not a disk or a nexus.

2. Fencing limitations for certain devices with kernel consumers
-----------------------------------------------------------------
Some device accesses in the kernel cannot be completely fenced off.
For example, devices that are directly accessed in the kernel via the
bdev_* and cdev_* interfaces instead of the LDI interfaces
(PSARC/2001/769) cannot be completely fenced off. specfs does not play
a role in such accesses and cannot intercept them. It is expected that
such consumers will eventually be migrated to the LDI, which will fully
support fencing.

3. Converting layered drivers to use the LDI
---------------------------------------------
Certain kernel consumers have not been converted by the layered driver
project to use the LDI interfaces. This project will not attempt to
convert any drivers to use the LDI framework since that is beyond the
scope of this project.

4. Enhancing RCM
----------------
RCM will be used solely to impose constraints on the retire process
i.e. for the "offline" device reconfiguration event. Since RCM by its
userland architecture is limited to use with synchronous userland
events such as userland initiated DR, it is not easy to use it for
asynchronous kernel generated events such as the "degraded" device
reconfiguration event. This project will not make this attempt as the
goal here is to support retire, not enhance the RAS value of RCM
clients.

5. Ability to retire software (drivers)
---------------------------------------
This design only covers retire of hardware. Retire of software i.e.
drivers is a non-goal for this phase.

VI Sequence of events in I/O retire
==================================

The following describes the steps that are taken (in order) during FMA
I/O retire. To simplify things we consider several scenarios:

i) Device retire fails because of a retire constraint
------------------------------------------------------
1. Derive retire target from the fault event.
2. Retire policy: Check the value of the "global-disable" policy. If
   not false, abort retire.
3. Use device contracts, LDI "notify" callbacks and RCM calls to check
   if the device retire is permitted.
4. One or more constraints reject the retire.
5. Send negotiation end (NEGEND) events indicating failure to all
   applicable device contracts and invoke finalize callbacks for LDI
   consumers indicating failure.
6. Move the device to the degraded state. As a result device contract
   events and LDI finalize callbacks for the "degraded" state change
   are generated.

ii) Device retire is permitted but device cannot be offlined
-------------------------------------------------------------
1. Derive retire target from the fault event.
2. Retire policy: Check the value of the "global-disable" policy. If
   not false, abort retire.
3. Use device contracts, LDI "notify" callbacks and RCM calls to check
   if the device retire is permitted.
4. Constraint checking allows the retire to proceed.
5. Persist the retire.
6. A message is logged indicating that the device has been
   successfully retired. A reference is made to the utility
   (fmadm(1M)) that may be used to unretire the device.
7. Check the status of offline in step 3. It failed.
8. Fence off the device.
9. Schedule periodic jobs to attempt a deferred offline of the device.
10. Send negotiation end (NEGEND) events indicating failure to all
    applicable device contracts and invoke finalize callbacks for LDI
    consumers indicating failure. Because the device is fenced off in
    this specific case, any attempts to reopen the device will fail.
11. Move the device to the degraded state. As a result device contract
    events and LDI finalize callbacks for the "degraded" state change
    are generated.
12. At some later point in time, the device is successfully offlined.
    The offline process includes device contract notifications and LDI
    notify callbacks. Once the device is successfully offlined, remove
    the device from the degraded state and generate a "success" NEGEND
    message for device contracts and LDI_EV_SUCCESS finalize callbacks
    for LDI consumers.

iii) Device retire is permitted and device can be offlined
-------------------------------------------------------------
1. Derive retire target from the fault event.
2. Retire policy: Check the value of the "global-disable" policy. If
   not false, abort retire.
3. Use device contracts, LDI "notify" callbacks and RCM calls to check
   if the device retire is permitted.
4. Constraint checking allows the retire to proceed.
5. Persist the retire.
6. A message is logged indicating that the device has been
   successfully retired. A reference is made to the utility
   (fmadm(1M)) that may be used to unretire the device.
7. Check the status of the offline in step 3. It succeeded.
8. Send negotiation end (NEGEND) messages indicating success to all
   applicable device contracts and invoke finalize callbacks for LDI
   consumers indicating success.

iv) Behavior on reboot
------------------------
1. I/O framework reads the file /etc/devices/retire_store.
2. I/O framework creates an in-core cache of devices that have been
   retired.
3. The system emits an informational message to the console indicating
   that one or more retired devices exist on the system.
4. On the first attempt to attach every devinfo node, the framework
   checks if it exists in the in-core cache and if it does, either
   blocks the attach and offlines the device (for PROM based device
   nodes) or fences off the device (for "non-PROM" devices). In both
   cases, the effect is the same - the retired device is unavailable
   to consumers.

v) Unretire sequence
----------------------
The unretire process is initiated by the retire agent when it is
informed by the FMA framework via a "list.repaired" event that the
device has been repaired/replaced. When a device is unretired, we go
through the following steps:

1. Remove the device from the persistent retire store.
2. Unschedule the deferred detach i.e. remove any scheduled job that
   is attempting to offline the device.
3. Tear down fences: If the device was fenced off (via specfs), tear
   down those fences.
4. Online the device: If the device is in an offlined state, then
   online it.

The device is now unretired.
A subsequent operation such as open() will configure the device.

VII Future Work
===============
The following functionality is not a part of the current set of
deliverables but may be delivered in future projects or RFEs.

1. New events: The events for device contracts and LDI events proposed
   by this project use well established I/O framework events but are
   limited to events that are directly used by I/O retire. It is
   expected that future projects will enhance this set to add other
   generic events. For example if DR were to start using device
   contracts and LDI events, we would expect the following additional
   events to be defined:
	a) suspend
	b) resume

2. Retiring software: It is possible that software (like device drivers)
   may have design defects that can be handled via retire. This is a new
   approach to driver defect management that needs further investigation.

VIII Examples
=============

1. Process uproc and the /dev/widget device
-------------------------------------------
Here is sample code that indicates how a userland process "uproc" creates
a contract for a device "/dev/widget" and negotiates the breaking of
the contract.

	// Get a template for the device contract
	tfd = open64(CTFS_ROOT "/device/template", O_RDWR);

	// Open the device contract pbundle for this process
	efd = open64(CTFS_ROOT "/device/pbundle", O_RDONLY);

	// Set informative and critical events for this contract
	ct_tmpl_set_critical(tfd, CT_DEV_ST_OFFLINE|CT_DEV_ST_DEGRADED);
	ct_tmpl_set_informative(tfd, 0);
	ct_dev_tmpl_set_aset(tfd, CT_DEV_ST_ONLINE|CT_DEV_ST_DEGRADED);

	// Activate this template so that the next open creates a contract
	ct_tmpl_activate(tfd);

	/*
	 * Note that this is not the only way to create a device contract.
	 * A contract may also be created post-open by setting a minor
	 * path in the template via the ct_dev_tmpl_set_minor() interface
	 * and then creating a contract via ct_tmpl_create()
	 */
	dfd = open("/dev/widget", O_RDWR);

	// Clear the activate so other opens don't create contracts
	ct_tmpl_clear(tfd);
	(void) close(tfd);

	// Get the contract's ID
	contract_latest(&ctid);

	// Get the contract's ctl file
	ctlfd = contract_open(ctid, "device", "ctl", O_WRONLY);

	for (;;) {
		// Block waiting for events
		ct_event_read(efd, &ev);

		// Check if the event we just read is ours
		if (ct_event_get_ctid(ev) != ctid) {
			ct_event_free(ev);
			continue;
		}

		event = ct_event_get_type(ev);
		evid = ct_event_get_evid(ev);
		flags = ct_event_get_flags(ev);

		if (event & CT_DEV_ST_DEGRADED) {
			/*
			 * The degrade event is within our A-set. The
			 * contract is intact. Since we subscribed to
			 * this event as a critical event, we need to ACK
			 * it so that the event is freed by the kernel.
			 */
			ct_ctl_ack(ctlfd, evid);
			uproc_reconfig(DEGRADE);
		} else if ((event & CT_DEV_ST_OFFLINE) && (flags & CTE_NEG)) {
			// uproc code - check if state change permissible
			...
			if (state_change_allowed) {
				uproc_reconfig(OFFLINE);
				(void) close(dfd);	// close the device
				ct_ctl_ack(ctlfd, evid);
				ct_event_free(ev);
				break;
			} else {
				// Block the state change
				ct_ctl_nack(ctlfd, evid);
			}
		} else {
			ct_event_free(ev);
			goto error;
		}
		ct_event_free(ev);
	}

	// We ACKed the state change to offline
	for (;;) {
		ct_event_read(efd, &ev);

		// Check if the event we just read is ours
		if (ct_event_get_ctid(ev) != ctid) {
			ct_event_free(ev);
			continue;
		}

		// Negend is a critical event, so ACK it.
		event = ct_event_get_type(ev);
		evid = ct_event_get_evid(ev);
		if (event == CT_EV_NEGEND) {
			ct_ctl_ack(ctlfd, evid);
			ct_event_free(ev);
			ct_ctl_abandon(ctlfd);	// contract is broken
			break;			// negotiation is over
		} else {
			ct_event_free(ev);
			goto error;
		}
	}

	// cleanup
	(void) close(efd);
	(void) close(ctlfd);

2. I/O retire and ZFS
---------------------
Here is pseudo code illustrating how I/O retire code will work for a disk
device consumed by ZFS.

a. ZFS uses the LDI (either ldi_open_by_name() or ldi_open_by_devid())
   to open a disk minor node.

	// ZFS code
	ldi_open_by_name(path, ...&ldi_handle...)
		or
	ldi_open_by_devid(devid, ...&ldi_handle...)

b. ZFS obtains an event cookie for an offline LDI event for this minor
   node.

	// ZFS code
	ldi_ev_get_cookie(ldi_handle, LDI_EV_OFFLINE, &event_cookie)

c. ZFS then registers notify and finalize callbacks for this minor. The
   notify callback is responsible for checking if the proposed
   reconfiguration is permissible. For example, if the disk hosts a
   critical filesystem and ZFS cannot replace it, ZFS is expected to
   return LDI_EV_FAILURE to indicate this. If ZFS can replace it, it is
   expected to reconfigure itself and release the device.

	// ZFS code
	callb.vers = LDI_EV_CB_VERS;
	callb.notify = zfs_notify;
	callb.finalize = zfs_finalize;
	ldi_ev_register_callbacks(ldi_handle, event_cookie,
	    &callb, arg, &callback_id);

	// Here is the notify callback for ZFS
	int
	zfs_notify(ldi_handle_t ldi_handle, ldi_ev_cookie_t ecookie,
	    void *arg, void *ev_data)
	{
		// If uninteresting event, just return success and allow
		// state change to proceed
		if (strcmp(ldi_ev_get_type(ecookie), LDI_EV_OFFLINE) != 0)
			return (LDI_EV_SUCCESS);

		// Since ZFS exports no minors to external consumers
		// for general purpose use, there is no need to invoke
		// ldi_ev_notify() here

		// ZFS code
		...
		if (disk can be offlined) {
			zfs_reconfigure(ldi_handle, OFFLINE);
			zfs_save_path(ldi_handle, arg);
			ldi_close(ldi_handle);
			return (LDI_EV_SUCCESS);
		}

		// Disk is required by ZFS: block the offline
		return (LDI_EV_FAILURE);
	}

   Since ZFS does not export any minors for general purpose external use,
   there are no userland or kernel consumers to notify about an impending
   change in the minor they are consuming. If ZFS exports minors to
   userland or kernel, then it would be expected to map the imported
   minors to exported minors and invoke ldi_ev_notify() on them. If the
   result of that call is LDI_EV_FAILURE the ZFS notify callback should
   return LDI_EV_FAILURE.

d. The finalize callback is responsible for informing ZFS whether the
   state change i.e. offline actually succeeded. If it succeeded, then
   ZFS can assume that the device is gone, else it can start using the
   device again.

	// Here is the finalize callback for ZFS
	void
	zfs_finalize(ldi_handle_t ldi_handle, ldi_ev_cookie_t event_cookie,
	    int ldi_result, void *arg, void *ev_data)
	{
		ldi_handle_t new_handle;

		// ZFS code - NOP if this is not an offline event
		if (strcmp(ldi_ev_get_type(event_cookie), LDI_EV_OFFLINE) != 0)
			return;

		// This is an offline event
		if (ldi_result != LDI_EV_SUCCESS) {
			// ZFS code
			path = zfs_get_saved_path(arg);
			/* A reopen is not guaranteed to succeed */
			if (ldi_open_by_name(path, ...&new_handle...) == 0)
				zfs_reconfigure(new_handle, ONLINE);
		}
	}

3. Sample prtconf output for retired devices
--------------------------------------------
System Configuration:  Sun Microsystems  sun4u
Memory size: 2048 Megabytes
System Peripherals (Software Nodes):

SUNW,Sun-Blade-2500
    scsi_vhci, instance #0
    ----- snip ------
    pci, instance #2
        scsi, instance #0
            disk (driver not attached)
            tape (driver not attached)
            sd, instance #0
            sd, instance #1 (retired)		<=====
        scsi, instance #1
            disk (driver not attached)
            tape (driver not attached)
    ----- snip ------
        ide, instance #0
            disk (driver not attached)
            cdrom (driver not attached)
            sd, instance #30 (retired)		<======
    ----- snip ------
    iscsi, instance #0
    pseudo, instance #0

-------------------------------------------------------------------------------

IX Interface Table
==================

================================================================================
|Interface Name                 | Stability level    | Comments
================================================================================
|I/O Framework interfaces       |                    |
|========================       |                    |
|e_ddi_retire_persist()         | Proj. Pvt.         | Persist a device retire
|e_ddi_retire_unpersist()       | Proj. Pvt.         | Unpersist a device retire
|e_ddi_retired()                | Cons. Pvt.         | Check if device is retired
|e_ddi_retire_device()          | Proj. Pvt.         | offline or fence a device
|e_ddi_unretire_device()        | Proj. Pvt.         | online or unfence a device
|e_ddi_set_retire_interval()    | Proj. Pvt.         | set deferred offline interval
|                               |                    |
|New modctls                    |                    |
|===========                    |                    |
|MODRETIRE                      | Proj. Pvt.         | Kernel processing for retire
|MODUNRETIRE                    | Proj. Pvt.         | Kernel processing (unretire)
|MODISRETIRED                   | Cons. Pvt.         | Check if device is retired
|MODRETIRERETRY                 | Proj. Pvt.         | Set offline retry interval
|                               |                    |
|devinfo structure additions    |                    |
|===========================    |                    |
|devi_ct                        | Proj. Pvt.         | List of contracts on device
|                               |                    |
|devinfo structure flags        |                    |
|=======================        |                    |
|DEVI_RETIRED                   | Cons. Pvt.         | Device has been retired
|DEVI_CONSTRAINT                | Proj. Pvt.         | constraints applied
|                               |                    |
|libdevinfo interfaces          |                    |
|=====================          |                    |
|di_retire_device()             | Proj. Pvt.         | Retire a device
|di_unretire_device()           | Proj. Pvt.         | Unretire a device
|di_retire_t                    | Proj. Pvt.         | Libdevinfo retire struct
|                               |                    |
|RCM flags                      |                    |
|=========                      |                    |
|RCM_RETIRE_REQUEST             | Committed          | flag RCM request entry points
|RCM_RETIRE_NOTIFY              | Committed          | flag RCM notify entry points
|                               |                    |
|snode flags                    |                    |
|===========                    |                    |
|SFENCED                        | Proj. Pvt.         | snode flag - fenced off
|                               |                    |
|specfs interfaces              |                    |
|=================              |                    |
|spec_fence_snode               | Proj. Pvt.         | fence off snode(s)
|spec_unfence_snode             | Proj. Pvt.         | unfence snode(s)
|                               |                    |
|I/O retire agent               |                    |
|================               |                    |
|io-retire.so                   | Proj. Pvt.         | I/O retire agent module
|                               |                    |
|fmd I/O retire agent props     |                    |
|==========================     |                    |
|io-retire.conf                 | Uncommitted        | I/O retire agent .conf file
|"global-disable"               | Uncommitted        | Disable I/O retire
|"fault-exceptions"             | Proj. Pvt.         | faults that don't retire
|                               |                    |
|retire store interfaces        |                    |
|=======================        |                    |
|/etc/devices/retire_store      | Proj. Pvt.         | Persistent retire store
|                               |                    |
|contract interfaces            |                    |
|===================            |                    |
|CTT_DEVICE                     | Cons. Pvt.         | device contract type
|CT_CNACK                       | Cons. Pvt.         | ctfs ioctl cmd for NACK
|                               |                    |
|/system/contract/device        | Committed          | ctfs device contract dir
|                               |                    |
|contract_device_create()       | Proj. Pvt.         | Create a contract post open
|contract_device_open()         | Proj. Pvt.         | Create a contract at open
|contract_device_offline()      | Cons. Pvt.         | Offline contract negotiation
|contract_device_degrade()      | Cons. Pvt.         | Degrade event publish
|                               |                    |
|CT_ACK                         | Cons. Pvt.         | change is permitted
|CT_NACK                        | Cons. Pvt.         | change is not permitted
|CT_NONE                        | Proj. Pvt.         | no contracts
|                               |                    |
|CT_DEV_ST_ONLINE               | Committed          | Online state
|CT_DEV_ST_OFFLINE              | Committed          | Offline state
|CT_DEV_ST_DEGRADED             | Committed          | Degrade state
|                               |                    |
|CTDP_ACCEPT                    | Committed          | set of acceptable device states
|CTDP_MINOR                     | Committed          | contract minor's devfs path
|CTDP_NONEG                     | Committed          | auto NACK a contract break
|                               |                    |
|CTDS_STATE                     | Committed          | state of device
|CTDS_ASET                      | Committed          | A-set (acceptable states set)
|CTDS_MINOR                     | Committed          | device member of contract
|                               |                    |
|ct_dev_tmpl_set_aset()         | Committed          | set A-set in template
|ct_dev_tmpl_get_aset()         | Committed          | get A-set in template
|ct_dev_tmpl_set_minor()        | Committed          | set minor path in template
|ct_dev_tmpl_get_minor()        | Committed          | get minor path in template
|ct_dev_tmpl_set_noneg()        | Committed          | set non-neg. term in template
|ct_dev_tmpl_get_noneg()        | Committed          | get non-neg. term in template
|                               |                    |
|ct_dev_status_get_dev_state()  | Committed          | get device state in contract
|ct_dev_status_get_aset()       | Committed          | get A-set in contract
|ct_dev_status_get_minor()      | Committed          | get minor path in contract
|ct_dev_status_get_noneg()      | Committed          | get the setting for NONEG
|                               |                    |
|ct_ctl_nack()                  | Committed          | Negative ack for neg. event
|                               |                    |
|LDI event interfaces           |                    |
|====================           |                    |
|ldi_get_eventcookie()          | Obsolete committed | Deprecate old interfaces
|ldi_add_event_handler()        | Obsolete committed | Deprecate old interfaces
|ldi_remove_event_handler()     | Obsolete committed | Deprecate old interfaces
|LDI_EV_CB_VERS                 | Committed          | event callback vector vers.
|LDI_EV_OFFLINE                 | Committed          | LDI offline event
|LDI_EV_DEGRADE                 | Committed          | LDI degrade event
|ldi_ev_cookie_t                | Committed          | LDI event cookie
|ldi_ev_register_callbacks()    | Committed          | register notify/finalize
|ldi_ev_notify()                | Committed          | Notify consumers
|ldi_ev_finalize()              | Committed          | Finalize events for consumers
|ldi_ev_get_type()              | Committed          | Get LDI event name
|ldi_ev_remove_callbacks()      | Committed          | Remove all LDI event callbacks
|LDI_EV_SUCCESS                 | Committed          | LDI event success return code
|LDI_EV_FAILURE                 | Committed          | LDI event failure return code
|LDI_EV_NONE                    | Proj. Pvt.         | No matching LDI callbacks
================================================================================

X Man pages
===========
Here are man pages for non project-private interfaces

------------------------------------------------------------------------------

NAME
     io-retire.conf - FMA I/O retire agent .conf file

SYNOPSIS
     /usr/lib/fm/fmd/plugins/io-retire.conf

INTERFACE LEVEL
     Stable

DESCRIPTION
     The io-retire.conf file may be used to set pre-defined properties
     which affect the behavior of the I/O retire agent. Currently only
     two properties are defined:

     global-disable:    If set to true, automatic I/O retire will be
                        disabled. If set to false (the default), I/O
                        retire will be automatic for all supported
                        devices.

     fault-exceptions:  May be set to a colon separated list of specific
                        fault types which should not trigger a retire.
                        This is for the (rare) faults that are known to
                        be incorrectly diagnosed.

     Use of the global-disable property is primarily for Sun service
     personnel to diagnose problems. End users should not set this
     property unless told to do so by Sun Service personnel.

     The fault-exceptions property is only for Sun Private use. It will
     not be documented.
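The interaction of the two properties described above can be sketched in
a few lines of C. This is only an illustration of the documented
semantics (a boolean global-disable and a colon separated
fault-exceptions list), not the agent's actual code; the function names
fault_is_excepted() and retire_allowed() and the fault class strings
used with them are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the io-retire agent: returns true if
 * fault_class appears as one complete entry in a colon separated
 * exception list such as "fault.io.a:fault.io.b".
 */
bool
fault_is_excepted(const char *exceptions, const char *fault_class)
{
	assert(fault_class != NULL);

	if (exceptions == NULL)
		return (false);

	size_t want = strlen(fault_class);
	const char *p = exceptions;

	while (*p != '\0') {
		/* Length of the current entry, up to the next ':' or the end */
		size_t len = strcspn(p, ":");

		if (len == want && strncmp(p, fault_class, len) == 0)
			return (true);

		p += len;
		if (*p == ':')
			p++;	/* skip the separator */
	}
	return (false);
}

/*
 * Sketch of the policy check: a fault leads to a retire only if
 * global-disable is false and the fault class is not excepted.
 */
bool
retire_allowed(bool global_disable, const char *exceptions,
    const char *fault_class)
{
	if (global_disable)
		return (false);
	return (!fault_is_excepted(exceptions, fault_class));
}
```

Entries are compared whole, so a list containing "fault.io.a" does not
except a shorter class such as "fault.io".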
SEE ALSO
     fmd(1M), fmadm(1M)

------------------------------------------------------------------------------

File Formats                                                         device(4)

NAME
     device - device contract type

SYNOPSIS
     /system/contract/device

DESCRIPTION
     Device contracts allow processes to monitor events involving a
     device of interest and to react to and/or block state changes
     involving such devices.

     Device contracts are managed using the contract(4) file system and
     the libcontract(3LIB) library. The device contract type directory
     is /system/contract/device.

CREATION
     A device contract may be created in one of two ways:

     a) A process may create and activate a template and then invoke
        open on a minor node of the device. The act of opening will
        create a contract based on the terms in the activated template.

     b) A process may create a contract *after* it has opened a device
        by creating a template, setting appropriate terms (including
        the path to a minor node) on the template and then invoking
        ct_tmpl_create() on the template.

STATES, BREAKS and EVENTS
     A state refers to the state of the device which is the subject of
     the contract. Currently three states are defined for device
     contracts:

     CT_DEV_ST_ONLINE     The device is online and functioning normally

     CT_DEV_ST_DEGRADED   The device is online but functioning in a
                          degraded capacity

     CT_DEV_ST_OFFLINE    The device is offline and is not configured
                          for use

     A process creates a device contract with the kernel to get a
     guarantee that the device will be in an acceptable set of states
     as long as the contract is valid. This acceptable set (or A-set
     for short) is specified as one of the terms of the contract when
     the contract is created. When the device moves to a state outside
     the A-set, the contract is broken. This breaking of the contract
     may be either asynchronous or synchronous, depending on whether
     the transition that led to the breaking of the contract is
     asynchronous or synchronous.

     The figure below shows the possible transitions between currently
     defined device states.

                         <-- A -->
              /-----------------> DEGRADED
              |                       |
              |                       |
              |                       | S
              |                       | |
              |                       | v
              v         S -->         v
           ONLINE ------------> OFFLINE

     The arrows indicate the direction of the state transition. S
     refers to a transition that is synchronous and A refers to a
     transition that is asynchronous.

     If the breaking of a contract is asynchronous, then all that
     happens is that a critical event is generated and sent to the
     contract holder. The event is generated even if the contract
     holder has not subscribed to the event via the critical or
     informative event sets.

     If the breaking of the contract is synchronous, a critical
     contract event is generated with the CTE_NEG flag set to indicate
     that this is a negotiation event. The contract holder is expected
     to either ACK this change and allow the state change to occur or
     it may NACK the change to block it (if it has sufficient
     privileges).

     The term event refers to the transition of a device from one state
     to another. The event is named by the state to which the device is
     transitioning. Thus if a device is transitioning to the OFFLINE
     state, the name of the event is CT_DEV_ST_OFFLINE. An event may
     have no consequence for a contract, or it may result in the
     asynchronous breaking of a contract or it may result in a
     synchronous (i.e. negotiated) breaking of a contract. Events are
     delivered to a contract holder in three cases:

     i)   The contract holder has subscribed to the event via the
          critical or informative event sets. The event may be either
          critical or informative in this case depending on the
          subscription.

     ii)  The device transitions to a state outside the contract's
          A-set and the transition is asynchronous. This will result in
          the asynchronous breaking of the contract and a critical
          event will be delivered to the holder.

     iii) The device transitions to a state outside the contract's
          A-set and the transition is synchronous. This will result in
          the synchronous breaking of the contract and a critical event
          with the CTE_NEG flag set will be delivered to the holder.

     Note that in cases ii) and iii) a critical event will be delivered
     even if the holder has not subscribed to the event via the
     critical or informative event sets.

NEGOTIATION
     If the breaking of a contract is synchronous, the kernel begins
     negotiations with the contract holder by generating a critical
     event *before* the device changes state. The event will have the
     CTE_NEG flag set indicating that this is a negotiation event. The
     contract owner is allowed a limited period of time in which to
     either ACK the contract event (thus allowing the state change) or,
     if it has appropriate privileges, NACK the state change, thus
     blocking it. ACKs may be sent by the holder via
     ct_ctl_ack(3CONTRACT) and NACKs may be sent via
     ct_ctl_nack(3CONTRACT). If a contract holder does not send either
     a NACK or ACK within a specified period of time, an ACK is assumed
     and the kernel proceeds with the state change.

     Once the device state change is finalized, the contract subsystem
     sends negotiation end (NEGEND) critical messages to the contract
     owner indicating the final disposition of the state transition
     i.e. either success or failure.

     Once a contract is broken, a contract owner may choose to create a
     replacement contract. It may do this after the contract is broken
     or it may choose to do this synchronously with the breaking of the
     old contract via ct_ctl_newct(3CONTRACT).

TERMS
     The following common contract terms, defined in contract(4), have
     device-contract specific attributes:

     informative set
          The default value for the informative set is
          CT_DEV_ST_DEGRADED i.e. transitions to the DEGRADED state
          will by default result in informative events. Use
          ct_tmpl_set_informative(3CONTRACT) to set this term.

     critical set
          The default value for the critical set is CT_DEV_ST_OFFLINE
          i.e. transitions to the OFFLINE state will by default result
          in critical events. Use ct_tmpl_set_critical(3CONTRACT) to
          set this term.
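     The event delivery rules, the default event sets and the
     negotiation outcome described above can be summarized in a small
     userland model. This is a sketch for illustration only and is not
     kernel code: the ST_* bit values, the type names and the functions
     classify_event() and change_proceeds() are hypothetical, and the
     model assumes (from the state figure) that exactly the transitions
     into OFFLINE are synchronous.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative bit values only; not taken from sys/contract/device.h */
#define	ST_ONLINE	0x1u
#define	ST_DEGRADED	0x2u
#define	ST_OFFLINE	0x4u

typedef enum {
	EV_NONE,		/* nothing is delivered */
	EV_INFORMATIVE,		/* subscribed informative event */
	EV_CRITICAL,		/* critical event, no negotiation */
	EV_CRITICAL_NEG		/* critical event with CTE_NEG set */
} ev_class_t;

typedef struct {
	unsigned aset;		/* acceptable states (A-set) */
	unsigned critical;	/* critical event set */
	unsigned informative;	/* informative event set */
} dev_contract_t;

/* Assumption from the figure: only transitions to OFFLINE are synchronous */
static bool
transition_is_sync(unsigned new_state)
{
	return (new_state == ST_OFFLINE);
}

/* Decide what the holder of contract ct sees when the device enters new_state */
ev_class_t
classify_event(const dev_contract_t *ct, unsigned new_state)
{
	assert(ct != NULL);

	/* Cases ii) and iii): the contract breaks, subscribed or not */
	if ((ct->aset & new_state) == 0) {
		return (transition_is_sync(new_state) ?
		    EV_CRITICAL_NEG : EV_CRITICAL);
	}

	/* Case i): the contract is intact, deliver per subscription */
	if (ct->critical & new_state)
		return (EV_CRITICAL);
	if (ct->informative & new_state)
		return (EV_INFORMATIVE);
	return (EV_NONE);
}

typedef enum { REPLY_ACK, REPLY_NACK, REPLY_NONE } reply_t;

/*
 * Outcome of a negotiation for one contract. A missing reply counts as
 * an ACK; the noneg term forces a NACK. A REPLY_NACK is assumed to come
 * from a holder with sufficient privileges.
 */
bool
change_proceeds(reply_t reply, bool noneg)
{
	if (noneg)
		return (false);
	return (reply != REPLY_NACK);
}
```

     With the default event sets and an A-set of ONLINE|DEGRADED, a
     move to DEGRADED yields an informative event while a move to
     OFFLINE yields a negotiated critical event.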
The following contract terms can be read from or written to a device contract template using the named libcontract(3LIB) interfaces. These contract terms are in addition to those described in contract(4). CTDP_ACCEPT acceptable set or A-set This term is required for every device contract. It defines the set of device states which the contract owner expects to exist as long as the contract is valid. If a device transitions to a state outside this A-set, then the contract will break and will no longer valid. A critical contract event will be sent to the contract owner to signal this break. Use ct_dev_tmpl_set_aset() to set this term. There is no default A-set. This term is mandatory. Use ct_dev_tmpl_get_aset() to query a template for this term. CTDP_MINOR Specifies as it's value, the devfs path to a minor that is the subject of the contract. Used to specify the minor to be used for creating a contract when contract creation takes place other than at open time. If the contract is created synchronously at open(2) time, then this term is implied to be the minor node being opened. In this case this term need not be explicitly be set. Use ct_dev_tmpl_set_minor() to set this term. The default setting for this term is NULL i.e. no minor is specified. Use ct_dev_tmpl_get_minor() to query a contract template for the current setting of this term. CTDP_NONEG This term if set indicates that any negotiable departure from the contract terms should be NACKED i.e. the contract subsystem should assume a NACK for any negotiated breaking of the contract. This term is ignored for asynchronous contract breaks. Use ct_dev_tmpl_set_noneg() to set this term. The default setting is off. Use ct_dev_tmpl_get_noneg() to query a template for the setting of this term. STATUS In addition to the standard items, the status object read from a status file descriptor contains the following items if CTD_FIXED is specified: State of device CTDS_STATE Returns the current state of the device. 
Returns one of the following: CT_DEV_ST_ONLINE CT_DEV_ST_DEGRADED CT_DEV_ST_OFFLINE Use ct_dev_status_get_dev_state() to obtain this information. A-set of device contract CTDS_ASET Returns the "acceptable states" (A-set) of the device contract. The return value is a bitset of device states and may include one or more of the following: CT_DEV_ST_ONLINE CT_DEV_ST_DEGRADED CT_DEV_ST_OFFLINE Use ct_dev_status_get_aset() to obtain this information. Setting of noneg flag CTDS_NONEG Returns the current setting of the noneg flag. Returns 1 if the noneg flag is set else 0. Use ct_dev_status_get_noneg() to obtain this information. If CTD_ALL is specified, the following items are also avail- able: Device minor node CTDS_MINOR The devfs path of the device which is the subject of the device contract. Use ct_dev_status_get_minor(3CONTRACT) to obtain this information. EVENTS No new event related interfaces (beyond the standard contract event interfaces) are defined for device contract events. FILES /usr/include/sys/contract/device.h Contains definitions of events, status fields and event fields SEE ALSO ctrun(1), ctstat(1), ctwatch(1), open(2), ct_tmpl_set_critical(3CONTRACT), ct_tmpl_set_informative(3CONTRACT), ct_dev_tmpl_set_accept(3CONTRACT), ct_dev_tmpl_get_accept(3CONTRACT), ct_dev_tmpl_set_minor(3CONTRACT), ct_dev_tmpl_get_minor(3CONTRACT), ct_dev_tmpl_set_noneg(3CONTRACT), ct_dev_tmpl_get_noneg(3CONTRACT), ct_dev_status_get_dev_state(3CONTRACT), ct_dev_status_get_aset(3CONTRACT), ct_dev_status_get_minor(3CONTRACT), libcontract(3LIB), contract(4), privileges(5) ------------------------------------------------------------------------------ Contract Management Library Functions ct_dev_tmpl_set_param(3CONTRACT) NAME ct_dev_tmpl_set_aset, ct_dev_tmpl_get_aset, ct_dev_tmpl_set_minor, ct_dev_tmpl_get_minor, ct_dev_tmpl_set_noneg, ct_dev_tmpl_get_noneg - device contract template functions SYNOPSIS cc [ flag... ] file... -D_LARGEFILE64_SOURCE -lcontract [ library... 
] #include #include int ct_dev_tmpl_set_aset(int fd, uint_t aset); int ct_dev_tmpl_get_aset(int fd, uint_t *asetp); int ct_dev_tmpl_set_minor(int fd, char *minor); int ct_dev_tmpl_get_minor(int fd, char *buf, size_t buflen); int ct_dev_tmpl_set_noneg(int fd); int ct_dev_tmpl_get_noneg(int fd, uint_t *negp); PARAMETERS fd A file descriptor from an open of the device contract template file in the contract filesystem (ctfs) aset A bitset of one or more of device states asetp A pointer to a variable in which the current A-set is to be returned. minor The devfs path (the /devices path without the "/devices" prefix) of a minor which is to be the subject of a contract buf A buffer in which the minor path is to be returned. buflen Size of buffer buf. negp A pointer to a uint_t variable for receiving the current setting of the "non-negotable" term in the template DESCRIPTION These functions read and write device contract terms and operate on device contract template file descriptors obtained from the contract(4) i.e. ctfs filesystem. The ct_dev_tmpl_set_aset() and ct_dev_tmpl_get_aset() functions write and read the "acceptable states" set (or A-set for short). This is the set of device states guaranteed by the contract. Any departure from these states will result in the breaking of the contract and a delivery of a critical contract event to the contract holder. The A-set value is a bitset of one or more of the following device states. CT_DEV_ST_ONLINE CT_DEV_ST_DEGRADED CT_DEV_ST_OFFLINE The ct_dev_tmpl_set_minor() and ct_dev_tmpl_get_minor() functions write and read the minor term i.e. the device resource that is to be the subject of the contract. The value is a devfs path to a device minor node. The ct_dev_tmpl_set_noneg() and ct_dev_tmpl_get_noneg() functions write and read the non-negotiable term. If this term is set, synchronous negotiation events are automatically NACKed on behalf of the contract holder. 
For ct_dev_tmpl_get_noneg(), the variable pointed to by negp is set to 1 if the "noneg" term is set or to 0 otherwise. RETURN VALUES Upon successful completion, these functions return 0. Otherwise, they return a non-zero error value. ERRORS The ct_dev_tmpl_set_aset() function will fail if: EINVAL Invalid template file descriptor or A-set The ct_dev_tmpl_set_minor() function will fail if: EINVAL Invalid argument(s) ENXIO The minor named by minor path does not exist The ct_dev_tmpl_set_noneg() function will fail if: EPERM Process lacks sufficient privilege to NACK a device state change. The ct_dev_tmpl_get_aset(), ct_dev_tmpl_get_minor() and ct_dev_tmpl_get_noneg() functions will fail if: EINVAL Invalid arguments specified ENOENT Requested term is not set INTERFACE LEVEL Committed SEE ALSO libcontract(3LIB), contract(4), device(4), lfcompile(5) ------------------------------------------------------------------------------ Contract Management Library Functions ct_ctl_adopt(3CONTRACT) NAME ct_ctl_adopt, ct_ctl_abandon, ct_ctl_newct, ct_ctl_ack, | ct_ctl_nack(), ct_ctl_qack - common contract control functions SYNOPSIS cc [ flag... ] file... -D_LARGEFILE64_SOURCE -lcontract [ library... ] #include int ct_ctl_adopt(int fd); int ct_ctl_abandon(int fd); int ct_ctl_newct(int fd, uint64_t evid, int templatefd); int ct_ctl_ack(int fd, uint64_t evid); | int ct_ctl_nack(int fd, uint64_t evid); int ct_ctl_qack(int fd, uint64_t evid); DESCRIPTION These functions operate on contract control file descriptors obtained from the contract(4) file system. The ct_ctl_adopt() function adopts the contract referenced by the file descriptor fd. After a successful call to ct_ctl_adopt(), the contract is owned by the calling process and any events in that contract's event queue are appended to the process's bundle of the appropriate type. The ct_ctl_abandon() function abandons the contract refer- enced by the file descriptor fd. 
After a successful call to ct_ctl_abandon() the process no longer owns the contract, any events sent by that contract are automatically removed from the process's bundle, and any critical events on the contract's event queue are automatically acknowledged. Depending on its type and terms, the contract will either be orphaned or destroyed. The ct_ctl_ack() function acknowledges the critical event specified by evid. If the event corresponds to an exit nego- tiation, ct_ctl_ack() also indicates that the caller is prepared for the system to proceed with the referenced reconfiguration. | The ct_ctl_nack() function acknowledges the critical negotiation | event specified by evid. ct_ctl_nack() also indicates that the | caller wishes to block the proposed reconfiguration indic- | ated by event evid. Depending on the contract type, this function | may require certain privileges to be asserted in the process' | effective set. This function will fail and return an error | if the event represented by evid is not a negotiation event. The ct_ctl_qack() function requests a new quantum of time for the negotiation specified by the event ID evid. The ct_ctl_newct() function instructs the contract specified by the file descriptor fd that when the current exit nego- tiation completes, another contract with the terms provided by the template specified by templatefd should be automati- cally written. RETURN VALUES Upon successful completion, ct_ctl_adopt(), | ct_ctl_abandon(), ct_ctl_newct(), ct_ctl_ack(), ct_ctl_nack(), and ct_ctl_qack() return 0. Otherwise, they return a non-zero error value. ERRORS The ct_ctl_adopt() function will fail if: EBUSY The contract is in the owned state. EINVAL The contract was not inherited by the caller's process contract or was created by a process in a different zone. | The ct_ctl_abandon(), ct_ctl_newct(), ct_ctl_ack(), ct_ctl_nack() and ct_ctl_qack() functions will fail if: EBUSY The contract does not belong to the calling process. 
The ct_ctl_newct() and ct_ctl_qack() functions will fail if: ESRCH The event ID specified by evid does not correspond to an unacknowledged negotiation event. The ct_ctl_newct() function will fail if: EINVAL The file descriptor specified by fd was not a valid template file descriptor. | The ct_ctl_ack() and ct_ctl_nack() function will fail if: ESRCH The event ID specified by evid does not | correspond to an unacknowledged negotiation event. | The ct_ctl_nack() function will fail if: | | EPERM The calling process lacks the appropriate | privileges required to block the reconfiguration The ct_ctl_qack() function will fail if: ERANGE The maximum amount of time has been requested. ATTRIBUTES See attributes(5) for descriptions of the following attri- butes: ____________________________________________________________ | ATTRIBUTE TYPE | ATTRIBUTE VALUE | |_____________________________|_____________________________| | Interface Stability | Evolving | |_____________________________|_____________________________| | MT-Level | Safe | |_____________________________|_____________________________| SEE ALSO libcontract(3LIB), contract(4), attributes(5), lfcompile(5) ------------------------------------------------------------------------------ Contract Management Library Functions ct_dev_status_get_minor(3CONTRACT) NAME ct_dev_status_get_dev_state, ct_dev_status_get_aset, ct_dev_status_get_minor, ct_dev_status_get_noneg SYNOPSIS cc [ flag... ] file... -D_LARGEFILE64_SOURCE -lcontract [ library... 
]
#include
#include

int ct_dev_status_get_dev_state(ct_stathdl_t stathdl, uint_t *statep);

int ct_dev_status_get_aset(ct_stathdl_t stathdl, uint_t *asetp);

int ct_dev_status_get_minor(ct_stathdl_t stathdl, char *buf, size_t buflen);

int ct_dev_status_get_noneg(ct_stathdl_t stathdl, uint_t *nonegp);

PARAMETERS

     stathdl   A status object returned by ct_status_read(3CONTRACT)

     statep    A pointer to a uint_t variable for receiving the current state of the device which is the subject of the contract

     asetp     A pointer to a uint_t variable for receiving the acceptable state set (i.e. A-set) for the contract

     buf       A buffer for receiving the devfs path of a minor in a contract.

     buflen    Size of buf

     nonegp    A pointer to a uint_t variable for receiving the setting of the "noneg" term.

DESCRIPTION

These functions read contract status information from a status object (stathdl) returned by ct_status_read(3CONTRACT).

The ct_dev_status_get_dev_state() function returns the current state of the device which is the subject of the contract. This can currently be one of the following:

     CT_DEV_ST_ONLINE   - the device is online and functioning normally
     CT_DEV_ST_DEGRADED - the device is online but degraded
     CT_DEV_ST_OFFLINE  - the device is offline and not configured

The ct_dev_status_get_aset() function returns the A-set of the contract. This can currently be the bitset of one or more of the following states:

     CT_DEV_ST_ONLINE
     CT_DEV_ST_DEGRADED
     CT_DEV_ST_OFFLINE

The ct_dev_status_get_minor() function reads the devfs path of the minor participating in the contract.

The ct_dev_status_get_noneg() function returns the "noneg" setting for the contract. A 1 is returned via the nonegp argument if "NONEG" is set, else 0 is returned.

RETURN VALUES

Upon successful completion, these functions return 0. Otherwise, they return a non-zero error value.
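As an illustration of how a contract holder might combine the device state and the A-set described above, the following sketch tests whether a state is a member of an A-set bitset. It is a standalone model, not libcontract code: the EX_ST_* constants stand in for CT_DEV_ST_ONLINE, CT_DEV_ST_DEGRADED and CT_DEV_ST_OFFLINE with assumed numeric values, and ex_state_in_aset() is a hypothetical helper. Real code would obtain both values through ct_dev_status_get_dev_state() and ct_dev_status_get_aset().

```c
#include <assert.h>

/* Assumed stand-in values; real code takes CT_DEV_ST_* from the system header. */
#define EX_ST_ONLINE	0x1u	/* stands in for CT_DEV_ST_ONLINE */
#define EX_ST_DEGRADED	0x2u	/* stands in for CT_DEV_ST_DEGRADED */
#define EX_ST_OFFLINE	0x4u	/* stands in for CT_DEV_ST_OFFLINE */

/*
 * Hypothetical helper: returns 1 if 'state' (exactly one EX_ST_* value)
 * is a member of the acceptable-state bitset 'aset', otherwise 0.
 */
static int
ex_state_in_aset(unsigned int state, unsigned int aset)
{
	/* A device is in exactly one state at any time. */
	assert(state == EX_ST_ONLINE || state == EX_ST_DEGRADED ||
	    state == EX_ST_OFFLINE);

	return ((aset & state) != 0);
}
```

With an A-set of online and degraded, a device that is degraded is still within the set, while a device that is offline is not.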
ERRORS The ct_dev_status_get_minor() function will fail if: EOVERFLOW The buffer size is too small to hold the result The ct_dev_status_get_dev_state(), ct_dev_status_get_aset(), ct_dev_status_get_minor() and ct_dev_status_get_noneg() functions will fail if: EINVAL Invalid argument(s) specified ENOENT The requested data is not present in the status object. INTERFACE LEVEL Committed SEE ALSO ct_status_read(3CONTRACT), ct_status_free(3CONTRACT), libcontract(3LIB), contract(4), device(4), lfcompile(5) ------------------------------------------------------------------------------ Kernel Functions for Drivers ldi_ev_get_cookie(9F) NAME ldi_ev_get_cookie - Get an LDI event cookie for a specified event SYNOPSIS #include int ldi_ev_get_cookie(ldi_handle_t lh, char *evname, ldi_ev_cookie_t *cookiep); INTERFACE LEVEL Solaris DDI specific (Solaris DDI). PARAMETERS ldi_handle_t lh A layered handle representing the device for which the event notification was requested. char *evname The string name of the event. ldi_ev_cookie_t *cookiep A pointer of type ldi_ev_cookie_t. Contains a pointer to the event cookie on return. DESCRIPTION The ldi_ev_get_cookie() function accepts the string name of a state change event affecting the device represented by the layered driver handle "lh" and returns an opaque cookie on success. The call will be successful if the framework supports event notification for the event named by evname. If successful, the function will return an opaque cookie through the cookiep parameter. The cookie is required in subsequent calls for registering callbacks on events. Two LDI events are currently defined: LDI_EV_OFFLINE The device is moving to the offlined state LDI_EV_DEGRADE The device is moving to the degraded state. RETURN VALUES LDI_EV_SUCCESS The event cookie was created successfully. LDI_EV_FAILURE An error occurred and the cookie was not created. CONTEXT The ldi_ev_get_cookie() function can be called from user and kernel contexts only. 
SEE ALSO

ldi_ev_register_callbacks(9F), ldi_ev_remove_callbacks(9F)

------------------------------------------------------------------------------

Kernel Functions for Drivers                      ldi_ev_register_callbacks(9F)

NAME

ldi_ev_register_callbacks - add a notify and/or finalize callback

SYNOPSIS

#include

int ldi_ev_register_callbacks(ldi_handle_t lh, ldi_ev_cookie_t cookie, ldi_ev_callback_t *callb, void *arg, ldi_ev_callback_id_t *id);

INTERFACE LEVEL

Solaris DDI specific (Solaris DDI).

PARAMETERS

ldi_handle_t lh
     A layered handle representing the device for which the event notification was requested.

ldi_ev_cookie_t cookie
     An opaque event cookie for the event type returned by a previous call to ldi_ev_get_cookie(9F)

ldi_ev_callback_t *callb
     A data structure which currently has the following members:

         typedef struct ldi_ev_callback {
             uint_t cb_vers;
             int (*cb_notify)(ldi_handle_t, ldi_ev_cookie_t cookie,
                 void *arg, void *ev_data);
             void (*cb_finalize)(ldi_handle_t, ldi_ev_cookie_t cookie,
                 int ldi_result, void *arg, void *ev_data);
         } ldi_ev_callback_t;

     where

     cb_vers
          Version of callback vector. Must be set to LDI_EV_CB_VERS by the caller.

     The arguments passed into the callbacks when they are invoked include:

     int ldi_result
          The actual result of the state change operation/event passed to finalize callback:
              LDI_EV_SUCCESS: State change succeeded
              LDI_EV_FAILURE: The state change failed

     void *ev_data
          Event specific data.

void *arg
     A pointer to opaque caller private data

ldi_ev_callback_id_t *id
     Unique system wide registration id returned by ldi_ev_register_callbacks(9F) upon successful registration.

DESCRIPTION

The ldi_ev_register_callbacks() interface allows layered drivers to register notify and finalize callbacks for certain events. These events are listed in the ldi_ev_get_cookie(9F) man page.

The notify callback is invoked only for events that can be blocked, just before the event occurs.
Layered drivers that have registered notify callbacks for that event have the opportunity of blocking such events.

The finalize callback is invoked once the final disposition of the state of a device (specifically a device minor node) is known. The callback is invoked with this result, either LDI_EV_SUCCESS (state change succeeded) or LDI_EV_FAILURE (state change failed). This allows layered driver consumers to finalize any changes they made in response to a previous "notify" callback.

For example, a layered driver's notify callback may be invoked in response to a LDI_EV_OFFLINE event. The layered driver may reconfigure itself to stop using the device and permit the change to go forward. Once that happens, the I/O framework will attempt to actually offline the device. This offline attempt can have two possible outcomes: success or failure. In the former case, the finalize callback will be invoked with the ldi_result argument set to LDI_EV_SUCCESS and the layered driver will know that the device has been offlined. In the latter case finalize is invoked with the ldi_result set to LDI_EV_FAILURE and the layered driver knows that the state change failed - in which case it may choose to reconfigure itself to start using the device again.

Finalize callbacks can be registered for all events including events that cannot be blocked.

A layered driver can also propagate these events up the software stack by using interfaces offered by the LDI event framework. The layered driver may use ldi_ev_notify() to propagate notify events occurring on minors it imports onto minors it exports. Similarly, it may use ldi_ev_finalize() to propagate finalize events. Both ldi_ev_notify() and ldi_ev_finalize() will propagate events to device contracts as well as LDI callbacks registered against the exported minor nodes.

The LDI event framework has the following guarantees and requirements with respect to these callbacks:

1.
The notify() callback is invoked before an event (represented by the event cookie) occurs on a device (represented by the layered driver handle) and is invoked only for events that can be blocked. If the callback returns LDI_EV_FAILURE, the event will be blocked. If the callback returns LDI_EV_SUCCESS, the event will be allowed to proceed. If any other value is returned, it is an error - an error message will be logged and the event will be blocked. An example of an event that can be blocked and for which notify callbacks may be invoked is the offline event LDI_EV_OFFLINE.

2. The finalize callback is invoked for all events (including events that cannot be blocked) after the event has occurred. It will be invoked with either LDI_EV_SUCCESS indicating that the event successfully happened or LDI_EV_FAILURE indicating that the event did not occur. The finalize callback returns no values. A good example of an event that cannot be blocked is the degrade event (LDI_EV_DEGRADE).

3. Layered drivers may register one or both of these callbacks (i.e. only for a notify event or only for a finalize event or for both) against any LDI handle that they may possess. If a finalize or notify event is not being registered, the corresponding pointer in the ldi_ev_callback_t structure must be set to NULL. It is an error to attempt a registration with both callbacks set to NULL.

4. A notify and/or finalize callback will be invoked only if the corresponding LDI handle is open. If an LDI handle against which the callbacks are registered is closed, the corresponding finalize and notify callbacks will not be invoked as it is assumed that the layered driver is no longer interested in the device. There *is* however an exception to this rule. See 5 below.

5. A layered driver that closes its LDI handle in its notify routine *will* receive the corresponding finalize callback after the event has occurred.
Because the LDI handle has been closed, the finalize callback will be invoked with a NULL LDI handle. It is the responsibility of the layered driver to maintain state in its private "arg" parameter so that it can reopen the device (if desired) in its finalize callback.

One example where this may happen is with the LDI_EV_OFFLINE event. A layered driver's notify callback may be invoked for an offline event. The layered driver may choose to allow this event to proceed. In that case, since it has a layered open of the device, it *must* close the LDI handle so that offline can succeed (an offline of a device will not succeed if there is *any* open of the device, layered or otherwise). Since the layered driver has closed the LDI handle in the notify routine, its finalize callback (if any) will be invoked with a NULL LDI handle. It is the responsibility of the layered driver to maintain state (such as the device path or devid) in its private "arg" parameter, so that in the finalize routine, it can do a layered open of the device if the device offline failed.

The above is the *only* exception where the finalize callback is invoked if the LDI handle has been closed. In all other cases if the LDI handle has been closed, no corresponding callbacks will be invoked.

6. For the LDI_EV_OFFLINE event, for the offline to succeed, it is imperative that there be no opens (including LDI handles) to the device. If a layered driver's notify callback is invoked for an offline event and the driver intends to allow the offline to proceed, the driver *must* close the corresponding LDI handle.

7. The notify and finalize callbacks are not automatically deregistered even if the corresponding LDI handle has been closed. It is the responsibility of the layered driver to deregister these callbacks when they are not required. It may do so using the ldi_ev_remove_callbacks(9F) interface.
The LDI framework may panic if the entity registering the callback (such as a dip, dev_t or module) no longer exists on the system and the corresponding callbacks have not been unregistered.

8. The LDI event framework guarantees that if a layered driver receives a notify event, it will also receive a finalize event except if the layered consumer itself blocked the event i.e. it returned LDI_EV_FAILURE from its notify callback. In the latter case, the layered driver knows that the event has been blocked and therefore does not need the finalize callback.

9. If a layered driver propagates notify events on minors it imports to minors it exports, it *must* first propagate these events up the software stack via ldi_ev_notify() in its notify callback. It must do so before checking whether it should itself block the event. This is required, because a layered driver cannot release the device if consumers up the stack are still using the device. If ldi_ev_notify() returns LDI_EV_FAILURE, the callback must immediately return LDI_EV_FAILURE from its notify callback. If ldi_ev_notify() returns LDI_EV_SUCCESS, then the state change is permissible as far as consumers higher up in the software stack are concerned. The layered driver must then determine if it can permit the state change. If the state change is to be allowed, the layered driver must return LDI_EV_SUCCESS. If the layered driver determines that the state change should not be permitted, it *must* invoke ldi_ev_finalize() on minors it exports with a result of LDI_EV_FAILURE (to inform consumers up the stack) and then return LDI_EV_FAILURE from its notify callback.

10. The LDI event framework generates finalize events at the earliest point where a failure is detected. If the failure is detected in the framework (such as in ldi_ev_notify()) the framework will generate the finalize events. In the event that a failure is first detected in a layered driver i.e.
in the notify callback of a layered driver, the layered driver must use ldi_ev_finalize() to send finalize events up the software stack. See EXAMPLES for code snippets describing this scenario.

11. In its finalize callback, a layered driver *must* first reconfigure itself before attempting to propagate the event up the software stack via ldi_ev_finalize(9F). This is so that the minors it exports are available and ready for use before the finalize event is propagated up the software stack.

12. It may so happen that the event propagated up the software stack is not the same as the event for which a layered driver's notify/finalize callback is invoked. For example, a layered driver's callback(s) may be invoked for an offline event, but the driver may choose to only propagate the degraded event to *its* consumers (since it may have a mirror/copy of the data on the device.) In that case, the layered driver *must* generate a different event cookie i.e. one corresponding to the degraded event via ldi_ev_get_cookie(9F) and use that cookie in its propagation calls i.e. ldi_ev_notify(9F) and ldi_ev_finalize(9F).

Once the registration of the callback(s) is successful, an opaque ldi_ev_callback_id_t structure is returned which may be used to unregister the callback(s) later.

RETURN VALUES

     LDI_EV_SUCCESS   Callback(s) added successfully.

     LDI_EV_FAILURE   Failed to add callback(s)

CONTEXT

The ldi_ev_register_callbacks() function can be called from user and kernel contexts only.
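The notify/finalize guarantees listed above (in particular items 1, 2 and 8) can be modelled outside the kernel. The following is a minimal userland sketch under stated assumptions, not the LDI implementation: every ex_* type and function and the EX_* constants are hypothetical stand-ins, and the sketch simplifies the framework by finalizing every registered consumer except the one that blocked the event.

```c
#include <assert.h>
#include <stddef.h>

#define EX_SUCCESS	0	/* stands in for LDI_EV_SUCCESS */
#define EX_FAILURE	(-1)	/* stands in for LDI_EV_FAILURE */

/* One registered consumer; either callback may be NULL, but not both. */
struct ex_consumer {
	int (*notify)(void *arg);
	void (*finalize)(int result, void *arg);
	void *arg;
};

/*
 * Model of one blockable state change. Consumers are notified in turn;
 * any return other than EX_SUCCESS blocks the event. The change itself
 * (do_change) is attempted only if nobody blocked. Every consumer except
 * the one that blocked is then finalized with the outcome.
 */
static int
ex_state_change(struct ex_consumer *c, size_t n, int (*do_change)(void))
{
	size_t i, blocker = n;		/* n means "nobody blocked" */
	int result;

	for (i = 0; i < n; i++) {
		assert(c[i].notify != NULL || c[i].finalize != NULL);
		if (c[i].notify != NULL &&
		    c[i].notify(c[i].arg) != EX_SUCCESS) {
			blocker = i;
			break;
		}
	}

	result = (blocker == n) ? do_change() : EX_FAILURE;

	for (i = 0; i < n; i++) {
		if (i != blocker && c[i].finalize != NULL)
			c[i].finalize(result, c[i].arg);
	}
	return (result);
}

/* Per-consumer bookkeeping used by the sample callbacks below. */
struct ex_state {
	int block;		/* nonzero: refuse the event in notify */
	int finalized;		/* number of finalize calls seen */
	int last_result;	/* result passed to the last finalize */
};

static int
ex_notify(void *arg)
{
	struct ex_state *s = arg;

	return (s->block ? EX_FAILURE : EX_SUCCESS);
}

static void
ex_finalize(int result, void *arg)
{
	struct ex_state *s = arg;

	s->finalized++;
	s->last_result = result;
}

/* Stand-in for the real operation (for example an offline) succeeding. */
static int
ex_change_ok(void)
{
	return (EX_SUCCESS);
}
```

In this model a consumer that returns EX_FAILURE from its notify callback never sees a finalize call for that event, while the other consumers are finalized with EX_FAILURE. Once nobody blocks, all of them are finalized with the result of the change itself.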
EXAMPLES

Example I

Here is a typical registration and callbacks for the OFFLINE event

    static int
    event_register(void)
    {
        ldi_handle_t lh;    /* assumed: set by an earlier layered open */
        ldi_ev_callback_t callb;
        ldi_ev_cookie_t off_cookie;

        if (ldi_ev_get_cookie(lh, LDI_EV_OFFLINE, &off_cookie)
            == LDI_EV_FAILURE)
            goto fail;

        callb.cb_vers = LDI_EV_CB_VERS;
        callb.cb_notify = off_notify;
        callb.cb_finalize = off_finalize;

        /* arg, id and dip are assumed to be driver-private state */
        if (ldi_ev_register_callbacks(lh, off_cookie, &callb, arg, &id)
            != LDI_EV_SUCCESS)
            goto fail;

        return (LDI_EV_SUCCESS);

    fail:
        return (LDI_EV_FAILURE);
    }

    static void
    event_unregister(ldi_ev_callback_id_t id)
    {
        ldi_ev_remove_callbacks(id);
    }

    static int
    off_notify(ldi_handle_t lh, ldi_ev_cookie_t off_cookie, void *arg,
        void *ev_data)
    {
        minor_t minor;
        int spec_type;

        ASSERT(strcmp(ldi_ev_get_type(off_cookie), LDI_EV_OFFLINE) == 0);

        /* Map imported minors to exported minor */
        widget_map(lh, &minor, &spec_type);

        /*
         * Call ldi_ev_notify() to propagate events to our consumers.
         * This *must* happen before we check if offline should be blocked
         */
        if (ldi_ev_notify(dip, minor, spec_type, off_cookie, ev_data)
            != LDI_EV_SUCCESS)
            return (LDI_EV_FAILURE);

        /*
         * Next, check if we can allow the offline
         */
        if (widget_check(lh) == WIDGET_SUCCESS) {
            widget_save_path(arg, lh);
            widget_reconfigure(lh, RELEASE);
            ldi_close(lh);    /* close arguments abbreviated */
            return (LDI_EV_SUCCESS);
        }

        /*
         * We cannot permit the offline. The first layer that detects
         * failure i.e. us, must generate finalize events for our consumers
         */
        ldi_ev_finalize(dip, minor, spec_type, LDI_EV_FAILURE, off_cookie,
            ev_data);

        return (LDI_EV_FAILURE);
    }

    /*
     * The finalize callback will only be called if we returned LDI_EV_SUCCESS
     * in our notify callback. ldi_result passed in may be SUCCESS or FAILURE
     */
    static void
    off_finalize(ldi_handle_t NULL_lh, ldi_ev_cookie_t off_cookie,
        int ldi_result, void *arg, void *ev_data)
    {
        ldi_handle_t lh;
        char *path;
        minor_t minor;
        int spec_type;

        ASSERT(strcmp(ldi_ev_get_type(off_cookie), LDI_EV_OFFLINE) == 0);

        path = widget_get_path(arg);

        widget_map_by_path(path, &minor, &spec_type);

        if (ldi_result == LDI_EV_SUCCESS) {
            ldi_ev_finalize(dip, minor, spec_type, LDI_EV_SUCCESS,
                off_cookie, ev_data);
            return;
        }

        /* The offline failed. Reopen the device (open arguments abbreviated) */
        ldi_open_by_name(path, &lh);
        widget_reconfigure(lh, REACQUIRE);

        ldi_ev_finalize(dip, minor, spec_type, LDI_EV_FAILURE, off_cookie,
            ev_data);
    }

Example II

Here is a typical registration and callbacks for the DEGRADE event

    static int
    event_register(void)
    {
        ldi_handle_t lh;    /* assumed: set by an earlier layered open */
        ldi_ev_callback_t callb;
        ldi_ev_cookie_t dgrd_cookie;

        if (ldi_ev_get_cookie(lh, LDI_EV_DEGRADE, &dgrd_cookie)
            == LDI_EV_FAILURE)
            goto fail;

        /* no notify callbacks allowed for degrade events */
        callb.cb_vers = LDI_EV_CB_VERS;
        callb.cb_notify = NULL; /* NULL, notify cannot be used for DEGRADE */
        callb.cb_finalize = dgrd_finalize;

        /* arg, id and dip are assumed to be driver-private state */
        if (ldi_ev_register_callbacks(lh, dgrd_cookie, &callb, arg, &id)
            != LDI_EV_SUCCESS)
            goto fail;

        return (LDI_EV_SUCCESS);

    fail:
        return (LDI_EV_FAILURE);
    }

    static void
    event_unregister(ldi_ev_callback_id_t id)
    {
        ldi_ev_remove_callbacks(id);
    }

    /*
     * For degrade events. ldi_result will always be LDI_EV_SUCCESS
     */
    static void
    dgrd_finalize(ldi_handle_t lh, ldi_ev_cookie_t dgrd_cookie,
        int ldi_result, void *arg, void *ev_data)
    {
        minor_t minor;
        int spec_type;

        ASSERT(ldi_result == LDI_EV_SUCCESS);
        ASSERT(strcmp(ldi_ev_get_type(dgrd_cookie), LDI_EV_DEGRADE) == 0);

        widget_map(lh, &minor, &spec_type);

        widget_reconfigure(lh, RELEASE);

        ldi_ev_finalize(dip, minor, spec_type, LDI_EV_SUCCESS, dgrd_cookie,
            ev_data);
    }

SEE ALSO

ldi_ev_get_cookie(9F), ldi_ev_notify(9F), ldi_ev_finalize(9F), ldi_ev_remove_callbacks(9F)

------------------------------------------------------------------------------

Kernel Functions for Drivers                                  ldi_ev_notify(9F)

NAME

ldi_ev_notify - propagate notification of a state change event

SYNOPSIS

#include

int ldi_ev_notify(dev_info_t *dip, minor_t minor, int spec_type, ldi_ev_cookie_t cookie, void *ev_data);

INTERFACE LEVEL

Solaris DDI specific (Solaris DDI).

PARAMETERS

dev_info_t *dip
     The devinfo node of the layered consumer exporting the minor

minor_t minor
     The minor number of the exported minor

int spec_type
     The type of minor (S_IFCHR or S_IFBLK)

ldi_ev_cookie_t cookie
     An opaque event cookie for the event type returned by a previous call to ldi_ev_get_cookie(9F)

void *ev_data
     Event specific data

DESCRIPTION

The ldi_ev_notify() function propagates an event up the software stack. It may result in two actions:

1. Invocation of LDI callback handlers registered by layered drivers up the software stack

2. Device contract events generated on minors exported to userland

Note that the event propagated up the software stack may be different from the event received by the layered driver invoking ldi_ev_notify(). For example, a volume manager may receive an "offline" event on one of its LDI opened disks, but may choose to propagate a "degraded" event on minors it exports to userland (since it may have more than one copy of the data). The event cookie argument to ldi_ev_notify() may thus be different from the event cookie currently possessed by the layered driver.
If that is the case, the layered driver must generate another event cookie via a new ldi_ev_get_cookie() call.

The ldi_ev_* interfaces are designed to ensure that a "finalize" call is generated for layered driver consumers at the earliest point where an LDI_EV_FAILURE is detected. If this happens inside the LDI event framework, then the framework will invoke finalize. In the event a layered driver detects/generates an LDI_EV_FAILURE, then the layered driver must invoke ldi_ev_finalize(). Here is an example of a layered driver invoking ldi_ev_finalize() for the "foo" event:

    static int
    widget_notify(ldi_handle_t lh, ldi_ev_cookie_t foo_cookie, void *arg,
        void *ev_data)
    {
        minor_t minor;
        int spec_type;

        ASSERT(strcmp(ldi_ev_get_type(foo_cookie), LDI_EV_FOO) == 0);

        /* Map imported minors to exported minor */
        widget_map(lh, &minor, &spec_type);

        /*
         * Call ldi_ev_notify() to propagate events to our consumers.
         * This *must* happen before we check if widget should block
         * foo
         */
        if (ldi_ev_notify(dip, minor, spec_type, foo_cookie, ev_data)
            != LDI_EV_SUCCESS)
            return (LDI_EV_FAILURE);

        /*
         * Next, check if we can allow the foo event
         */
        if (widget_release(lh, LDI_EV_FOO) == WIDGET_SUCCESS) {
            return (LDI_EV_SUCCESS);
        }

        /*
         * We cannot permit the foo event. The first layer that detects
         * failure i.e. us, must generate finalize events for *our*
         * consumers
         */
        ldi_ev_finalize(dip, minor, spec_type, LDI_EV_FAILURE, foo_cookie,
            ev_data);

        return (LDI_EV_FAILURE);
    }

RETURN VALUES

     LDI_EV_SUCCESS   Consumers up the software stack permit state change

     LDI_EV_FAILURE   Consumers are blocking the state change

CONTEXT

The ldi_ev_notify() function can be called from user and kernel contexts only.
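The ordering that a layered driver must follow (propagate upward first, then decide locally, and finalize upward only when it is the first layer to detect the failure) can be traced with a small standalone model. This is a sketch under stated assumptions and not kernel code: struct ex_layer, ex_layer_notify(), ex_layer_finalize() and the EX_* constants are hypothetical names, and each layer is assumed to have at most one consumer above it.

```c
#include <assert.h>
#include <stddef.h>

#define EX_SUCCESS	0	/* stands in for LDI_EV_SUCCESS */
#define EX_FAILURE	(-1)	/* stands in for LDI_EV_FAILURE */

/* One layer in the software stack; 'upper' is the consumer above it. */
struct ex_layer {
	struct ex_layer *upper;	/* NULL for the topmost layer */
	int can_release;	/* nonzero: this layer permits the event */
	int finalize_calls;	/* finalize events this layer received */
	int finalize_result;	/* result carried by the last one */
};

/* Deliver a finalize event to 'l' and on to every layer above it. */
static void
ex_layer_finalize(struct ex_layer *l, int result)
{
	for (; l != NULL; l = l->upper) {
		l->finalize_calls++;
		l->finalize_result = result;
	}
}

/* Notify callback of layer 'l', following the required ordering. */
static int
ex_layer_notify(struct ex_layer *l)
{
	assert(l != NULL);

	/* 1. Ask our consumers first; on failure just pass the result down. */
	if (l->upper != NULL && ex_layer_notify(l->upper) != EX_SUCCESS)
		return (EX_FAILURE);

	/* 2. Consumers agree, so now decide for ourselves. */
	if (l->can_release)
		return (EX_SUCCESS);

	/* 3. We are the first to detect failure: finalize our consumers. */
	ex_layer_finalize(l->upper, EX_FAILURE);
	return (EX_FAILURE);
}
```

In a three-layer stack where only the middle layer refuses, the top layer receives exactly one finalize event carrying failure. The middle layer and the layer below it receive none, because each of them returned failure from its own notify callback.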
SEE ALSO

ldi_ev_get_cookie(9F), ldi_ev_register_callbacks(9F), ldi_ev_remove_callbacks(9F)

-------------------------------------------------------------------------------

Kernel Functions for Drivers                                ldi_ev_finalize(9F)

NAME

ldi_ev_finalize - propagate disposition of a state change event

SYNOPSIS

#include

void ldi_ev_finalize(dev_info_t *dip, minor_t minor, int spec_type, int ldi_result, ldi_ev_cookie_t cookie, void *ev_data);

INTERFACE LEVEL

Solaris DDI specific (Solaris DDI).

PARAMETERS

dev_info_t *dip
     The devinfo node of the layered consumer exporting the minor

minor_t minor
     The minor number of the exported minor

int spec_type
     The type of minor (S_IFCHR or S_IFBLK)

int ldi_result
     The final disposition of the state change

ldi_ev_cookie_t cookie
     An opaque event cookie for the event type returned by a previous call to ldi_ev_get_cookie(9F)

void *ev_data
     Event specific data

DESCRIPTION

The ldi_ev_finalize() function propagates the final disposition of an event up the software stack. It may result in two actions:

1. Invocation of "finalize" LDI callback handlers registered by layered drivers up the software stack

2. Device contract "negotiation end" (CT_EV_NEGEND) events generated on minors exported to userland

Note that the event propagated up the software stack may be different from the event received by the layered driver invoking ldi_ev_finalize(). For example, a volume manager may receive an "offline" event on one of its LDI opened disks, but may choose to propagate a "degraded" event on minors it exports to userland. The event cookie argument to ldi_ev_finalize() may thus be different from the event cookie currently possessed by the layered driver. If that is the case, the layered driver must generate another event cookie via a new ldi_ev_get_cookie() call.

RETURN VALUES

None

CONTEXT

The ldi_ev_finalize() function can be called from user and kernel contexts only.
EXAMPLE

Invoking ldi_ev_finalize(9F) from widget's finalize callback

    static void
    widget_finalize(ldi_handle_t lh, ldi_ev_cookie_t foo_cookie,
        int ldi_result, void *arg, void *ev_data)
    {
        minor_t minor;
        int spec_type;

        ASSERT(strcmp(ldi_ev_get_type(foo_cookie), LDI_EV_FOO) == 0);

        /* Map imported minor to exported minors */
        widget_map(lh, &minor, &spec_type);

        if (ldi_result == LDI_EV_SUCCESS) {
            ldi_ev_finalize(dip, minor, spec_type, LDI_EV_SUCCESS,
                foo_cookie, ev_data);
            /* Done; do not fall through to the failure path */
            return;
        }

        /*
         * The event foo failed. Reconfigure yourself
         * *before* propagating
         */
        widget_reconfigure(lh, LDI_EV_FOO, REACQUIRE);

        ldi_ev_finalize(dip, minor, spec_type, LDI_EV_FAILURE, foo_cookie,
            ev_data);
    }

SEE ALSO

ldi_ev_get_cookie(9F), ldi_ev_register_callbacks(9F), ldi_ev_remove_callbacks(9F)

--------------------------------------------------------------------------------

Kernel Functions for Drivers                                ldi_ev_get_type(9F)

NAME

ldi_ev_get_type - Get event name string from event cookie

SYNOPSIS

#include

char *ldi_ev_get_type(ldi_ev_cookie_t cookie);

INTERFACE LEVEL

Solaris DDI specific (Solaris DDI).

PARAMETERS

ldi_ev_cookie_t cookie
     An opaque event cookie for the event type returned by a previous call to ldi_ev_get_cookie(9F)

DESCRIPTION

The ldi_ev_get_type() function returns the event string represented by the LDI event cookie "cookie".

RETURN VALUES

On success returns the event string represented by cookie, else returns NULL.

CONTEXT

The ldi_ev_get_type() function can be called from user and kernel contexts only.

SEE ALSO

ldi_ev_get_cookie(9F), ldi_ev_register_callbacks(9F), ldi_ev_remove_callbacks(9F)

--------------------------------------------------------------------------------

Kernel Functions for Drivers                        ldi_ev_remove_callbacks(9F)

NAME

ldi_ev_remove_callbacks - Remove all callbacks for a given callback ID

SYNOPSIS

#include

void ldi_ev_remove_callbacks(ldi_ev_callback_id_t id);

INTERFACE LEVEL

Solaris DDI specific (Solaris DDI).
PARAMETERS

ldi_ev_callback_id_t id
     An opaque data structure returned on successful calls to ldi_ev_register_callbacks(9F)

DESCRIPTION

The ldi_ev_remove_callbacks() function unregisters any callbacks registered via ldi_ev_register_callbacks(9F). Once this function returns, the callback ID is no longer valid.

Note that the finalize and notify callbacks exist independently of the LDI handle and are not automatically removed when the LDI handle is closed. It is up to the layered driver to remove these callbacks via ldi_ev_remove_callbacks() when the callbacks are no longer needed. The LDI framework may panic the system if the entity registering the callback (a dev_t, dip or module) no longer exists on the system and the callbacks have not been unregistered.

RETURN VALUES

None

CONTEXT

The ldi_ev_remove_callbacks() function can be called from user and kernel contexts only.

SEE ALSO

ldi_ev_get_cookie(9F), ldi_ev_register_callbacks(9F)

------------------------------------------------------------------------------

NAME

request_offline, notify_online, notify_remove

SYNOPSIS

#include

int prefixrequest_offline(rcm_handle_t *handle, char *rsrcname, pid_t pid, uint_t flag, char **reason, rcm_info_t **dependent_info);

int prefixnotify_online(rcm_handle_t *handle, char *rsrcname, pid_t pid, uint_t flag, rcm_info_t **dependent_info);

int prefixnotify_remove(rcm_handle_t *handle, char *rsrcname, pid_t pid, uint_t flag, rcm_info_t **dependent_info);

ARGUMENTS

     handle    handle provided by RCM daemon

     rsrcname  name of resource

     pid       process pid further identifies DR client

     flag      prefixrequest_offline() may contain the following bit field.

               RCM_QUERY  check if resource can be offlined; do not perform operation, exclusive with RCM_FORCE.

               RCM_FORCE  request is urgent

|              prefixrequest_offline may also contain the following flag
|
|              RCM_RETIRE_REQUEST
|                         called in the context of I/O retire. Apply
|                         constraints to ensure that only non-critical
|                         devices can be offlined.
If non-critical,
|                         release the resource (i.e. the device) so
|                         that retire can be successful.
|
|              prefixnotify_online/remove may also contain the following flag
|
|              RCM_RETIRE_NOTIFY
|                         Perform any I/O retire related cleanup
|                         actions required in the online or remove
|                         entry points.
|
     reason    pointer to string describing reason of refusal

     dependent_info
               info related to dependent resources

DESCRIPTION

prefixrequest_offline() is invoked when a request comes in to offline the resource. The module may refuse to release the resource by returning RCM_FAILURE and updating reason to point to a dynamically allocated buffer containing a string describing the reason of refusal. The memory associated with reason is managed by the RCM daemon.

| If the RCM_RETIRE_REQUEST flag is set, the call is in the context
| of an I/O retire operation. The RCM module must check if the device
| is a critical device. If it is, it should return RCM_FAILURE.
| It does not need to update reason. If the device is non-critical
| and the RCM_RETIRE_REQUEST flag is set, the module should release the
| resource and return RCM_SUCCESS.

If the client exports higher level resources which depend on rsrcname, prefixrequest_offline() should propagate the request to the dependents by calling rcm_request_*(). If the call returns RCM_CONFLICT or RCM_FAILURE, prefixrequest_offline() must pass the return code back to the RCM daemon and pass the info field from rcm_request_*() back as dependent_info.

If RCM_QUERY is specified, the client should return RCM_SUCCESS if neither rsrcname nor its dependents are critical system resources. RCM_CONFLICT must be returned otherwise. In either case, the client should not act on the actual resource.

If RCM_FORCE is specified, the module should make extra efforts to release the resource, such as using the force option to unmount a file system.

prefixnotify_online() is invoked when a previous request to remove the
| resource is canceled or the I/O retire fails.
The DR client can access the resource and proceed with normal operation. The online notification should be passed to higher level resources by calling rcm_notify_online().

| If the RCM_RETIRE_NOTIFY flag is set, prefixnotify_online()
| should perform any cleanup or reacquisition needed if it chooses
| to start using the device again.

prefixnotify_remove() is invoked when a resource is removed from the
| system or in the case of I/O retire, has been retired.
This notification is always preceded by a prefixrequest_offline() invocation. Registration on rsrcname is discarded by the RCM daemon after prefixnotify_remove() returns.

| If the RCM_RETIRE_NOTIFY flag is set, prefixnotify_remove() should
| perform any cleanup now that the resource has been retired.

RETURN VALUES

RCM_SUCCESS must be returned on success.

RCM_CONFLICT should be returned if one or more rcm_request_offline()
| calls return RCM_CONFLICT. RCM_CONFLICT should not be returned if
| the RCM_RETIRE_REQUEST flag is set. Use RCM_FAILURE instead.

RCM_FAILURE should be returned if the client or any of the dependent resources cannot be suspended and none of the dependent resources has a DR operation conflict.

------------------------------------------------------------------------------
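The flag handling described for prefixrequest_offline() can be summarised as a small decision function. The following is a standalone sketch under stated assumptions, not an RCM module: the EX_RCM_* constants stand in for the RCM return codes and flags with assumed numeric values, struct ex_rsrc is a hypothetical stand-in for a module's view of a resource, and propagation to dependents through rcm_request_*() is left out.

```c
#include <assert.h>

/* Assumed stand-in values; a real module uses the RCM header definitions. */
#define EX_RCM_SUCCESS		0	/* stands in for RCM_SUCCESS */
#define EX_RCM_FAILURE		(-1)	/* stands in for RCM_FAILURE */
#define EX_RCM_CONFLICT		(-2)	/* stands in for RCM_CONFLICT */

#define EX_RCM_QUERY		0x1u	/* stands in for RCM_QUERY */
#define EX_RCM_FORCE		0x2u	/* stands in for RCM_FORCE */
#define EX_RCM_RETIRE_REQUEST	0x4u	/* stands in for RCM_RETIRE_REQUEST */

/* Hypothetical module-private view of one resource. */
struct ex_rsrc {
	int critical;	/* nonzero: the system needs this device */
	int released;	/* set once the module has let go of it */
};

/* Decide how to answer an offline request for 'r' given 'flag'. */
static int
ex_request_offline(struct ex_rsrc *r, unsigned int flag)
{
	/* RCM_QUERY and RCM_FORCE are mutually exclusive. */
	assert((flag & (EX_RCM_QUERY | EX_RCM_FORCE)) !=
	    (EX_RCM_QUERY | EX_RCM_FORCE));

	if (r->critical) {
		/* In retire context RCM_CONFLICT must not be returned. */
		if (flag & EX_RCM_RETIRE_REQUEST)
			return (EX_RCM_FAILURE);
		return ((flag & EX_RCM_QUERY) ?
		    EX_RCM_CONFLICT : EX_RCM_FAILURE);
	}

	/* A query only answers the question; it never touches the device. */
	if (!(flag & EX_RCM_QUERY))
		r->released = 1;

	return (EX_RCM_SUCCESS);
}
```

A critical device therefore yields the conflict code only for a plain query and the failure code in every retire context, while a non-critical device is released unless the request is only a query.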