RETIRE AGENT FOR I/O DEVICES

I BACKGROUND
============
There are three major steps in the Solaris Fault Management process:

a) Detection and handling of an error
   ----------------------------------
   The trigger for the I/O fault management process is the detection
   of a device error in the kernel. The error may be detected by
   a device different from the faulty device - for example a CPU,
   or it may be detected by a driver associated with the faulty
   device. In the former case, the I/O fault handling framework
   works with drivers to localize the error to a specific device.
   In the latter, the error has already been localized. In both
   cases the affected driver is responsible for handling the error
   and dispatching an ereport to userland.

b) Diagnosing the problem
   ----------------------
   Diagnosis Engines (DEs) in userland consume the ereports
   and pinpoint the faulty device and the type of fault. Once a
   device is diagnosed as faulty, a fault event that contains
   the identity of the faulty device is generated.

c) Retiring a faulty device
   ------------------------
   A retire agent is responsible for reacting to a fault event
   and offlining or disabling the faulty device. A retire agent
   for CPU and memory is already in place. This document describes
   the design of a generic Solaris retire agent for I/O devices.

II Hot device removal in Solaris
================================
Currently, hot device removal (i.e. removal of a device while
Solaris is running) falls into one of two categories:

a) Coordinated device removal
-----------------------------
   In coordinated device removal, removal of a device is done in cooperation
   with the Solaris operating system. This process involves checking for
   existing users of the device, getting them to stop using the device
   (via RCM), quiescing the device, removing Solaris data structures for
   that device and rendering the device safe for removal. The device is then
   physically removed. In the event that one or more users are still using
   the device, the device removal is aborted. Board DR and SCSI hotplug are
   two examples of this approach.

b) Surprise device removal
--------------------------
   In surprise device removal, the device removal is not coordinated with
   Solaris and happens asynchronously. Typically a physical device removal
   generates an interrupt and the interrupt handler initiates cleanup of OS
   data structures. Any existing consumers of the device fail with EIO. A
   good example of this approach is USB.

III The Problem
===============
Irrespective of the mechanism used to achieve I/O retire, there are
a number of problems that any approach to I/O retire must solve:

1. Retire policy
----------------
   Any decision to retire a device must be in concord with policy
   (if any) set by the system administrator. An administrator or
   Sun service personnel may on occasion need to disable device
   retire temporarily. Retire code should check policy settings
   before deciding to retire a device.

2. Retire safety
----------------
   In some cases, it may be safer not to retire a device.
   For example, a device that contains the root filesystem should not be
   retired. Similarly, certain platform specific drivers (for example
   environment monitoring drivers) should not be retired as they are
   required for proper functioning of the system. In such cases it would 
   be best to keep the device running in a degraded state.

3. Retiring a device in use
----------------------------
   A device that is open cannot be detached (i.e. unconfigured). One
   possible solution to this problem is to use RCM. DR uses RCM to
   inform consumers that a device is going away. Consumers then stop
   using the device allowing DR to proceed. Unfortunately, RCM doesn't
   handle all consumers (only those with RCM modules) and therefore
   is unsuitable as a general mechanism for I/O retire. RCM is also
   a userland framework and does not deal with kernel consumers.

4. Retire semantics
-------------------
   There are several issues that need resolution regarding the behavior
   of a retired device. The first issue is behavior with respect to
   existing consumers. Secondly, we need to determine how such a device
   appears to new consumers. Finally, we need to settle upon an acceptable
   level of observability for the device. Clearly we cannot have the device
   completely disappear from Solaris until it is physically replaced
   or removed. 

5. Retire persistence
---------------------
   A device once retired may not be replaced/removed immediately.
   To avoid having to retire the device again after the
   next boot, we need to persist the retire across reboots. Any
   mechanism employed to persist retires should be usable early
   in boot so that retired devices are never configured into the system
   in the first place. This is important, as a configured device may be
   opened by a consumer making it harder to retire the device (again).
   
6. Unretire
-----------
   Once a device is replaced or fixed, we need some way to bring it back
   into service (unretire). This unretire could be automatic or manual. Manual
   unretire requires operator intervention after a device has been replaced.
   A key requirement here is that after unretire (whether manual or automatic),
   a system should not require a reboot to bring the replaced device back
   into service.


IV Proposed solution
====================

1. Retire policy
-----------------
   Since the FMA framework and Solaris software have better knowledge
   of the fault and its implications, we propose that I/O retire be
   minimally configurable. There are three primary reasons for
   configurability:

	a. To enable Sun service personnel to disable retire so that
	   problems can be diagnosed

	b. To allow sysadmins to disable retire for site-specific
	   reasons.

 	c. To disable I/O retire for certain rare fault types where the
 	   diagnosis is known to be incorrect.
 
   We propose two properties that can be set in the retire agent's
   .conf file. The first ("global-disable") can be used by
   Sun service personnel to temporarily block I/O retire so that
   they can diagnose problems. The default value of this property
   will be false i.e. I/O retire is enabled by default.
 
   The second property ("fault-exceptions") is a colon separated
   list of fault types for which I/O retire should be disabled. This
   is expected to be used for the (rare) fault types which are known
   to be diagnosed incorrectly. This property will be private and will
   not be documented.
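
   The policy check implied by these two properties can be sketched as
   follows. This is only an illustration: the structure, the function
   names and the fault class strings used to exercise it are hypothetical.
   Only the two property names and the colon separated format come from
   the text above.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical in-memory form of the two .conf properties.
 * These names are illustrative, not the agent's real code.
 */
struct retire_policy {
	bool global_disable;		/* "global-disable"; default false */
	const char *fault_exceptions;	/* "fault-exceptions"; may be NULL */
};

/* Return true if fault_class is one entry of the colon separated list. */
bool
fault_is_excepted(const char *list, const char *fault_class)
{
	size_t want = strlen(fault_class);

	if (list == NULL || want == 0)
		return (false);

	while (*list != '\0') {
		size_t len = strcspn(list, ":");	/* length of this entry */

		if (len == want && strncmp(list, fault_class, len) == 0)
			return (true);
		list += len;
		if (*list == ':')
			list++;				/* skip the separator */
	}
	return (false);
}

/* Return true if policy permits a retire for this fault class. */
bool
retire_allowed(const struct retire_policy *pol, const char *fault_class)
{
	assert(pol != NULL && fault_class != NULL);

	if (pol->global_disable)
		return (false);
	return (!fault_is_excepted(pol->fault_exceptions, fault_class));
}
```

   Entries are compared whole, so an exception for one fault type does
   not accidentally disable retire for a longer or shorter type name.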

2. Event subscription
---------------------
   The I/O retire agent will subscribe to all "fault.io.*" events and use
   the ASRUs in such events to pinpoint the device to be retired. It expects
   the ASRUs to be in the "dev" scheme. After a lot of discussion by the
   FMA portfolio review committee, it was decided that due to the coarse
   nature of current FMA diagnoses, automatic retire will only be undertaken
   if a single device is pinpointed as faulty i.e. if the "list.suspect" event
   (consumed by the I/O retire agent) either has a single member or multiple
   members with the same ASRU.
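
   The "single device" rule can be sketched as below. The sketch assumes
   the ASRUs have already been extracted from the suspect list as
   strings and that a "dev" scheme ASRU can be recognized by a "dev:"
   prefix. The function name and the sample strings are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Return the one ASRU named by every suspect, or NULL if the suspect
 * list is empty, names more than one distinct ASRU, or names an ASRU
 * outside the "dev" scheme. Hypothetical helper for illustration.
 */
const char *
single_retire_target(const char *const asrus[], size_t nsuspects)
{
	size_t i;

	assert(asrus != NULL || nsuspects == 0);

	if (nsuspects == 0 || asrus[0] == NULL)
		return (NULL);

	for (i = 1; i < nsuspects; i++) {
		if (asrus[i] == NULL || strcmp(asrus[i], asrus[0]) != 0)
			return (NULL);	/* more than one device suspected */
	}

	/* Assumption: "dev" scheme ASRUs start with "dev:". */
	if (strncmp(asrus[0], "dev:", 4) != 0)
		return (NULL);

	return (asrus[0]);
}
```

   A NULL return means that no automatic retire is attempted for that
   suspect list.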

   In addition, the agent also subscribes to the "list.repaired" event to detect
   device repair or replacement. The unretire process is triggered by the
   receipt of a list.repaired event.

3. Retire constraints
----------------------
   Certain devices cannot be safely retired without compromising
   the stability of the system. This project will provide a mechanism
   which will constrain retire to only those devices deemed non-critical
   to system operation. Two types of entities may impose constraints
   on retire: userland and kernel entities.

   i) Userland constraints
   ========================
   This project will enhance the contracts framework (PSARC/2003/193)
   to create a new contract type (device contracts). A device contract is
   an agreement or a contract between a process and the kernel regarding 
   the state of the device. A device contract may be created when a
   relationship is formed between a device and a process i.e. at open(2)
   time, or it may be created at some point after the device has been
   opened. A device contract once formed may be broken by either party.
   A device contract can be broken by the process by an explicit abandon of the 
   contract or by an implicit abandon when the process exits. A device contract
   can be broken by the kernel either asynchronously (without negotiation) or
   synchronously (with negotiation). Exactly which happens depends on the device
   state transition. The following state diagram shows the transitions between
   device states. Note that the transitions are "unconfiguration" transitions.
   Configuration transitions are intentionally left out as they are not
   relevant for I/O retire. Future projects which need configuration 
   functionality can easily add them to the device contract framework.

                		<-- A -->	
	 		 /-----------------> DEGRADED
			 |			| 			   
          		 |			| 			   
	  		 |			| S			   
	  		 |			| |			   
          		 |			| v			   
          		 v       S -->		v
			ONLINE ------------> OFFLINE


   In the figure above, the arrows indicate the direction of transition. The
   letter S refers to transitions which are inherently synchronous i.e.
   require negotiation and the letter A indicates transitions which are
   asynchronous i.e. are done without contract negotiations. A good example
   of a synchronous transition is the ONLINE -> OFFLINE transition. This
   transition cannot happen as long as there are consumers which have the
   device open. Thus some form of negotiation needs to happen between the
   consumers and the kernel to ensure that consumers either close devices
   or disallow the move to OFFLINE. Certain other transitions such as
   ONLINE --> DEGRADED for example, are inherently asynchronous i.e.
   non-negotiable. A device that suffers a fault that degrades its
   capabilities will become degraded irrespective of what consumers it has,
   so a negotiation in this case is pointless.

   The following device states are currently defined for device contracts:

	CT_DEV_ST_ONLINE
		The device is online and functioning normally
	CT_DEV_ST_DEGRADED
		The device is online but is functioning in a degraded capacity
	CT_DEV_ST_OFFLINE
		The device is offline and is no longer configured

   Refer to PSARC/2003/193 for background information on contracts.
   A typical consumer of device contracts starts out with a contract
   template and adds terms to that template. These include the
   "acceptable set" (A-set) term, which is a bitset of device states which
   are guaranteed by the contract. If the device moves out of a state in
   the A-set, the contract is broken. The breaking of the contract can
   be asynchronous in which case a critical contract event is sent to the
   contract holder but no negotiations take place. If the breaking of the
   contract is synchronous, negotiations are opened between the affected
   consumer and the kernel. The kernel does this by sending a critical
   event to the consumer with the CTE_NEG flag set indicating that this
   is a negotiation event. The consumer can accept this change by sending
   an ACK message to the kernel. Alternatively, if it has the necessary
   privileges, it can send a NACK message to the kernel which will block
   the device state change. To NACK a negotiable event, a process must
   have the {PRIV_SYS_DEVICES} privilege asserted in its effective set.

   Other terms include the "minor path" term, specified explicitly if the
   contract is not being created at open(2) time or specified implicitly
   if the contract is being created at open time via an activated template.

   A contract event is sent on any state change to which the contract
   owner has subscribed via the informative or critical event sets. Only
   critical events are guaranteed to be delivered. Since all device state
   changes are controlled by the kernel and cannot be arbitrarily generated
   by a non-privileged user, the {PRIV_CONTRACT_EVENT} privilege does not
   need to be asserted in a process's effective set to designate an event as
   critical. To ensure privacy, a process must either have the same effective
   userid as the contract holder or have the {PRIV_CONTRACT_OBSERVER} privilege
   asserted in its effective set in order to observe device contract events
   off the device contract type specific endpoint.
   
   Yet another term available with device contracts is the "non-negotiable"
   term. This term is used to pre-specify a NACK to any contract negotiation.
   This term is ignored for asynchronous state changes. For example, a
   process may have the A-set {ONLINE|DEGRADED} and make the contract
   non-negotiable. In this case, the device contract framework assumes a
   NACK for any transition to OFFLINE and blocks the offline. If the A-set
   is {ONLINE} and the non-negotiable term is set, transitions to OFFLINE
   are NACKed but transitions to DEGRADED succeed.

   The OFFLINE negotiation (if OFFLINE state is not in the A-set for a contract)
   happens just before the I/O framework attempts to offline a device
   (i.e. detach a device and set the offline flag so that it cannot be
   reattached). This need not necessarily be the result of retire activity.
   A device contract holder is expected to either NACK the offline
   (if privileged) or release the device and allow the offline to proceed.

   The DEGRADE contract event (if DEGRADED is not in the A-set for a contract)
   is generated just before the I/O framework transitions the device state
   to "degraded" (i.e. DEVI_DEVICE_DEGRADED in I/O framework terminology).
   As far as I/O retire is concerned, a device may become degraded at
   three points during the fault management process:
   
	a) FMA I/O error handling by the driver may result in the device
	   state being set to the DEGRADED state

	b) An inability to immediately offline a device due to userland or
	   kernel consumer. The I/O retire code will in such cases move the
	   device to the DEGRADED state until it can be offlined.

	c) An inability to retire the device (because the device provides a
	   critical service) will result in the device moving to the DEGRADED
	   state.

   The contract holder is expected to ACK or NACK a negotiation event
   within a certain period of time. If the ACK/NACK is not received
   within the timeout period, the device contract framework will behave
   as if the contract does not exist and will proceed with the event. In the
   I/O retire case, I/O code will be aware that constraints have not been
   applied and will behave accordingly.
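
   The possible endings of a negotiation can be summarized in a small
   sketch. All names here are hypothetical. The handling of a NACK from
   a process without {PRIV_SYS_DEVICES} is an assumption: the text only
   says that the privilege is required to NACK, so the sketch treats
   such a NACK as having no effect.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical reply from a contract holder to a negotiation event. */
enum ct_response {
	CT_RESP_NONE,		/* no reply before the timeout expired */
	CT_RESP_ACK,
	CT_RESP_NACK
};

struct neg_result {
	bool proceed;		/* the state change goes ahead */
	bool constrained;	/* the holder took part in the decision */
};

struct neg_result
resolve_negotiation(enum ct_response resp, bool has_priv_sys_devices)
{
	struct neg_result r = { true, false };

	switch (resp) {
	case CT_RESP_ACK:
		r.constrained = true;
		break;
	case CT_RESP_NACK:
		if (has_priv_sys_devices) {
			r.proceed = false;	/* state change is blocked */
			r.constrained = true;
		}
		/* Assumption: an unprivileged NACK cannot block the change. */
		break;
	case CT_RESP_NONE:
		/* Timed out: behave as if the contract did not exist. */
		break;
	}
	return (r);
}
```

   The "constrained" flag stands in for the knowledge, mentioned above,
   that I/O code has of whether constraints were actually applied.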

   The contracts framework provides an elegant mechanism that solves
   two problems:
   
	a)  A process can use it to reconfigure itself in the face of an
	    impending device state change (and in the process release the
	    device to allow the state change)

	b)  It can also use it to impose constraints on the state change so
	    that state changes that may cause problems are disallowed. Note
	    that blocking a state change requires that the process be
	    privileged.

   It is expected that the device contract framework will be generally
   useful for consumers other than I/O retire. We expect future projects
   to include more device states for device contracts.
    
   ii) Kernel constraints
   ======================
   Certain resources may be in use solely by kernel consumers. Such resources
   will not have a corresponding userland consumer and device contracts
   cannot impose constraints on their retire. To allow such consumers to have a
   say in the retire of I/O devices we will enhance the event interfaces
   provided by the LDI framework (PSARC/2001/769). The new LDI event interfaces
   will provide two services to I/O retire:
   
	1. It allows the imposition of kernel constraints by kernel
	   consumers of devices

	2. It allows layered drivers to generate device contract events on
	   minors they export for events affecting minors they import.
	   For example if SVM were converted to use the LDI, events affecting
	   disks opened by SVM could be propagated to contracts on device
	   minors exported by SVM

   The new LDI event interfaces are expected to be generally useful i.e.
   they are not meant solely for I/O retire.

   The current LDI events interfaces have no consumers and this enhancement
   will not affect anyone. For any consumers that want the old style LDI
   events, equivalent functionality is available via the new LDI event
   interfaces.

   Two primary interfaces are being defined by the project:

   a) A notification callback service which informs layered consumers of an
   impending state change giving them an opportunity to either reconfigure
   themselves or to block the state change. The reconfiguration will allow
   resources to be released allowing the state change to proceed. The
   notification and release will be synchronous i.e. the release will be
   carried out by the callback. This interface will also hook up with the
   device contract framework to provide state change events to userland
   contract holders of minors exported by the layered driver.

   b) A post event "finalize" callback service that indicates whether the
   state change succeeded. This allows consumers to finalize their 
   reconfiguration. This offers functionality that is equivalent to the
   old LDI event interfaces. The interface will also generate "negotiation end" 
   events for all applicable contracts.

   This project defines two LDI events: LDI_EV_OFFLINE and LDI_EV_DEGRADE.
   They have the same semantics as the corresponding device contract events.
   The notification interface is used to register a "notify" callback with
   the LDI event framework. The notify callback serves two main purposes:
 
	i) It allows layered drivers to impose constraints on the retire
 	   of devices 
 
 	ii) It allows layered consumers to reconfigure themselves and release
 	    devices, allowing the offline of such devices to succeed.
 
   The finalize callback serves as an indicator of final disposition of a
   device state change i.e. it indicates whether the device state change
   succeeded or not. 
 
   The finalize callbacks are always called at the end of all defined
   LDI events. The notify callback is only called if the specified event
   can be potentially blocked or vetoed by a consumer. So for the events
   defined by this proposal, the offline event results in both a "notify"
   and a "finalize" callback, but a degrade event (which cannot be blocked)
   will only result in finalize callbacks being invoked. In addition,
   it is guaranteed that if a layered driver receives a notify event, it
   will receive a finalize event, unless the layered driver itself rejected
   the state change. See the man pages at the end of this document for
   details.
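
   The callback guarantees above can be illustrated with a small model.
   It is not the LDI implementation: the consumer structure and the
   function names are hypothetical, and the sketch assumes that every
   registered consumer is notified even after one of them has rejected
   the change. Only the event names LDI_EV_OFFLINE and LDI_EV_DEGRADE
   come from this proposal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum ldi_ev { LDI_EV_OFFLINE, LDI_EV_DEGRADE };

/* Hypothetical layered consumer with counters for callbacks received. */
struct consumer {
	bool veto_offline;	/* its notify callback rejects offlines */
	int notified;		/* notify callbacks received */
	int finalized;		/* finalize callbacks received */
	bool last_ok;		/* result passed to the last finalize */
};

/* Only the offline event can be blocked, so only it has a notify phase. */
bool
event_can_be_blocked(enum ldi_ev ev)
{
	return (ev == LDI_EV_OFFLINE);
}

/* Deliver one event; return true if the state change went ahead. */
bool
deliver_event(struct consumer *c, size_t n, enum ldi_ev ev)
{
	bool blockable = event_can_be_blocked(ev);
	bool ok = true;
	size_t i;

	assert(c != NULL || n == 0);

	if (blockable) {
		for (i = 0; i < n; i++) {
			c[i].notified++;
			if (c[i].veto_offline)
				ok = false;
		}
	}

	for (i = 0; i < n; i++) {
		/* A consumer that rejected the change gets no finalize. */
		if (blockable && c[i].veto_offline)
			continue;
		c[i].finalized++;
		c[i].last_ok = ok;
	}
	return (ok);
}
```

   In this model a degrade event reaches every consumer through the
   finalize callback only, while a vetoed offline is reported as a
   failure to every consumer except the one that vetoed it.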

   For examples of how a typical kernel consumer (ZFS) could use
   these interfaces, see the example section at the end of this document.

   iii) Legacy constraints
   =======================

   Certain kernel consumers like UFS and SVM have not been converted to use the
   LDI (and likely will never be). To allow such consumers to apply constraints
   on the retire process, we will use the RCM framework to impose constraints.
   Note that this approach will be used only for legacy consumers since the
   contracts framework is a cleaner and more correct approach to managing
   device state changes. The contracts framework is cleaner because it allows
   state changes to be managed in a simpler fashion from within the application 
   without requiring an additional external piece of software i.e. an RCM
   module. It is a more appropriate approach because RCM can only handle
   synchronous events like userland initiated DR. RCM cannot be used for
   asynchronous events such as an in-kernel device state change. Device
   contracts can be used for both types of events. Also since RCM is a
   userland framework, it cannot usually handle kernel consumers of devices.


4. Retiring a device in use
----------------------------
   There are several different types of scenarios where I/O retire may be
   invoked. In the following, a device is said to be in use if there are
   existing opens of the device by either a userland or a kernel consumer.

	1. Device not in use: In this case, I/O retire is very simple. We can
	   detach the device and set an "offline" flag which will prevent the
	   device from being attached by subsequent configuration operations
	   such as open(2).

	2. Device is in use: In this case, things are a little more
  	   complicated. First we have to check if there are any retire
  	   constraints for this device. If these constraints exist,
	   the device cannot be retired. If not, the device is retirable.
  	   Retire can take two forms: fencing or offlining.

	   Offlining is used when the device is no longer in use and when
	   the device node can be safely offlined. A device that was
	   originally in use may no longer be in use because as part of
	   the constraint checking process, consumers may release devices.

  	   Fencing is used when a retirable device is in use, since a
	   device cannot be offlined when there is an existing open of
	   the device. Fencing essentially consists of using the specfs
	   filesystem to isolate the device. All new opens of the device
	   are failed with ENXIO. This will require changes at the specfs
	   layer for userland consumers and in the LDI for kernel consumers.
	   I/O operations and unconfiguration operations (such as close)
	   will continue to function normally so that existing consumers
	   can release the device.
           
	   If a "real" detach i.e. offline isn't possible immediately, we will
	   schedule a deferred detach at periodic intervals. It is expected
   	   that as I/Os fail due to FMA error handling (PSARC/2002/288)
	   in the driver, the consumer will release the device allowing detach
	   and offline to take place.

      	   Another (optional) mechanism to speed up the transition from
   	   a fenced state to a detached state is to have drivers of devices
   	   register an interest in retire events targeting the device they
	   drive by registering a callback via NDI event services. They can
	   then be notified that the device they control is the target of a
	   retire operation. Such drivers can speed up the process of retire
	   by returning appropriate errors (such as EIO) for I/O to the
	   affected device, forcing consumers of such devices to stop using
   	   the device. Eventually, the device will have no consumers, allowing
   	   the framework to detach and offline the device. 

   	3. Interaction with MPXIO: There are two possible errors for a device
	   under MPXIO control:

		a) Path fault: The fault may be in a path component along a
		path to the device. In this case the path component will be
		retired. The MPXIO framework will detect that the path is
		not available and will switch over to another path.

		b) Disk fault: In this case since the disk itself is bad,
		switching the path will not help. I/O retire will retire
		the virtual disk node under the VHCI making it unavailable
		along any path. Any attempt to open the device will return
		ENXIO.

  	4. Interaction with RCM: There are certain kernel consumers that cannot
  	   impose constraints via contracts (a userland mechanism) or via LDI
 	   (since they have not been converted to use the LDI). These include
  	   UFS and SVM. To allow such entities to impose constraints we will
  	   use RCM. The changes required are minimal - we will use the
  	   request_offline, notify_online and notify_remove entry points of
  	   RCM modules. The three entry points will be invoked with the
  	   RCM_RETIRE flag set to indicate that the operation is in the context
  	   of a retire operation to allow for the slightly different behavior
  	   required relative to retire operations. No attempt will be made to
  	   enhance RCM to inform RCM clients of asynchronous events such
  	   as the "degraded" event. The goal here is only to check for retire
  	   constraints, not improve RCM.

  	5. Other consumers: There may be consumers of devices which do not
	   use either device contracts (userland consumers) or the LDI (kernel
	   consumers).
  	   In this case, if the consumer is the only consumer of the device,
  	   automatic retire will be blocked. If there is another consumer
  	   of the device and that consumer uses device contracts or the LDI
	   to allow a retire to occur, an automatic retire will be initiated. 
  	   The device will be fenced off from new consumers but existing
	   consumers can continue to use the device until the next reboot
	   at which point the device will be offlined before its first attach.

5. Behavior and Observability of a fenced off/retired device
-------------------------------------------------------------
   For a fenced off device, all configuration operations will fail
   with ENXIO. Unconfiguration operations like close, etc., will
   however succeed so that the device can be closed.

   A device that has been offlined (i.e. is not merely fenced off)
   will be detached from its driver and cannot be attached. All
   operations on the device will fail with ENXIO.
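
   The difference between a fenced off device and an offlined device can
   be expressed as a small table of results. The enum and function names
   below are hypothetical. The sketch also assumes, as stated in the
   section on retiring a device in use, that I/O from existing consumers
   of a fenced device is still passed through.

```c
#include <assert.h>
#include <errno.h>

enum retire_state { DEV_IN_SERVICE, DEV_FENCED, DEV_OFFLINED };
enum dev_op { DEV_OP_OPEN, DEV_OP_IO, DEV_OP_CLOSE };

/* Return 0 if the operation is let through, else the errno it fails with. */
int
retired_dev_check(enum retire_state st, enum dev_op op)
{
	switch (st) {
	case DEV_OFFLINED:
		return (ENXIO);		/* every operation fails */
	case DEV_FENCED:
		/* New opens fail; I/O and close on existing opens go on. */
		return (op == DEV_OP_OPEN ? ENXIO : 0);
	case DEV_IN_SERVICE:
		break;
	}
	return (0);
}
```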

   An offlined or fenced off device is still DRable and can be DRed out
   via standard DR tools like cfgadm. A retired device replaced via DR
   will stay retired, until the device is unretired.

   For observability, the device will still be present in the kernel
   device tree (albeit in the retired state) and so will be visible
   through tools like prtconf and (k)mdb. The output of prtconf will
   indicate that the device has been retired (see Example VIII.3).
   Similarly cfgadm output for a retired device will indicate the
   Condition "failed" indicating that the device is no longer usable.
   

6. Persistence of retire
------------------------
   The retired status of a device will be persisted in a file
   - /etc/devices/retire_store. This file will be read early in boot,
   (on x86 systems, this file will be included in the boot archive)
   and a list of devices that have been retired will be created.
   If the device is not a self identifying node, then it will have
   its devinfo node marked with the DEVI_DEVICE_OFFLINE flag which will
   prevent it from attaching. If the devinfo node is a self identifying
   node, it will be fenced off rather than offlined, since self identifying
   nexii typically remove devinfo nodes that fail to attach. The end
   result is the same for both self identifying and non-self identifying
   devices - the device will be unavailable to all consumers.
   Any attempts to open such devices will fail with ENXIO. A global
   integer variable ddi_retire_store_bypass will be made available
   to disable this feature - this can be set via (k)mdb or /etc/system.
   An alternative mechanism to bypass this persistent store is to boot
   the system with the "ask" flag i.e. "boot -a" and specify /dev/null
   as the retire store. This is useful for recovery if a critical device
   makes it into the persistent store due to a software error.
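
   The boot-time handling of the retire store can be sketched as below.
   The sketch assumes the store has already been read into an array of
   device paths and that any nonzero value of ddi_retire_store_bypass
   disables the feature. The enum, the function name and the sample
   paths are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum boot_disposition {
	BOOT_CONFIGURE,		/* not retired: configure normally */
	BOOT_MARK_OFFLINE,	/* set DEVI_DEVICE_OFFLINE on the node */
	BOOT_FENCE		/* keep the node but fail all opens */
};

/* Variable name is from the text; nonzero is assumed to mean "bypass". */
int ddi_retire_store_bypass = 0;

/* Decide how a device node is treated when it is discovered at boot. */
enum boot_disposition
boot_disposition_for(const char *const store[], size_t nstore,
    const char *devpath, bool self_identifying)
{
	size_t i;

	assert(devpath != NULL);

	if (ddi_retire_store_bypass != 0)
		return (BOOT_CONFIGURE);

	for (i = 0; i < nstore; i++) {
		if (strcmp(store[i], devpath) != 0)
			continue;
		/*
		 * Self identifying nodes are fenced rather than offlined,
		 * since their nexus may remove a node that fails to attach.
		 */
		return (self_identifying ? BOOT_FENCE : BOOT_MARK_OFFLINE);
	}
	return (BOOT_CONFIGURE);
}
```

   Booting with an empty store (for example /dev/null given at the
   "boot -a" prompt) corresponds to calling this with nstore set to 0.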
   

7. Unretire Devices
-------------------
   The unretire of a device may be manual or automatic - in either
   case it is the FMA framework that indicates to the retire agent that
   a device has been repaired. In the former case, the FMA framework is
   notified by a user via the command: "fmadm repair". In the latter case
   the FMA framework detects via some form of serial ID/GUID that a device
   has been replaced. In both cases, the FMA framework generates a
   "list.repaired" event indicating that the device has been repaired.
   The retire agent uses this event to initiate the unretire process.
   Once unretire is complete, no reboot will be required to configure
   and use the unretired device. Note that if a retired device is removed
   (while the system is down or via DR) and replaced by a different device
   or by the same device, the replacement will stay retired until the
   unretire process is triggered (manually or automatically).

8. Driver modifications
------------------------
   This I/O retire proposal does not directly require any modifications to
   a device driver to allow the device to be retired. However, the I/O
   retire agent is dependent on a correct diagnosis by a diagnosis engine
   which in turn relies on error telemetry from the kernel. While some
   device faults can be diagnosed without modifying the device driver,
   certain faults can only be diagnosed with proper error information
   from a hardened i.e. modified driver. Hardening requires a driver to
   conform to the FMA I/O fault services spec as outlined in the
   Writing Device Drivers (WDD) guide. See the I/O fault services chapter
   in the WDD for more details.


V Non goals
=============

1. Retire of devices other than disk and nexus devices
--------------------------------------------------------
   The current phase of I/O retire will only cover the retire of nexus
   and disk devices. Retire of other devices (such as NICs) requires domain
   specific constraints imposed via the interfaces provided by this project
   and is beyond the scope of this project. It is expected that domain experts will
   develop the necessary constraints and use the interfaces supplied by this
   project to impose them. Until then, such devices will not be automatically
   retirable. However, if a device is currently not in use, it will be
   automatically retired even if it is not a disk or a nexus.
   
   
2. Fencing limitations for certain devices with kernel consumers 
-----------------------------------------------------------------
   Some device accesses in the kernel cannot be completely fenced
   off. For example, devices that are directly accessed in the
   kernel via the bdev_* and cdev_* interfaces instead of the
   LDI interfaces (PSARC/2001/769) cannot be completely fenced off.
   specfs does not play a role in such accesses and cannot intercept
   accesses. It is expected that such consumers will be eventually
   migrated to LDI which will fully support fencing.

3. Converting layered drivers to use the LDI
---------------------------------------------
   Certain kernel consumers have not been converted by the layered driver
   project to use the LDI interfaces. This project will not attempt to
   convert any drivers to use the LDI framework since that is beyond the
   scope of this project.

4. Enhancing RCM
----------------
   RCM will be used solely to impose constraints on the retire process i.e.
   for the "offline" device reconfiguration event. Since RCM by its
   userland architecture is limited to use with synchronous userland
   events such as userland initiated DR, it is not easy to use it for
   asynchronous kernel generated events such as the "degraded" device
   reconfiguration event. This project will not make this attempt as the
   goal here is to support retire, not to enhance the RAS value of RCM clients.

5. Ability to retire software (drivers)
---------------------------------------
   This design only covers retire of hardware. Retire of software i.e.
   drivers is a non-goal for this phase.

VI Sequence of events in I/O retire
===================================
   The following describes the steps that are taken (in order) during
   FMA I/O retire. To simplify things we consider several scenarios:

   i) Device retire fails because of a retire constraint
   ------------------------------------------------------
 	1. Derive retire target from the fault event
 
 	2. Retire policy: Check the value of the "global-disable" policy.
 	   If not false, abort retire.
 
 	3. Use device contracts, LDI "notify" callbacks and RCM calls to check
 	   if the device retire is permitted.
 
 	4. One or more constraints reject the retire.
 
 	5. Send negotiation end (NEGEND) events indicating failure to
           all applicable device contracts and invoke finalize callbacks
	   for LDI consumers indicating failure.
 
 	6. Move the device to the degraded state. As a result device contract
 	   events and LDI finalize callbacks for the "degraded" state change
 	   are generated.
 
 
    ii) Device retire is permitted but device cannot be offlined
    -------------------------------------------------------------
 	1. Derive retire target from the fault event
 
 	2. Retire policy: Check the value of the "global-disable" policy.
 	   If not false, abort retire.
 
 	3. Use device contracts, LDI "notify" callbacks and RCM calls to check
 	   if the device retire is permitted.
 
 	4. Constraint checking allows the retire to proceed.
 
   	5. Persist the retire. 
 
 	6. A message is logged indicating that the device has been
 	   successfully retired. A reference is made to the utility 
           (fmadm(1M)) that may be used to unretire the device.
  
 	7. Check the status of offline in step 3. It failed.

	8. Fence off the device. 	

	9. Schedule periodic jobs to attempt a deferred offline of the device
 	    
   	10. Send negotiation end (NEGEND) events indicating failure to all
 	   applicable device contracts and invoke finalize callbacks for LDI
  	   consumers indicating failure. Because the device is fenced off in
  	   this specific case, any attempts to reopen the device will fail.
 
 	11. Move the device to the degraded state. As a result device contract
 	   events and LDI finalize callbacks for the "degraded" state change
 	   are generated.
 
 	12. At some later point of time, the device is successfully offlined.
   	   The offline process includes device contract notifications and LDI
  	   notify callbacks. Once the device is successfully offlined, remove
  	   the device from the degraded state and generate "success" NEGEND
  	   message for device contracts and LDI_EV_SUCCESS finalize callbacks
  	   for LDI consumers.
 
 
    iii) Device retire is permitted and device can be offlined
    -------------------------------------------------------------
 	1. Derive retire target from the fault event
 
 	2. Retire policy: Check the value of the "global-disable" policy.
 	   If not false, abort retire.
 
 	3. Use device contracts, LDI "notify" callbacks and RCM calls to check
 	   if the device retire is permitted.
 
 	4. Constraint checking allows the retire to proceed.
 
 	5. Persist the retire.
 
 	6. A message is logged indicating that the device has been
 	   successfully retired. A reference is made to the utility 
           (fmadm(1M)) that may be used to unretire the device.

	7. Check the status of the offline in step 3. It succeeded. 

 	8. Send negotiation end (NEGEND) messages indicating success to all
 	   applicable device contracts and invoke finalize callbacks for LDI
 	   consumers indicating success.
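
   The three scenarios above differ only in where the sequence stops.
   The following is a minimal sketch of that decision flow, not the
   project's implementation. All names in it (retire_sketch,
   retire_outcome_t, retire_state_t) are hypothetical, and it folds
   the offline attempt of step 3 into a single boolean input for
   simplicity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcome codes; not part of the project's interfaces. */
typedef enum {
	RETIRE_ABORTED,		/* "global-disable" policy was not false */
	RETIRE_REJECTED,	/* scenario i: a constraint blocked the retire */
	RETIRE_FENCED,		/* scenario ii: retired, offline deferred */
	RETIRE_OFFLINED		/* scenario iii: retired and offlined */
} retire_outcome_t;

/* Hypothetical record of the side effects each scenario leaves behind. */
typedef struct {
	bool persisted;		/* retire recorded in the retire store */
	bool fenced;		/* device fenced off */
	bool retry_scheduled;	/* deferred offline job scheduled */
	bool degraded;		/* device moved to the degraded state */
	bool negend_success;	/* NEGEND/finalize reported success */
} retire_state_t;

retire_outcome_t
retire_sketch(bool global_disable, bool constraints_allow, bool offline_ok,
    retire_state_t *st)
{
	assert(st != NULL);
	*st = (retire_state_t){ 0 };

	/* Step 2: the policy check comes before any constraint checking. */
	if (global_disable)
		return (RETIRE_ABORTED);

	/* Scenario i: failure NEGEND/finalize, then degrade the device. */
	if (!constraints_allow) {
		st->degraded = true;
		return (RETIRE_REJECTED);
	}

	/* Scenarios ii and iii both persist the retire first. */
	st->persisted = true;

	/* Scenario ii: fence, schedule retries, report failure, degrade. */
	if (!offline_ok) {
		st->fenced = true;
		st->retry_scheduled = true;
		st->degraded = true;
		return (RETIRE_FENCED);
	}

	/* Scenario iii: success NEGEND and finalize callbacks. */
	st->negend_success = true;
	return (RETIRE_OFFLINED);
}
```

   Note that in this sketch the retire is persisted in both scenario
   ii and scenario iii, so a device that could only be fenced remains
   retired across a reboot.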
 
    iv) Behavior on reboot
    ------------------------
 
 	1. I/O framework reads the file /etc/devices/retire_store
 
 	2. I/O framework creates an in-core cache of devices that
 	   have been retired.
 
 	3. The system emits an informational message to the console indicating
 	   that one or more retired devices exist on the system.
 
 	4. On the first attempt to attach each devinfo node, the
 	   framework checks whether the node is in the in-core cache.
 	   If it is, the framework either blocks the attach and offlines
 	   the device (for PROM-based device nodes) or fences off the
 	   device (for "non-PROM" devices). In both cases, the effect is
 	   the same: the retired device is unavailable to consumers.
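
    The attach-time check in step 4 can be sketched as below. This is
    an illustration only: the cache is modelled as a plain array of
    device paths, and attach_check(), retired_lookup() and the
    attach_action_t values are hypothetical names, not framework
    interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result of the first-attach check. */
typedef enum {
	ATTACH_ALLOW,		/* device is not retired */
	ATTACH_BLOCK_OFFLINE,	/* PROM-based node: block attach, offline */
	ATTACH_FENCE		/* non-PROM node: fence off the device */
} attach_action_t;

/* Linear search of the in-core cache built from the retire store. */
bool
retired_lookup(const char *const *cache, size_t n, const char *path)
{
	for (size_t i = 0; i < n; i++) {
		if (strcmp(cache[i], path) == 0)
			return (true);
	}
	return (false);
}

/* Decide what to do on the first attempt to attach a devinfo node. */
attach_action_t
attach_check(const char *const *cache, size_t n, const char *path,
    bool prom_node)
{
	assert(path != NULL);

	if (!retired_lookup(cache, n, path))
		return (ATTACH_ALLOW);

	return (prom_node ? ATTACH_BLOCK_OFFLINE : ATTACH_FENCE);
}
```

    Either non-ALLOW result leaves the retired device unavailable to
    consumers; only the mechanism differs.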
 
    v) Unretire sequence
    ----------------------
        The unretire process is initiated by the retire agent when it is
        informed by the FMA framework via a "list.repaired" event that
        the device has been repaired/replaced.
 
        When a device is unretired, we go through the following steps:
 
 	  1. Remove the device from the persistent retire store
 
 	  2. Unschedule the deferred detach i.e. remove any scheduled
 	     job that is attempting to offline the device.
 
 	  3. Tear down fences:
 	     If the device was fenced off (via specfs) tear down those
 	     fences.
 
 	  4. Online the device:
 	     If the device is in an offlined state, then online it.
 
 
    	The device is now unretired. A subsequent operation such as open()
	will configure the device.

VII Future Work
===============
The following functionality is not a part of the current set of deliverables
but may be delivered in future projects or RFEs. 

1. New events: The events for device contracts and LDI events proposed by this 
   project use well established I/O framework events but are limited to events
   that are directly used by I/O retire. It is expected that future projects
   will enhance this set to add other generic events. For example if DR
   were to start using device contracts and LDI events, we would expect the
   following additional events to be defined:
	a) suspend
	b) resume

2. Retiring software: It is possible that software (like device drivers) may
   have design defects that can be handled via retire. This is a new approach
   to driver defect management that needs further investigation.

VIII Examples
=============

1. Process uproc and the /dev/widget device
-------------------------------------------
	Here is sample code that indicates how a userland process "uproc"
	creates a contract for a device "/dev/widget" and negotiates the
	breaking of the contract.

	// Get a template for the device contract
	tfd = open64(CTFS_ROOT "/device/template", O_RDWR);

	// Open the device contract pbundle for this process
	efd = open64(CTFS_ROOT "/device/pbundle", O_RDONLY);

	// Set informative and critical events for this contract
	ct_tmpl_set_critical(tfd, CT_DEV_ST_OFFLINE|CT_DEV_ST_DEGRADED);
	ct_tmpl_set_informative(tfd, 0);
	ct_dev_tmpl_set_aset(tfd, CT_DEV_ST_ONLINE|CT_DEV_ST_DEGRADED);

	// Activate this template so that the next open creates a contract
	ct_tmpl_activate(tfd);

  	/*
  	 * Note that this is not the only way to create a device contract.
  	 * A contract may also be created post-open by setting a minor
	 * path in the template via the ct_dev_tmpl_set_minor() interface
	 * and then creating a contract via ct_tmpl_create()
  	 */
	dfd = open("/dev/widget", O_RDWR);

	// Clear the activate so other opens don't create contracts
	ct_tmpl_clear(tfd);

	(void) close(tfd);

	// Get the contract's ID
	contract_latest(&ctid);

	// Get the contract's ctl file
	ctlfd = contract_open(ctid, "device", "ctl", O_WRONLY);

	for (;;) {
		// Block waiting for events
		ct_event_read(efd, &ev);

		// Read an event and check if it is ours
		if (ct_event_get_ctid(ev) != ctid) {
			ct_event_free(ev);
			continue;
		}

		event = ct_event_get_type(ev);
		evid = ct_event_get_evid(ev);
		flags = ct_event_get_flags(ev);

		if (event & CT_DEV_ST_DEGRADED) {
			/*
			 * The degrade event is within our A-set. The
			 * contract is intact. Since we subscribed to
			 * this event as a critical event, we need to ACK
			 * it so that the event is freed by the kernel.
			 */
			ct_ctl_ack(ctlfd, evid);
			uproc_reconfig(DEGRADE);

		} else if ((event & CT_DEV_ST_OFFLINE) && (flags & CTE_NEG)) {

			// uproc code - check if state change permissible
			...

			if (state_change_allowed) {
				uproc_reconfig(OFFLINE);
   		 		(void) close(dfd);	// close the device
				ct_ctl_ack(ctlfd, evid);
				ct_event_free(ev);
   				break;
			} else {
				// Block the state change
				ct_ctl_nack(ctlfd, evid);
			}
		} else {
			ct_event_free(ev);
			goto error;
		}
		ct_event_free(ev);
	}

	// We ACKed the state change to offline
	for (;;) {

		ct_event_read(efd, &ev);

		// Read an event and check if it is ours
		if (ct_event_get_ctid(ev) != ctid) {
			ct_event_free(ev);
			continue;
		}

		// Negend is a critical event, so ACK it.
		event = ct_event_get_type(ev);
		evid = ct_event_get_evid(ev);
		if (event == CT_EV_NEGEND) {
			ct_ctl_ack(ctlfd, evid);
			ct_event_free(ev);
   			ct_ctl_abandon(ctlfd);	// contract is broken
			break;			// done; fall through to cleanup
		} else {
			ct_event_free(ev);
			goto error;
		}
	}

	// cleanup
	(void) close(efd);
	(void) close(ctlfd);

2. I/O retire and ZFS
---------------------
Here is pseudo code illustrating how I/O retire code will work for a disk device
consumed by ZFS.

a. ZFS uses the LDI (either ldi_open_by_name() or ldi_open_by_devid()) to open
   a disk minor node.
	// ZFS code
	ldi_open_by_name(path, ...&ldi_handle...)
		or 
	ldi_open_by_devid(devid, ...&ldi_handle...)
	
b. ZFS obtains an event cookie for an offline LDI event for this minor node
	// ZFS code
	ldi_ev_get_cookie(ldi_handle, LDI_EV_OFFLINE, &event_cookie)

c. ZFS then registers notify and finalize callbacks for this minor.
   The notify callback is responsible for checking if the proposed
   reconfiguration is permissible.
   
   For example, if the disk hosts a critical filesystem and ZFS cannot replace
   it, ZFS is expected to return LDI_EV_FAILURE to indicate this. If ZFS can
   replace it, it is expected to reconfigure itself and release the device.

	// ZFS code
 	callb.vers = LDI_EV_CB_VERS;
 	callb.notify = zfs_notify;
 	callb.finalize = zfs_finalize;

 	ldi_ev_register_callbacks(ldi_handle, event_cookie, &callb, arg,
	    &callback_id);

	// Here is the notify callback for ZFS
	int
 	zfs_notify(ldi_handle_t ldi_handle, ldi_ev_cookie_t ecookie,
	    void *arg, void *ev_data)
	{
		// If uninteresting event, just return success and allow
		// state change to proceed

		if (strcmp(ldi_ev_get_type(ecookie), LDI_EV_OFFLINE) != 0)
			return (LDI_EV_SUCCESS); 

		// Since ZFS exports no minors to external consumers
		// for general purpose use, there is no need to invoke
		// ldi_ev_notify() here 

		// ZFS code
		...
		if (disk can be offlined) {
			zfs_reconfigure(ldi_handle, OFFLINE);
			zfs_save_path(ldi_handle, arg);
			ldi_close(ldi_handle);
			return (LDI_EV_SUCCESS);
		} else if (disk required by ZFS) {
			return (LDI_EV_FAILURE);
		}
	}

   Since ZFS does not export any minors for general purpose external use,
   there are no userland or kernel consumers to notify about an impending
   change in the minor they are consuming. If ZFS exports minors to userland
   or kernel, then it would be expected to map the imported minors to
   exported minors and invoke ldi_ev_notify() on them. If the result of
   that call is LDI_EV_FAILURE the ZFS notify callback should return
   LDI_EV_FAILURE. 

d. The finalize callback is responsible for informing ZFS if the state
   change i.e. offline actually succeeded. If it succeeded, then ZFS can
   assume that the device is gone, else it can start using the device again.
   

	// Here is the finalize callback for ZFS
	void
 	zfs_finalize(ldi_handle_t ldi_handle, ldi_ev_cookie_t event_cookie,
	    int ldi_result, void *arg, void *ev_data)
	{
  		ldi_handle_t new_handle;

		// ZFS code - NOP if this is not an offline event
		if (strcmp(ldi_ev_get_type(event_cookie), LDI_EV_OFFLINE) != 0)
			return;

		// This is an offline event
		if (ldi_result != LDI_EV_SUCCESS) {
			// ZFS code
			path = zfs_get_saved_path(arg);
   			/* A reopen is not guaranteed to succeed */
    	 		if (ldi_open_by_name(path, &new_handle) == 0)
   				zfs_reconfigure(new_handle, ONLINE);
		}
	}

3. Sample prtconf output for retired devices
--------------------------------------------

System Configuration:  Sun Microsystems  sun4u
Memory size: 2048 Megabytes
System Peripherals (Software Nodes):

SUNW,Sun-Blade-2500
    scsi_vhci, instance #0
   ----- snip ------
    pci, instance #2
        scsi, instance #0
            disk (driver not attached)
            tape (driver not attached)
            sd, instance #0
            sd, instance #1 (retired)	<=====
        scsi, instance #1
            disk (driver not attached)
            tape (driver not attached)
   ----- snip ------
	ide, instance #0
            disk (driver not attached)
            cdrom (driver not attached)
            sd, instance #30 (retired)	<======
   ----- snip ------
    iscsi, instance #0
    pseudo, instance #0

-------------------------------------------------------------------------------

IX Interface Table
==================
================================================================================
|Interface Name                | Stability level |     Comments
================================================================================
|I/O Framework interfaces      |		 |
|========================      |		 |
|e_ddi_retire_persist()	       | Proj. Pvt.	 | Persist a device retire
|e_ddi_retire_unpersist()      | Proj. Pvt.	 | Unpersist a device retire
|e_ddi_retired()      	       | Cons. Pvt.      | Check if device is retired
|e_ddi_retire_device()         | Proj. Pvt.	 | offline or fence a device
|e_ddi_unretire_device()       | Proj. Pvt.	 | online or unfence a device
|e_ddi_set_retire_interval()   | Proj. Pvt.	 | set deferred offline interval
|			       |		 |
|New modctls                   | 		 |     
|===========		       | 		 |     
|MODRETIRE	               | Proj. Pvt.	 | Kernel processing for retire
|MODUNRETIRE                   | Proj. Pvt.	 | Kernel processing (unretire)
|MODISRETIRED		       | Cons. Pvt.       | Check if device is retired
|MODRETIRERETRY		       | Proj. Pvt.      | Set offline retry interval
|			       |                 |
|devinfo structure additions   | 	         |
|===========================   |		 |
|devi_ct		       | Proj. Pvt.	 | List of contracts on device
|			       |		 |
|devinfo structure flags       | 	         |
|=======================       |		 |
|DEVI_RETIRED		       | Cons. Pvt.      | Device has been retired
|DEVI_CONSTRAINT	       | Proj. Pvt.	 | constraints applied
|			       |		 |
|libdevinfo interfaces         | 	         |
|=====================         |		 |
|di_retire_device()            | Proj. Pvt.	 | Retire a device
|di_unretire_device()          | Proj. Pvt.	 | Unretire a device
|di_retire_t		       | Proj. Pvt.	 | Libdevinfo retire struct
|			       | 		 |     
|RCM flags		       | 	         |
|=========		       |		 |
|RCM_RETIRE_REQUEST	       | Committed	 | flag RCM request entry points
|RCM_RETIRE_NOTIFY	       | Committed	 | flag RCM notify entry points
|			       |		 |
|snode flags		       | 	         |
|============		       |		 |
|SFENCED	               | Proj. Pvt.	 | snode flag - fenced off
|			       |		 |
|specfs interfaces	       | 	         |
|=================	       |		 |
|spec_fence_snode	       | Proj. Pvt.	 | fence off snode(s)
|spec_unfence_snode	       | Proj. Pvt.	 | unfence snode(s)
|			       |		 |
|I/O retire agent	       |		 |
|================	       |		 |
|io-retire.so		       | Proj. Pvt.	 | I/O retire agent module
|			       |		 |
|fmd I/O retire agent props    | 	         |
|==========================    |		 |
|io-retire.conf		       | Uncommitted	 | I/O retire agent .conf file
|"global-disable"	       | Uncommitted	 | Disable I/O retire
|"fault-exceptions"	       | Proj. Pvt.	 | faults that don't retire
|			       |		 |
|retire store interfaces       | 	         |
|=======================       |		 |
|/etc/devices/retire_store     | Proj. Pvt.	 | Persistent retire store
|			       |		 |
|contract interfaces           | 	         |
|===================           |		 |
|CTT_DEVICE		       | Cons. Pvt.	 | device contract type
|CT_CNACK		       | Cons. Pvt.	 | ctfs ioctl cmd for NACK 
|			       |		 |
|/system/contract/device       | Committed	 | ctfs device contract dir
|			       |		 |
|contract_device_create()      | Proj. Pvt.      | Create a contract post open
|contract_device_open()        | Proj. Pvt.      | Create a contract at open
|contract_device_offline()     | Cons. Pvt.      | Offline contract negotiation
|contract_device_degrade()     | Cons. Pvt.      | Degrade event publish
|			       |		 |
|CT_ACK			       | Cons. Pvt.	 | change is permitted
|CT_NACK		       | Cons. Pvt.	 | change is not permitted
|CT_NONE		       | Proj. Pvt.      | no contracts
|			       |		 |
|CT_DEV_ST_ONLINE	       | Committed	 | Online state
|CT_DEV_ST_OFFLINE	       | Committed	 | Offline state
|CT_DEV_ST_DEGRADED	       | Committed	 | Degrade state
|			       |		 |
|CTDP_ACCEPT          	       | Committed	 | set of acceptable device states
|CTDP_MINOR          	       | Committed	 | contract minor's devfs path
|CTDP_NONEG          	       | Committed	 | auto NACK a contract break
|			       |		 |
|CTDS_STATE		       | Committed       | state of device
|CTDS_ASET		       | Committed       | A-set (acceptable states set)
|CTDS_MINOR		       | Committed       | device member of contract
|			       |		 |
|ct_dev_tmpl_set_aset()        | Committed	 | set A-set in template
|ct_dev_tmpl_get_aset()        | Committed	 | get A-set in template
|ct_dev_tmpl_set_minor()       | Committed	 | set minor path in template
|ct_dev_tmpl_get_minor()       | Committed	 | get minor path in template
|ct_dev_tmpl_set_noneg()       | Committed	 | set non-neg. term in template
|ct_dev_tmpl_get_noneg()       | Committed	 | get non-neg. term in template
|			       |		 |
|ct_dev_status_get_dev_state() | Committed	 | get device state in contract
|ct_dev_status_get_aset()      | Committed	 | get A-set in contract
|ct_dev_status_get_minor()     | Committed	 | get minor path in contract
|ct_dev_status_get_noneg()     | Committed	 | get the setting for NONEG
|			       |		 |
|ct_ctl_nack()		       | Committed	 | Negative ack for neg. event 
|			       |	         |
| LDI event interfaces	       |		 |
|=====================	       |		 |
| ldi_get_eventcookie()	       |Obsolete committed| Deprecate old interfaces
| ldi_add_event_handler()      |Obsolete committed| Deprecate old interfaces
| ldi_remove_event_handler()   |Obsolete committed| Deprecate old interfaces
| LDI_EV_CB_VERS	       | Committed	 | event callback vector vers.
| LDI_EV_OFFLINE	       | Committed	 | LDI offline event
| LDI_EV_DEGRADE	       | Committed	 | LDI degrade event
| ldi_ev_cookie_t	       | Committed	 | LDI event cookie
| ldi_ev_register_callbacks()  | Committed	 | register notify/finalize 
| ldi_ev_notify()	       | Committed	 | Notify consumers
| ldi_ev_finalize()	       | Committed	 | Finalize events for consumers
| ldi_ev_get_type()	       | Committed	 | Get LDI event name
| ldi_ev_remove_callbacks()    | Committed	 |Remove all LDI event callbacks
| LDI_EV_SUCCESS	       | Committed	 | LDI event success return code
| LDI_EV_FAILURE	       | Committed	 | LDI event failure return code
| LDI_EV_NONE		       | Proj. Pvt.	 | No matching LDI callbacks
================================================================================


X Man pages
===========

Here are man pages for non project-private interfaces

------------------------------------------------------------------------------
NAME
	io-retire.conf - FMA I/O retire agent .conf file

SYNOPSIS
	/usr/lib/fm/fmd/plugins/io-retire.conf

INTERFACE LEVEL
	Stable

DESCRIPTION
	The io-retire.conf file may be used to set pre-defined properties
	which affect the behavior of the I/O retire agent. Currently
	only two properties are defined:

		global-disable:  If set to true, automatic I/O retire will be
		disabled. If set to false (the default), I/O retire will
		be automatic for all supported devices.

		fault-exceptions: May be set to a colon separated list of
		specific fault types which should not trigger a retire.
		This is for the (rare) faults that are known to be
		incorrectly diagnosed.

	Use of the global-disable property is primarily for Sun service
	personnel to diagnose problems. End users should not set this property
	unless told to do so by Sun Service personnel.

	The fault-exceptions property is only for Sun Private use. It will
	not be documented.
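
	As an illustration of how the two properties could combine, the
	sketch below decides whether a fault should trigger a retire. It
	assumes the colon separated list format described above; the
	function names and the fault class strings used with it are
	hypothetical and are not taken from the agent's source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Return true if "fault" appears as a complete entry in the colon
 * separated "exceptions" list. Prefix matches do not count.
 */
bool
fault_is_excepted(const char *exceptions, const char *fault)
{
	const char *p = exceptions;
	size_t flen;

	assert(fault != NULL);
	if (exceptions == NULL)
		return (false);

	flen = strlen(fault);
	if (flen == 0)
		return (false);

	while (*p != '\0') {
		/* Length of the current entry, up to ':' or end of string */
		size_t len = strcspn(p, ":");

		if (len == flen && strncmp(p, fault, len) == 0)
			return (true);

		p += len;
		if (*p == ':')
			p++;	/* step over the separator */
	}
	return (false);
}

/* global-disable wins; otherwise retire unless the fault is excepted. */
bool
should_retire(bool global_disable, const char *exceptions, const char *fault)
{
	if (global_disable)
		return (false);
	return (!fault_is_excepted(exceptions, fault));
}
```

	With the (made up) list "fault.io.example.a:fault.io.example.b",
	a fault of class fault.io.example.a is skipped, while
	fault.io.example.c is retired as long as global-disable is false.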
	
	 
SEE ALSO
	fmd(1M), fmadm(1M)
------------------------------------------------------------------------------
File Formats                                           device(4)

NAME
     device - device contract type

SYNOPSIS
     /system/contract/device

DESCRIPTION
     Device contracts allow processes to monitor events involving
     a device of interest and to react and/or block state changes
     involving such devices.

     Device contracts are managed  using  the  contract(4)  file
     system  and  the libcontract(3LIB) library. The device
     contract type directory is /system/contract/device.

  CREATION
     A device contract may be created in one of two ways:

	a) A process may create and activate a template and then
	invoke open on a minor node of the device. The act of opening
	will create a contract based on the terms in the activated
	template.

	b) A process may create a contract *after* it has opened a device
	by creating a template, setting appropriate terms (including the
	path to a minor node) on the template and then invoking
	ct_tmpl_create() on the template.

  STATES, BREAKS and EVENTS

	A state refers to the state of the device which is the subject of
	the contract. Currently three states are defined for device contracts:
	
     	CT_DEV_ST_ONLINE
		The device is online and functioning normally

	CT_DEV_ST_DEGRADED
		The device is online but functioning in a degraded capacity

	CT_DEV_ST_OFFLINE
		The device is offline and is not configured for use


	A process creates a device contract with the kernel to get a
	guarantee that the device will be in an acceptable set of
	states as long as the contract is valid. This acceptable set
	(or A-set for short) is specified as one of the terms of the
  	contract when the contract is created.
 
	When the device moves to a state outside the A-set, the contract
	is broken. This breaking of the contract may be either asynchronous
	or synchronous, depending on whether the transition that led to the
	breaking of the contract is synchronous or asynchronous. The figure
	below shows the possible transitions between currently defined
	device states.
		 		<-- A -->	
	 		 /-----------------> DEGRADED
			 |			| 			   
          		 |			| 			   
	  		 |			| S			   
	  		 |			| |			   
          		 |			| v			   
          		 v       S -->		v
			ONLINE ------------> OFFLINE

	The arrows indicate the direction of the state transition. S refers
	to a transition that is synchronous and A refers to a transition
	that is asynchronous.

	If the breaking of a contract is asynchronous, then all that happens
	is that a critical event is generated and sent to the contract holder.
	The event is generated even if the contract holder has not subscribed
	to the event via the critical or informative event sets.

	If the breaking of the contract is synchronous, a critical contract 
	event is generated with the CTE_NEG flag set to indicate that this
	is a negotiation event. The contract holder is expected to either
	ACK this change and allow the state change to occur or it may
	NACK the change to block it (if it has sufficient privileges).

	The term event refers to the transition of a device from one state
	to another. The event is named by the state to which the device
	is transitioning. Thus if a device is transitioning to the OFFLINE
	state, the name of the event is CT_DEV_ST_OFFLINE. An event may have
	no consequence for a contract, or it may result in the asynchronous
	breaking of a contract or it may result in a synchronous (i.e.
	negotiated) breaking of a contract. Events are delivered to a 
	contract holder in three cases:

		i)   The contract holder has subscribed to the event via
		     the critical or informative event sets. The event may be
		     either critical or informative in this case depending on
		     the subscription.

		ii)  The device transitions to a state outside the contract's
		     A-set and the transition is asynchronous. This will
		     result in the asynchronous breaking of the contract
		     and a critical event will be delivered to the holder.

		iii) The device transitions to a state outside the contract's
		     A-set and the transition is synchronous. This will
		     result in the synchronous breaking of the contract and
		     a critical event with the CTE_NEG flag set will be
		     delivered to the holder.

	Note that in cases ii) and iii) a critical event will be delivered
	even if the holder has not subscribed to the event via the critical
	or informative event sets.
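
	The three delivery cases can be summarised in a small decision
	function. The sketch below is illustrative only: the ST_* bit
	values are placeholders (the real CT_DEV_ST_* values are defined
	in <sys/contract/device.h>), the type and function names are
	hypothetical, and it assumes that an event for a state inside the
	A-set is never a negotiation event, which the text above does not
	state explicitly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholder bit values standing in for CT_DEV_ST_*. */
#define	ST_ONLINE	0x1u
#define	ST_DEGRADED	0x2u
#define	ST_OFFLINE	0x4u

/* The contract terms that matter for event delivery. */
typedef struct {
	unsigned int aset;		/* acceptable states (A-set) */
	unsigned int critical;		/* critical event set */
	unsigned int informative;	/* informative event set */
} dev_contract_t;

/* What the holder sees for one state transition. */
typedef struct {
	bool delivered;		/* an event reaches the holder */
	bool critical;		/* the event is critical */
	bool negotiated;	/* the event carries CTE_NEG */
	bool breaks;		/* the transition breaks the contract */
} delivery_t;

delivery_t
classify_event(const dev_contract_t *ct, unsigned int new_state,
    bool synchronous)
{
	delivery_t d = { false, false, false, false };

	assert(ct != NULL);

	/* Cases ii) and iii): outside the A-set, always critical. */
	if ((ct->aset & new_state) == 0) {
		d.delivered = true;
		d.critical = true;
		d.breaks = true;
		d.negotiated = synchronous;
		return (d);
	}

	/* Case i): inside the A-set, delivered only if subscribed. */
	if (ct->critical & new_state) {
		d.delivered = true;
		d.critical = true;
	} else if (ct->informative & new_state) {
		d.delivered = true;
	}
	return (d);
}
```

	For a contract whose A-set is ONLINE|DEGRADED, a transition to
	OFFLINE is classified as a critical, contract-breaking event
	regardless of the subscription sets.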
		    
  NEGOTIATION
	If the breaking of a contract is synchronous, the kernel begins
	negotiations with the contract holder by generating a critical event
	*before* the device changes state. The event will have the CTE_NEG
	flag set indicating that this is a negotiation event. The contract
	owner is allowed a limited period of time in which to either
	ACK the contract event (thus allowing the state change) or if it
	has appropriate privileges, NACK the state change thus blocking
	the state change. ACKs may be sent by the holder via
	ct_ctl_ack(3CONTRACT) and NACKs may be sent via ct_ctl_nack(3CONTRACT).
	If a contract holder does not send either a NACK or ACK within a
	specified period of time, an ACK is assumed and the kernel proceeds
	with the state change.

	Once the device state change is finalized, the contract subsystem
	sends negotiation end (NEGEND) critical messages to the contract owner
	indicating the final disposition of the state transition i.e. either
	success or failure.

	Once a contract is broken, a contract owner may choose to create a
	replacement contract. It may do this after the contract is broken
	or it may choose to do this synchronously with the breaking of the
	old contract via ct_ctl_newct(3CONTRACT).
   
  TERMS
     The following common contract terms, defined in contract(4),
     have device-contract specific attributes:

     informative set

        The default value  for  the  informative  set  is
        CT_DEV_ST_DEGRADED i.e. transitions to the DEGRADED state
	will by default result in informative events. Use
	ct_tmpl_set_informative(3CONTRACT) to set this term.

     critical set

        The  default  value  for  the  critical  set   is
        CT_DEV_ST_OFFLINE i.e. transitions to the OFFLINE state
	will by default result in critical events. Use
	ct_tmpl_set_critical(3CONTRACT) to set this term.

     The following contract terms can be read from or written  to
     a    device    contract    template    using    the   named
     libcontract(3LIB) interfaces.  These contract terms  are  in
     addition to those described in contract(4).

    CTDP_ACCEPT  acceptable set or A-set

	 This term is required for every device contract. It defines the
	 set of device states which the contract owner expects to exist as long
	 as the contract is valid. If a device transitions to a state outside
	 this A-set, then the contract will break and will no longer be valid. A
	 critical contract event will be sent to the contract owner to
	 signal this break.

	 Use ct_dev_tmpl_set_aset() to set this term. There is no
	 default A-set. This term is mandatory. Use ct_dev_tmpl_get_aset()
	 to query a template for this term.

     CTDP_MINOR

	Specifies as its value the devfs path to a minor that is the subject
	of the contract. Used to specify the minor to be used for creating
	a contract when contract creation takes place other than at open time.

	If the contract is created synchronously at open(2) time, then this
	term is implied to be the minor node being opened. In this case this
	term need not be explicitly set.

	Use ct_dev_tmpl_set_minor() to set this term. The default setting for
	this term is NULL i.e. no minor is specified.

	Use ct_dev_tmpl_get_minor() to query a contract template for the current
	setting of this term.

   CTDP_NONEG

	This term if set indicates that any negotiable departure from the
	contract terms should be NACKED i.e. the contract subsystem should
	assume a NACK for any negotiated breaking of the contract. This
	term is ignored for asynchronous contract breaks.

	Use ct_dev_tmpl_set_noneg() to set this term. The default setting
	is off.

	Use ct_dev_tmpl_get_noneg() to query a template for the setting of
	this term.
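
	Taken together with the NEGOTIATION rules, the outcome of a
	synchronous state change can be sketched as follows. The names
	are hypothetical, and one point is an assumption rather than
	something this page states: a NACK from a holder without
	sufficient privilege is treated here like no response at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* What the contract holder did before the negotiation period expired. */
typedef enum { RESP_TIMEOUT, RESP_ACK, RESP_NACK } holder_resp_t;

/* Hypothetical per-contract inputs to the negotiation. */
typedef struct {
	bool noneg;		/* CTDP_NONEG term is set */
	bool privileged;	/* holder may NACK state changes */
	holder_resp_t resp;	/* holder's response, if any */
} neg_input_t;

/* Does this one contract allow the state change to proceed? */
bool
contract_allows(const neg_input_t *in)
{
	assert(in != NULL);

	/* NONEG: the contract subsystem assumes a NACK. */
	if (in->noneg)
		return (false);

	/* An explicit NACK blocks only if the holder is privileged. */
	if (in->resp == RESP_NACK && in->privileged)
		return (false);

	/* An ACK, or no response within the time limit, allows it. */
	return (true);
}

/* The change proceeds only if every contract on the device allows it. */
bool
negotiation_succeeds(const neg_input_t *in, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (!contract_allows(&in[i]))
			return (false);
	}
	return (true);
}
```

	A single NONEG contract or privileged NACK is therefore enough to
	block the state change, while silence never blocks it.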

  STATUS
     In addition to the standard items, the  status  object  read
     from  a  status file descriptor contains the following items
     if CTD_FIXED is specified:

    State of device CTDS_STATE
 
 	Returns the current state of the device. Returns one of the following:
 		CT_DEV_ST_ONLINE
 		CT_DEV_ST_DEGRADED
 		CT_DEV_ST_OFFLINE
 	Use ct_dev_status_get_dev_state() to obtain this information.

    A-set of device contract  CTDS_ASET

 	Returns the "acceptable states" (A-set) of the device contract.
	The return value is a bitset of device states and may include one or
	more of the following:

 		CT_DEV_ST_ONLINE
 		CT_DEV_ST_DEGRADED
 		CT_DEV_ST_OFFLINE

 	Use ct_dev_status_get_aset() to obtain this information.

    Setting of noneg flag CTDS_NONEG
 	
 	Returns the current setting of the noneg flag. Returns 1 if the
 	noneg flag is set else 0. Use ct_dev_status_get_noneg() to
 	obtain this information.

     If CTD_ALL is specified, the following items are also
     available:

     Device minor node CTDS_MINOR

         The devfs path of the device which is the subject of the device
         contract.   Use  ct_dev_status_get_minor(3CONTRACT)  to
         obtain this information.

  EVENTS
     No new event related interfaces (beyond the standard contract event
     interfaces) are defined for device contract events.

FILES
     /usr/include/sys/contract/device.h

         Contains definitions of events, status fields and event fields

SEE ALSO
     ctrun(1),  ctstat(1),  ctwatch(1),  open(2),
     ct_tmpl_set_critical(3CONTRACT),
     ct_tmpl_set_informative(3CONTRACT),
     ct_dev_tmpl_set_aset(3CONTRACT),
     ct_dev_tmpl_get_aset(3CONTRACT),
     ct_dev_tmpl_set_minor(3CONTRACT),
     ct_dev_tmpl_get_minor(3CONTRACT),
     ct_dev_tmpl_set_noneg(3CONTRACT),
     ct_dev_tmpl_get_noneg(3CONTRACT),
     ct_dev_status_get_dev_state(3CONTRACT),
     ct_dev_status_get_aset(3CONTRACT),
     ct_dev_status_get_minor(3CONTRACT),
     libcontract(3LIB), contract(4), privileges(5)

------------------------------------------------------------------------------

Contract Management Library Functions	ct_dev_tmpl_set_param(3CONTRACT)

NAME
	ct_dev_tmpl_set_aset, ct_dev_tmpl_get_aset, ct_dev_tmpl_set_minor,
        ct_dev_tmpl_get_minor, ct_dev_tmpl_set_noneg, ct_dev_tmpl_get_noneg
	- device contract template functions
	

SYNOPSIS
	cc [ flag... ] file... -D_LARGEFILE64_SOURCE -lcontract [ library... ]
   	#include <libcontract.h>
	#include <sys/contract/device.h>


	int ct_dev_tmpl_set_aset(int fd, uint_t aset);

	int ct_dev_tmpl_get_aset(int fd, uint_t *asetp);

	int ct_dev_tmpl_set_minor(int fd, char *minor);

	int ct_dev_tmpl_get_minor(int fd, char *buf, size_t buflen);

	int ct_dev_tmpl_set_noneg(int fd);

	int ct_dev_tmpl_get_noneg(int fd, uint_t *negp);

PARAMETERS
	fd
       		A file descriptor from an open of the device contract
		template file in the contract filesystem (ctfs)

	aset
		A bitset of one or more device states

	asetp
		A pointer to a variable in which the current A-set is to be
		returned.

	minor
		The devfs path (the /devices path without the "/devices"
		prefix) of a minor which is to be the subject of a contract

	buf
		A buffer in which the minor path is to be returned.

	buflen
		Size of buffer buf.

	negp
		A pointer to a uint_t variable for receiving the current
		setting of the "non-negotiable" term in the template

DESCRIPTION
	These functions read and write device contract terms and operate on
	device contract template file descriptors obtained from the
	contract(4) i.e. ctfs filesystem.

	The ct_dev_tmpl_set_aset() and ct_dev_tmpl_get_aset() functions
	write and read the "acceptable states" set (or A-set for short).
	This is the set of device states guaranteed by the contract. Any
	departure from these states will result in the breaking of the
	contract and a delivery of a critical contract event to the
	contract holder. The A-set value is a bitset of one or more of the
	following device states.
	
			CT_DEV_ST_ONLINE
			CT_DEV_ST_DEGRADED
			CT_DEV_ST_OFFLINE
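
	The A-set rule above can be sketched as a small user-space model.
	This is only an illustration: the EX_ST_* bit values and both
	function names are stand-ins invented for the sketch, not part of
	libcontract. Real code takes CT_DEV_ST_* from
	<sys/contract/device.h>.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Stand-in bit values for illustration only; real code uses the
 * CT_DEV_ST_* definitions from <sys/contract/device.h>.
 */
enum {
	EX_ST_ONLINE	= 0x1,
	EX_ST_DEGRADED	= 0x2,
	EX_ST_OFFLINE	= 0x4
};

#define	EX_ST_ALL	(EX_ST_ONLINE | EX_ST_DEGRADED | EX_ST_OFFLINE)

/* An A-set must name at least one state and contain no unknown bits. */
bool
ex_aset_valid(unsigned int aset)
{
	return (aset != 0 && (aset & ~(unsigned int)EX_ST_ALL) == 0);
}

/*
 * A move to new_state breaks the contract when new_state lies
 * outside the A-set.
 */
bool
ex_breaks_contract(unsigned int aset, unsigned int new_state)
{
	/* A device is in exactly one state at a time. */
	assert(new_state == EX_ST_ONLINE || new_state == EX_ST_DEGRADED ||
	    new_state == EX_ST_OFFLINE);

	return ((aset & new_state) == 0);
}
```

	With an A-set of ONLINE and DEGRADED, a move to DEGRADED keeps
	the contract intact while a move to OFFLINE breaks it.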

	The ct_dev_tmpl_set_minor() and ct_dev_tmpl_get_minor() functions
	write and read the minor term i.e. the device resource that is to be
	the subject of the contract. The value is a devfs path to a device
	minor node.
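
	Since the minor term is a devfs path (the /devices path without
	the "/devices" prefix), a caller that starts from a /devices
	path has to strip that prefix first. A minimal sketch follows;
	ex_devfs_path() is a hypothetical helper, not a libcontract
	function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return a pointer to the devfs portion of a /devices path, or NULL
 * if the path does not live under /devices. The result points into
 * the caller's string and keeps the leading '/'.
 */
const char *
ex_devfs_path(const char *devices_path)
{
	static const char prefix[] = "/devices";
	const size_t plen = sizeof (prefix) - 1;

	assert(devices_path != NULL);

	if (strncmp(devices_path, prefix, plen) != 0 ||
	    devices_path[plen] != '/')
		return (NULL);

	return (devices_path + plen);
}
```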

	The ct_dev_tmpl_set_noneg() and ct_dev_tmpl_get_noneg() functions
	write and read the non-negotiable term. If this term is set,
	synchronous negotiation events are automatically NACKed on behalf of
	the contract holder. For ct_dev_tmpl_get_noneg(), the variable pointed
	to by negp is set to 1 if the "noneg" term is set or to 0 otherwise.
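
	The effect of the non-negotiable term on a synchronous
	negotiation can be modeled as follows. All names here are
	illustrative stand-ins for the sketch. The model only shows
	that the holder is never consulted once "noneg" is set.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { EX_ACK, EX_NACK } ex_reply_t;
typedef ex_reply_t (*ex_holder_fn)(void);

/* A contract holder that is willing to let the state change proceed. */
ex_reply_t
ex_holder_acks(void)
{
	return (EX_ACK);
}

/*
 * Reply recorded for a synchronous negotiation event. With the
 * "noneg" term set, the event is NACKed on the holder's behalf.
 */
ex_reply_t
ex_negotiate(int noneg, ex_holder_fn holder)
{
	if (noneg)
		return (EX_NACK);

	assert(holder != NULL);
	return (holder());
}
```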
	
RETURN VALUES
	Upon successful completion, these functions return 0. Otherwise,
	they return a non-zero error value.

ERRORS
	The ct_dev_tmpl_set_aset() function will fail if:
	EINVAL		Invalid template file descriptor or A-set

	The ct_dev_tmpl_set_minor() function will fail if:

	EINVAL		Invalid argument(s)

	ENXIO		The minor named by minor path does not exist 

	The ct_dev_tmpl_set_noneg() function will fail if:

  	EPERM		Process lacks sufficient privilege to NACK a
			device state change.
  			
	The ct_dev_tmpl_get_aset(), ct_dev_tmpl_get_minor() and
	ct_dev_tmpl_get_noneg() functions will fail if:

	EINVAL		Invalid arguments specified

	ENOENT		Requested term is not set

INTERFACE LEVEL
	Committed

SEE ALSO
	libcontract(3LIB), contract(4), device(4), lfcompile(5)

------------------------------------------------------------------------------
Contract Management Library Functions     ct_ctl_adopt(3CONTRACT)

NAME
     ct_ctl_adopt,  ct_ctl_abandon,   ct_ctl_newct,   ct_ctl_ack,
 |   ct_ctl_nack, ct_ctl_qack - common contract control functions

SYNOPSIS
     cc [ flag... ] file... -D_LARGEFILE64_SOURCE -lcontract [ library... ]
     #include <libcontract.h>

     int ct_ctl_adopt(int fd);

     int ct_ctl_abandon(int fd);

     int ct_ctl_newct(int fd, uint64_t evid, int templatefd);

     int ct_ctl_ack(int fd, uint64_t evid);

  |  int ct_ctl_nack(int fd, uint64_t evid);

     int ct_ctl_qack(int fd, uint64_t evid);

DESCRIPTION
     These functions operate on contract control file descriptors
     obtained from the contract(4) file system.

     The ct_ctl_adopt() function adopts the  contract  referenced
     by  the  file  descriptor  fd.  After  a  successful call to
     ct_ctl_adopt(), the contract is owned by the calling process
     and  any  events in that contract's event queue are appended
     to the process's bundle of the appropriate type.

     The ct_ctl_abandon() function abandons the  contract  refer-
     enced  by the file descriptor fd. After a successful call to
     ct_ctl_abandon() the process no longer  owns  the  contract,
     any  events  sent by that contract are automatically removed
     from the process's bundle, and any critical  events  on  the
     contract's   event  queue  are  automatically  acknowledged.
     Depending on its type and terms, the contract will either be
     orphaned or destroyed.

     The ct_ctl_ack() function acknowledges  the  critical  event
     specified  by evid. If the event corresponds to an exit nego-
     tiation, ct_ctl_ack() also  indicates  that  the  caller  is 
     prepared for  the  system  to  proceed  with the referenced 
     reconfiguration.

  |  The ct_ctl_nack() function acknowledges the critical negotiation
  |  event specified by evid. ct_ctl_nack() also indicates that the
  |  caller wishes to block the proposed reconfiguration indicated
  |  by event evid. Depending on the contract type, this function
  |  may require certain privileges to be asserted in the process's
  |  effective set. This function will fail and return an error
  |  if the event represented by evid is not a negotiation event.

     The ct_ctl_qack() function requests a new  quantum  of  time
     for the negotiation specified by the event ID evid.

     The ct_ctl_newct() function instructs the contract specified
     by  the  file descriptor fd that when the current exit nego-
     tiation completes, another contract with the terms  provided
     by  the template specified by templatefd should be automati-
     cally written.
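
     The acknowledgement rules above can be sketched as a small
     user-space model of an event queue. Every name below is a
     stand-in invented for the sketch. Following the DESCRIPTION, the
     model lets ex_ack() accept any unacknowledged event while
     ex_nack() accepts only negotiation events. Checking ESRCH before
     EPERM is an assumption of the model, not documented behavior.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { EX_PENDING, EX_ACKED, EX_NACKED } ex_evstate_t;

typedef struct ex_event {
	uint64_t	evid;
	int		negotiation;	/* nonzero for a negotiation event */
	ex_evstate_t	state;
} ex_event_t;

/* Look up an event by ID; returns NULL if it is not queued. */
static ex_event_t *
ex_find(ex_event_t *q, int n, uint64_t evid)
{
	int i;

	assert(q != NULL);
	for (i = 0; i < n; i++) {
		if (q[i].evid == evid)
			return (&q[i]);
	}
	return (NULL);
}

/* Model of ct_ctl_ack(): acknowledge an unacknowledged event. */
int
ex_ack(ex_event_t *q, int n, uint64_t evid)
{
	ex_event_t *ev = ex_find(q, n, evid);

	if (ev == NULL || ev->state != EX_PENDING)
		return (ESRCH);
	ev->state = EX_ACKED;
	return (0);
}

/*
 * Model of ct_ctl_nack(): only an unacknowledged negotiation event
 * may be blocked, and only by a sufficiently privileged caller.
 */
int
ex_nack(ex_event_t *q, int n, uint64_t evid, int privileged)
{
	ex_event_t *ev = ex_find(q, n, evid);

	if (ev == NULL || ev->state != EX_PENDING || !ev->negotiation)
		return (ESRCH);
	if (!privileged)
		return (EPERM);
	ev->state = EX_NACKED;
	return (0);
}
```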

RETURN VALUES
     Upon      successful       completion,       ct_ctl_adopt(),
  |  ct_ctl_abandon(), ct_ctl_newct(), ct_ctl_ack(), ct_ctl_nack(),
     and ct_ctl_qack() return 0. Otherwise, they return a non-zero
     error value.

ERRORS
     The ct_ctl_adopt() function will fail if:

     EBUSY           The contract is in the owned state.

     EINVAL          The  contract  was  not  inherited  by   the
                     caller's  process contract or was created by
                     a process in a different zone.

  |  The ct_ctl_abandon(), ct_ctl_newct(), ct_ctl_ack(), ct_ctl_nack()
     and ct_ctl_qack() functions will fail if:

     EBUSY           The contract does not belong to the  calling
                     process.

     The ct_ctl_newct() and ct_ctl_qack() functions will fail if:

     ESRCH           The event ID  specified  by  evid  does  not
                     correspond  to an unacknowledged negotiation
                     event.

     The ct_ctl_newct() function will fail if:

     EINVAL          The file descriptor specified by fd was  not
                     a valid template file descriptor.

  |  The ct_ctl_ack() and ct_ctl_nack() functions will fail if:

     ESRCH           The event ID  specified  by  evid  does  not
  |                  correspond  to  an  unacknowledged negotiation 
                     event.

  |  The ct_ctl_nack() function will fail if:
  | 
  |  EPERM           The calling process lacks the appropriate
  |                  privileges required to block the reconfiguration

     The ct_ctl_qack() function will fail if:

     ERANGE          The  maximum  amount  of   time   has   been
                     requested.

ATTRIBUTES
     See attributes(5) for descriptions of the  following  attri-
     butes:
     ____________________________________________________________
    |       ATTRIBUTE TYPE        |       ATTRIBUTE VALUE       |
    |_____________________________|_____________________________|
    | Interface Stability         | Evolving                    |
    |_____________________________|_____________________________|
    | MT-Level                    | Safe                        |
    |_____________________________|_____________________________|

SEE ALSO
     libcontract(3LIB), contract(4), attributes(5), lfcompile(5)
------------------------------------------------------------------------------

Contract Management Library Functions	ct_dev_status_get_minor(3CONTRACT)

NAME
   	ct_dev_status_get_dev_state, ct_dev_status_get_aset,
   	ct_dev_status_get_minor, ct_dev_status_get_noneg
	- device contract status functions

SYNOPSIS
	cc [ flag... ] file... -D_LARGEFILE64_SOURCE -lcontract [ library... ]
   	#include <libcontract.h>
	#include <sys/contract/device.h>

  	int ct_dev_status_get_dev_state(ct_stathdl_t stathdl, uint_t *statep);

  	int ct_dev_status_get_aset(ct_stathdl_t stathdl, uint_t *asetp);

	int ct_dev_status_get_minor(ct_stathdl_t stathdl, char *buf,
	    size_t buflen);

     	int ct_dev_status_get_noneg(ct_stathdl_t stathdl, uint_t *nonegp);
   
PARAMETERS
	stathdl
       		A status object returned by ct_status_read(3CONTRACT)

  	statep
  		A pointer to a uint_t variable for receiving the current
  		state of the device which is the subject of the contract

	asetp
  		A pointer to a uint_t variable for receiving the acceptable
  		state set (i.e. A-set) for the contract

	buf
		A buffer for receiving the devfs path of a minor in a
		contract.

	buflen
		Size of buf

  	nonegp
  		A pointer to a uint_t variable for receiving the setting
  		of the "noneg" term.

DESCRIPTION
	These functions read contract status information from a status
	object (stathdl) returned by ct_status_read(3CONTRACT).

	The ct_dev_status_get_dev_state() function returns the current
	state of the device which is the subject of the contract. This can
	currently be one of the following:

		CT_DEV_ST_ONLINE - the device is online and functioning normally
		CT_DEV_ST_DEGRADED - the device is online but degraded
		CT_DEV_ST_OFFLINE - the device is offline and not configured

	The ct_dev_status_get_aset() function returns the A-set
	of the contract. This can currently be the bitset of one
	or more of the following states:

		CT_DEV_ST_ONLINE
		CT_DEV_ST_DEGRADED
		CT_DEV_ST_OFFLINE

	The ct_dev_status_get_minor() function reads the devfs path
	of the minor participating in the contract.

  	The ct_dev_status_get_noneg() function returns the "noneg"
  	setting for the contract. A 1 is returned via the nonegp
  	argument if "NONEG" is set, else 0 is returned.

RETURN VALUES
	Upon successful completion, these functions return 0. Otherwise,
	they return a non-zero error value.

ERRORS
	The ct_dev_status_get_minor() function will fail if:

	EOVERFLOW	The buffer size is too small to hold the result

  	The ct_dev_status_get_dev_state(), ct_dev_status_get_aset(),
    	ct_dev_status_get_minor() and ct_dev_status_get_noneg() functions
   	will fail if:

	EINVAL		Invalid argument(s) specified

	ENOENT		The requested data is not present in the status object.
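
	The error cases listed above for the minor path can be
	illustrated with a user-space model. ex_copy_minor() is a
	hypothetical stand-in for the copy-out step, and the order in
	which the conditions are checked is an assumption of the
	sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy a minor path into the caller's buffer.
 * Returns 0 on success, EINVAL for a bad buffer argument, ENOENT if
 * no minor path is recorded, or EOVERFLOW if buf cannot hold the
 * path and its terminating NUL.
 */
int
ex_copy_minor(const char *minor, char *buf, size_t buflen)
{
	size_t need;

	if (buf == NULL)
		return (EINVAL);
	if (minor == NULL)
		return (ENOENT);

	need = strlen(minor) + 1;
	if (need > buflen)
		return (EOVERFLOW);

	(void) memcpy(buf, minor, need);
	assert(buf[need - 1] == '\0');
	return (0);
}
```

	A caller that sees EOVERFLOW can retry with a larger buffer.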

INTERFACE LEVEL
	Committed

SEE ALSO
	ct_status_read(3CONTRACT), ct_status_free(3CONTRACT),
	libcontract(3LIB), contract(4), device(4), lfcompile(5)
------------------------------------------------------------------------------
Kernel Functions for Drivers			ldi_ev_get_cookie(9F)

NAME
       ldi_ev_get_cookie - Get an LDI event cookie for a specified event

SYNOPSIS
	#include <sys/sunldi.h>

	int ldi_ev_get_cookie(ldi_handle_t lh, char *evname,
	    ldi_ev_cookie_t *cookiep);

INTERFACE LEVEL
	Solaris DDI specific (Solaris DDI).

PARAMETERS
	ldi_handle_t lh
		A layered handle representing the device
		for which the event notification was requested.

	char *evname
		The string name of the event.

	ldi_ev_cookie_t *cookiep
		A pointer of type ldi_ev_cookie_t. Contains a pointer
		to the event cookie on return.

DESCRIPTION
	The ldi_ev_get_cookie() function accepts the string name
	of a state change event affecting the device represented by
	the layered driver handle "lh" and returns an opaque cookie
	on success. The call will be successful if the framework
	supports event notification for the event named by evname. If
	successful, the function will return an opaque cookie through
	the cookiep parameter. The cookie is required in subsequent calls
	for registering callbacks on events.

  	Two LDI events are currently defined:
  		LDI_EV_OFFLINE   The device is moving to the offlined state
  		LDI_EV_DEGRADE   The device is moving to the degraded state.
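
	The name-to-cookie lookup described above can be modeled in
	user space as a table search. The sketch below is not the
	kernel implementation: the EX_* return values, the event name
	strings and the cookie representation are all stand-ins chosen
	for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define	EX_EV_SUCCESS	0
#define	EX_EV_FAILURE	(-1)

/* Opaque to callers; in this model a cookie points at a table entry. */
typedef const char *ex_cookie_t;

/* Stand-in names for the two events the framework supports. */
static const char *const ex_events[] = { "offline", "degrade" };
#define	EX_NEVENTS	(sizeof (ex_events) / sizeof (ex_events[0]))

/* Succeeds only if the framework supports the named event. */
int
ex_get_cookie(const char *evname, ex_cookie_t *cookiep)
{
	size_t i;

	if (evname == NULL || cookiep == NULL)
		return (EX_EV_FAILURE);

	for (i = 0; i < EX_NEVENTS; i++) {
		if (strcmp(evname, ex_events[i]) == 0) {
			*cookiep = ex_events[i];
			assert(*cookiep != NULL);
			return (EX_EV_SUCCESS);
		}
	}
	return (EX_EV_FAILURE);
}

/* Reverse mapping: the event name for a cookie, or NULL if unknown. */
const char *
ex_get_type(ex_cookie_t cookie)
{
	size_t i;

	for (i = 0; i < EX_NEVENTS; i++) {
		if (cookie == ex_events[i])
			return (cookie);
	}
	return (NULL);
}
```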

RETURN VALUES

	LDI_EV_SUCCESS
		The event cookie was created successfully.

	LDI_EV_FAILURE
		An error occurred and the cookie was not created.

CONTEXT
	The ldi_ev_get_cookie() function can be  called  from
	user and kernel contexts only.

SEE ALSO
	ldi_ev_register_callbacks(9F), ldi_ev_remove_callbacks(9F)


------------------------------------------------------------------------------
Kernel Functions for Drivers			ldi_ev_register_callbacks(9F)

NAME
        ldi_ev_register_callbacks - add a notify and/or finalize callback

SYNOPSIS
	#include <sys/sunldi.h>

  	int ldi_ev_register_callbacks(ldi_handle_t lh, ldi_ev_cookie_t cookie,
  		ldi_ev_callback_t *callb, void *arg, ldi_ev_callback_id_t *id);

INTERFACE LEVEL
	Solaris DDI specific (Solaris DDI).

PARAMETERS
	ldi_handle_t lh
		A layered handle representing the device
		for which the event notification was requested.

	ldi_ev_cookie_t cookie
		An opaque event cookie for the event type returned
		by a previous call to ldi_ev_get_cookie(9F)

  	ldi_ev_callback_t *callb
  		A data structure which currently has the following members:
  
  			typedef struct ldi_ev_callback {
  				uint_t  cb_vers;
  				int 	(*cb_notify)(ldi_handle_t,
  						     ldi_ev_cookie_t cookie,
  						     void *arg, void *ev_data);
   				void 	(*cb_finalize)(ldi_handle_t,
  						      ldi_ev_cookie_t cookie,
  						      int ldi_result,
  						      void *arg,
  						      void *ev_data);
  			} ldi_ev_callback_t;
  
  		where
  			cb_vers
  				Version of callback vector. Must be set to
  				LDI_EV_CB_VERS by the caller.

  			The arguments passed into the callbacks when they are
  			invoked, include:
  
  			int ldi_result
  				The actual result of the state change
  				operation/event passed to finalize callback:
  					LDI_EV_SUCCESS: State change succeeded
  					LDI_EV_FAILURE:	The state change failed
  			void *ev_data
  				Event specific data.
	void *arg
		A pointer to opaque caller private data

        ldi_ev_callback_id_t *id
		Unique system wide registration id returned by
  		ldi_ev_register_callbacks(9F) upon successful registration.


DESCRIPTION
	The ldi_ev_register_callbacks() interface allows layered
	drivers to register notify and finalize callbacks for
	certain events. These events are listed in the
	ldi_ev_get_cookie(9F) man page. The notify callback is
	invoked only for events that can be blocked, just before the
	event occurs. Layered drivers that have registered notify
	callbacks for that event have the opportunity of blocking
	such events. The finalize callback is invoked once the final
	disposition of the state of a device (specifically a device minor
	node) is known. The callback is invoked with this result, either
	LDI_EV_SUCCESS (state change succeeded) or LDI_EV_FAILURE
	(state change failed). This allows layered driver consumers
	to finalize any changes they made in response to a previous
	"notify" callback.
  	
	For example, a layered driver's notify callback may be invoked
	in response to a LDI_EV_OFFLINE event. The layered driver may
	reconfigure itself to stop using the device and permit the change
	to go forward. Once that happens, the I/O framework will attempt
	to actually offline the device. This offline attempt can have two
	possible outcomes: success or failure. In the former case, the finalize
	callback will be invoked with the ldi_result argument set to
	LDI_EV_SUCCESS and the layered driver will know that the device has
	been offlined. In the latter case finalize is invoked with the
	ldi_result set to LDI_EV_FAILURE and the layered driver knows that
	the state change failed - in which case it may choose to reconfigure
	itself to start using the device again.
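
	The offline sequence just described can be sketched as a
	user-space simulation with a single layered driver. The EX_*
	values, the ex_widget_t type and the functions are all
	hypothetical; the hw_offline_ok flag stands in for whatever
	decides whether the actual offline attempt succeeds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define	EX_EV_SUCCESS	0
#define	EX_EV_FAILURE	(-1)

typedef struct ex_widget {
	bool	using_dev;	/* true while the widget holds the device */
	int	finalize_calls;	/* number of finalize callbacks seen */
} ex_widget_t;

/* Notify callback: stop using the device and permit the offline. */
int
ex_notify(ex_widget_t *w)
{
	w->using_dev = false;
	return (EX_EV_SUCCESS);
}

/* Finalize callback: start using the device again if offline failed. */
void
ex_finalize(ex_widget_t *w, int result)
{
	w->finalize_calls++;
	if (result == EX_EV_FAILURE)
		w->using_dev = true;
}

/* Framework side: notify, attempt the offline, then finalize. */
int
ex_offline(ex_widget_t *w, bool hw_offline_ok)
{
	int result;

	assert(w != NULL);

	if (ex_notify(w) != EX_EV_SUCCESS)
		return (EX_EV_FAILURE);

	result = hw_offline_ok ? EX_EV_SUCCESS : EX_EV_FAILURE;
	ex_finalize(w, result);
	return (result);
}
```

	In both outcomes the widget sees exactly one finalize callback.
	It ends up holding the device again only when the offline
	attempt failed.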

	Finalize callbacks can be registered for all events including events
	that cannot be blocked.
	
	A layered driver can also propagate these events up the software
	stack by using interfaces offered by the LDI event framework.
	The layered driver may use ldi_ev_notify() to propagate notify
	events occurring on minors it imports onto minors it exports.
	Similarly, it may use ldi_ev_finalize() to propagate finalize
	events. Both ldi_ev_notify() and ldi_ev_finalize() will
	propagate events to device contracts as well as LDI callbacks
	registered against the exported minor nodes.

	The LDI event framework has the following guarantees and
	requirements with respect to these callbacks:

	1. The notify() callback is invoked before an event (represented
	   by the event cookie) occurs on a device (represented by the
	   layered driver handle) and is invoked only for events that
	   can be blocked. If the callback returns LDI_EV_FAILURE, the
	   event will be blocked. If the callback returns LDI_EV_SUCCESS,
	   the event will be allowed to proceed. If any other value is
	   returned, it is an error - an error message will be logged
	   and the event will be blocked. An example of an event that
	   can be blocked and for which notify callbacks may be invoked
	   is the offline event LDI_EV_OFFLINE.

	2. The finalize callback is invoked for all events (including 
	   events that cannot be blocked) after the event has occurred.
	   It will be invoked with either LDI_EV_SUCCESS indicating that
	   the event successfully happened or LDI_EV_FAILURE indicating
	   that the event did not occur. The finalize callback returns
	   no values. A good example of an event that cannot be
	   blocked is the degrade event (LDI_EV_DEGRADE).
 
	3. Layered drivers may register one or both of these callbacks
	   (i.e. only for a notify event or only for a finalize event
	   or for both) against any LDI handle that they may possess.
	   If a finalize or notify event is not being registered, the
	   corresponding pointer in the ldi_ev_callback_t structure
	   must be set to NULL. It is an error to attempt a registration
	   with both callbacks set to NULL.

	4. A notify and/or finalize callback will be invoked only
	   if the corresponding LDI handle is open. If an LDI handle
	   against which the callbacks are registered is closed
	   the corresponding finalize and notify callbacks will
	   not be invoked as it is assumed that the layered driver
	   is no longer interested in the device. There *is*,
	   however, an exception to this rule. See 5 below.

	5. A layered driver that closes its LDI handle in its
	   notify routine *will* receive the corresponding
	   finalize callback after the event has occurred.
	   Because the LDI handle has been closed, the finalize
	   callback will be invoked with a NULL LDI handle. It is
	   the responsibility of the layered driver to maintain
	   state in its private "arg" parameter so that it can
	   reopen the device (if desired) in its finalize callback.

	   One example where this may happen is with the LDI_EV_OFFLINE
	   event. A layered driver's notify callback may be invoked
	   for an offline event. The layered driver may choose to allow
	   this event to proceed. In that case, since it has a layered
	   open of the device, it *must* close the LDI handle so that
	   offline can succeed (an offline of a device will not succeed
	   if there is *any* open of the device, layered or otherwise).
	   Since the layered driver has closed the LDI handle in the
	   notify routine, its finalize callback (if any) will be
	   invoked with a NULL LDI handle. It is the responsibility of
	   the layered driver to maintain state (such as the device path
	   or devid) in its private "arg" parameter, so that in the
	   finalize routine, it can do a layered open of the device if the
	   device offline failed.

	   The above is the *only* exception where the finalize callback
	   is invoked if the LDI handle has been closed. In all other cases
	   if the LDI handle has been closed, no corresponding callbacks will
	   be invoked.

	6. For the LDI_EV_OFFLINE event, for the offline to succeed, it is
	   imperative that there be no opens (including LDI handles) to the
	   device. If a layered driver's notify callback is invoked for an
	   offline event and the driver intends to allow the offline to
	   proceed, the driver *must* close the corresponding LDI handle.
	

	7. The notify and finalize callbacks are not automatically
	   deregistered even if the corresponding LDI handle has been closed.
	   It is the responsibility of the layered driver to deregister
	   these callbacks when they are not required. It may do so using the
	   ldi_ev_remove_callbacks(9F) interface. The LDI framework may panic
	   if the entity registering the callback (such as a dip, dev_t or
	   module) no longer exists on the system and the corresponding
	   callbacks have not been unregistered.

  	
	8. The LDI event framework guarantees that if a layered driver
	   receives a notify event, it will also receive a finalize
	   event, except if the layered consumer itself blocked the
	   event, i.e. it returned LDI_EV_FAILURE from its notify
	   callback. In that case, the layered driver knows
	   that the event has been blocked and therefore does not
	   need the finalize callback.

	9. If a layered driver propagates notify events on minors
	   it imports to minors it exports, it *must* first propagate
	   these events up the software stack via ldi_ev_notify() in its
	   notify callback. It must do so before attempting to check if it
	   should block the event. This is required because a layered driver
	   cannot release the device if consumers up the stack are still
	   using the device. If ldi_ev_notify() returns LDI_EV_FAILURE,
	   the layered driver must immediately return LDI_EV_FAILURE from
	   its notify callback. If ldi_ev_notify() returns LDI_EV_SUCCESS,
	   then the state change is permissible as far as consumers higher
	   up in the software stack are concerned. The layered driver
	   must then determine if it can permit the state change. If the
	   state change is to be allowed, the layered driver must return
	   LDI_EV_SUCCESS. If the layered driver determines that the
	   state change should not be permitted, it *must* invoke
	   ldi_ev_finalize() on minors it exports with a result of
	   LDI_EV_FAILURE (to inform consumers up the stack) and then
	   return LDI_EV_FAILURE from its notify callback.

	10. The LDI event framework generates finalize events at the
	   earliest point where a failure is detected. If the failure
	   is detected in the framework (such as in ldi_ev_notify())
	   the framework will generate the finalize events. In the
	   event that a failure is first detected in a layered
	   driver i.e. in the notify callback of a layered driver,
	   the layered driver must use ldi_ev_finalize() to send finalize
	   events up the software stack. See EXAMPLES for code
	   snippets describing this scenario.

	11. In its finalize callback, a layered driver *must* first
	   reconfigure itself before attempting to propagate the event up
	   the software stack via ldi_ev_finalize(9F). This is so that the
	   minors it exports are available and ready for use before the
	   finalize event is propagated up the software stack.

	12. It may so happen that the event propagated up the software
	    stack is not the same as the event for which a layered driver's
	    notify/finalize callback is invoked. For example, a layered driver's
	    callback(s) may be invoked for an offline event, but the driver may
	    choose to only propagate the degraded event to *its* consumers
	    (since it may have a mirror/copy of the data on the device.)
	    In that case, the layered driver *must* generate a different
	    event cookie i.e. one corresponding to the degraded event via
	    ldi_ev_get_cookie(9F) and use that cookie in its propagation
	    calls i.e. ldi_ev_notify(9F) and ldi_ev_finalize(9F).
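
	Rules 8, 9 and 10 above interact in a way that is easier to see
	in a simulation. The sketch below models a linear stack of
	layered drivers in user space, with layer 0 at the bottom and
	each layer's only consumer directly above it. All names and
	values are stand-ins invented for the sketch, and the recursion
	merely imitates what ldi_ev_notify() and ldi_ev_finalize()
	would do between layers.

```c
#include <assert.h>

#define	EX_EV_SUCCESS	0
#define	EX_EV_FAILURE	(-1)

typedef struct ex_layer {
	int	can_release;	/* nonzero if this layer permits the event */
	int	notified;	/* notify callbacks seen */
	int	finalized;	/* finalize callbacks seen */
} ex_layer_t;

/* Finalize callback of layer i; propagates the result up the stack. */
void
ex_finalize(ex_layer_t *s, int n, int i, int result)
{
	if (i >= n)
		return;
	s[i].finalized++;
	ex_finalize(s, n, i + 1, result);
}

/* Notify callback of layer i. */
int
ex_notify(ex_layer_t *s, int n, int i)
{
	assert(i >= 0 && i < n);
	s[i].notified++;

	/* Rule 9: consult consumers above us before deciding ourselves. */
	if (i + 1 < n && ex_notify(s, n, i + 1) != EX_EV_SUCCESS)
		return (EX_EV_FAILURE);

	if (s[i].can_release)
		return (EX_EV_SUCCESS);

	/* Rule 10: we detected the failure, so we finalize our consumers. */
	ex_finalize(s, n, i + 1, EX_EV_FAILURE);

	/* Rule 8: having blocked the event, we get no finalize ourselves. */
	return (EX_EV_FAILURE);
}

/* Framework side: notify the bottom layer, then finalize on success. */
int
ex_state_change(ex_layer_t *s, int n, int hw_ok)
{
	int result;

	if (ex_notify(s, n, 0) != EX_EV_SUCCESS)
		return (EX_EV_FAILURE);

	result = hw_ok ? EX_EV_SUCCESS : EX_EV_FAILURE;
	ex_finalize(s, n, 0, result);
	return (result);
}
```

	When the middle layer of a three-layer stack blocks the event,
	only the layer above it (which had already agreed) receives a
	finalize callback. When every layer agrees, each layer receives
	exactly one.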
	
  	Once the registration of the callback(s) is successful, an
  	opaque ldi_ev_callback_id_t structure is returned which may be used
  	to unregister the callback(s) later.

RETURN VALUES
	LDI_EV_SUCCESS
		Callback(s) added successfully.

	LDI_EV_FAILURE
		Failed to add callback(s)


CONTEXT
	The ldi_ev_register_callbacks() function can be  called  from
	user and kernel contexts only.

EXAMPLES

Example I Here is a typical registration and callbacks for the OFFLINE event	

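/*
 * lh is the LDI handle from a previous layered open, arg is the
 * caller's private data and idp receives the registration id.
 */
static int
event_register(ldi_handle_t lh, void *arg, ldi_ev_callback_id_t *idp)
{
	ldi_ev_callback_t callb;
	ldi_ev_cookie_t off_cookie;

	if (ldi_ev_get_cookie(lh, LDI_EV_OFFLINE, &off_cookie)
	    == LDI_EV_FAILURE)
		return (LDI_EV_FAILURE);

	callb.cb_vers = LDI_EV_CB_VERS;
	callb.cb_notify = off_notify;
	callb.cb_finalize = off_finalize;

	if (ldi_ev_register_callbacks(lh, off_cookie, &callb, arg, idp)
	    != LDI_EV_SUCCESS)
		return (LDI_EV_FAILURE);

	return (LDI_EV_SUCCESS);
}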

static void
event_unregister(ldi_ev_callback_id_t id)
{
	ldi_ev_remove_callbacks(id);
}

static int
off_notify(ldi_handle_t lh, ldi_ev_cookie_t off_cookie, void *arg,
    void *ev_data)
{
	minor_t minor;
	int spec_type;

	ASSERT(strcmp(ldi_ev_get_type(off_cookie), LDI_EV_OFFLINE) == 0);

	/* Map imported minors to exported minor */
	widget_map(lh, &minor, &spec_type);

	/*
	 * Call ldi_ev_notify() to propagate events to our consumers.
	 * This *must* happen before we check if offline should be blocked
	 */
	if (ldi_ev_notify(dip, minor, spec_type, off_cookie, ev_data)
	    != LDI_EV_SUCCESS)
		return (LDI_EV_FAILURE);

	/*
	 * Next, check if we can allow the offline
	 */
	if (widget_check(lh) == WIDGET_SUCCESS) {
		widget_save_path(arg, lh);
		widget_reconfigure(lh, RELEASE);
		/* Flags and cred are placeholders; match the layered open */
		(void) ldi_close(lh, FREAD | FWRITE, kcred);
		return (LDI_EV_SUCCESS);
	}

	/*
	 * We cannot permit the offline. The first layer that detects
	 * failure i.e. us, must generate finalize events for our consumers
	 */
	ldi_ev_finalize(dip, minor, spec_type, LDI_EV_FAILURE, off_cookie,
	    ev_data);

	return (LDI_EV_FAILURE);
}

/*
 * The finalize callback will only be called if we returned LDI_EV_SUCCESS
 * in our notify callback. ldi_result passed in may be SUCCESS or FAILURE
 */
static void
off_finalize(ldi_handle_t NULL_lh, ldi_ev_cookie_t off_cookie, int ldi_result,
    void *arg, void *ev_data)
{
	ldi_handle_t lh;
	char *path;
	minor_t minor;
	int spec_type;

	ASSERT(strcmp(ldi_ev_get_type(off_cookie), LDI_EV_OFFLINE) == 0);

	path = widget_get_path(arg);

	widget_map_by_path(path, &minor, &spec_type);

	if (ldi_result == LDI_EV_SUCCESS) {
		ldi_ev_finalize(dip, minor, spec_type, LDI_EV_SUCCESS,
		    off_cookie, ev_data);
		return;
	}

	/*
	 * The offline failed. Reopen the device and reconfigure before
	 * propagating. The open flags, credentials and widget_li (our
	 * LDI identifier) are placeholders.
	 */
	if (ldi_open_by_name(path, FREAD | FWRITE, kcred, &lh,
	    widget_li) == 0)
		widget_reconfigure(lh, REACQUIRE);

	ldi_ev_finalize(dip, minor, spec_type, LDI_EV_FAILURE, off_cookie,
	    ev_data);
}

Example II Here is a typical registration and callbacks for the DEGRADE event	

/*
 * lh is the LDI handle from a previous layered open, arg is the
 * caller's private data and idp receives the registration id.
 */
static int
event_register(ldi_handle_t lh, void *arg, ldi_ev_callback_id_t *idp)
{
	ldi_ev_callback_t callb;
	ldi_ev_cookie_t dgrd_cookie;

	if (ldi_ev_get_cookie(lh, LDI_EV_DEGRADE, &dgrd_cookie)
	    == LDI_EV_FAILURE)
		return (LDI_EV_FAILURE);

	/* no notify callbacks allowed for degrade events */
	callb.cb_vers = LDI_EV_CB_VERS;
	callb.cb_notify = NULL;	/* NULL, notify cannot be used for DEGRADE */
	callb.cb_finalize = dgrd_finalize;

	if (ldi_ev_register_callbacks(lh, dgrd_cookie, &callb, arg, idp)
	    != LDI_EV_SUCCESS)
		return (LDI_EV_FAILURE);

	return (LDI_EV_SUCCESS);
}

static void
event_unregister(ldi_ev_callback_id_t id)
{
	ldi_ev_remove_callbacks(id);
}

/*
 * For degrade events, ldi_result will always be LDI_EV_SUCCESS
 */
static void
dgrd_finalize(ldi_handle_t lh, ldi_ev_cookie_t dgrd_cookie, int ldi_result,
    void *arg, void *ev_data)
{
	minor_t minor;
	int spec_type;

	ASSERT(ldi_result == LDI_EV_SUCCESS);
	ASSERT(strcmp(ldi_ev_get_type(dgrd_cookie), LDI_EV_DEGRADE) == 0);

	widget_map(lh, &minor, &spec_type);

	widget_reconfigure(lh, RELEASE);

	ldi_ev_finalize(dip, minor, spec_type, LDI_EV_SUCCESS, dgrd_cookie,
	    ev_data);
}

SEE ALSO
	ldi_ev_get_cookie(9F), ldi_ev_notify(9F), ldi_ev_finalize(9F),
	ldi_ev_remove_callbacks(9F)

------------------------------------------------------------------------------
Kernel Functions for Drivers			ldi_ev_notify(9F)

NAME
       ldi_ev_notify - propagate notification of a state change event

SYNOPSIS
	#include <sys/sunldi.h>

  	int ldi_ev_notify(dev_info_t *dip, minor_t minor, int spec_type,
	    ldi_ev_cookie_t cookie, void *ev_data);
	    

INTERFACE LEVEL
	Solaris DDI specific (Solaris DDI).

PARAMETERS
  	dev_info_t *dip
  		The devinfo node of the layered consumer exporting the minor

  	minor_t minor
  		The minor number of the exported minor

	int spec_type
		The type of minor (S_IFCHR or S_IFBLK)

	ldi_ev_cookie_t cookie
		An opaque event cookie for the event type returned
		by a previous call to ldi_ev_get_cookie(9F)

	void *ev_data
		Event specific data

DESCRIPTION
	The ldi_ev_notify() function propagates an event up the software
	stack. It may result in two actions:

		1. Invocation of LDI callback handlers registered by layered
		drivers up the software stack

		2. Device contract events generated on minors exported to
		userland

	Note that the event propagated up the software stack may be
	different from the event received by the layered driver invoking
	ldi_ev_notify(). For example, a volume manager may receive
	an "offline" event on one of its LDI opened disks, but may choose
	to propagate a "degraded" event on minors it exports to userland
	(since it may have more than one copy of the data).
	The event cookie argument to ldi_ev_notify() may thus be
	different from the event cookie currently possessed by the layered
	driver. If that is the case, the layered driver must generate
	another event cookie via a new ldi_ev_get_cookie() call.
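
	The choice of which event to propagate can be sketched as
	follows for the volume manager case. The type, the values and
	the function are hypothetical. A real driver would obtain the
	cookie for the chosen event via ldi_ev_get_cookie() before
	propagating it.

```c
#include <assert.h>

typedef enum { EX_EV_DEGRADE, EX_EV_OFFLINE } ex_evtype_t;

/*
 * Event a mirroring driver reports on its exported minor when one of
 * its underlying disks receives an offline event. healthy_copies
 * counts the copies of the data available before the event,
 * including the one on the disk going offline.
 */
ex_evtype_t
ex_translate_offline(int healthy_copies)
{
	assert(healthy_copies >= 1);

	/* Another copy survives: consumers only see a degraded device. */
	return (healthy_copies > 1 ? EX_EV_DEGRADE : EX_EV_OFFLINE);
}
```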

  	The ldi_ev_* interfaces are designed to ensure that a "finalize"
  	call is generated for layered driver consumers at the earliest
	point where an LDI_EV_FAILURE is detected. If this happens inside
	the LDI event framework, then the framework will invoke finalize.
	In the event a layered driver detects/generates an LDI_EV_FAILURE,
	then the layered driver must invoke ldi_ev_finalize(). Here is an
	example of a layered driver invoking ldi_ev_finalize() for the
	"foo" event:
	
  
	static int
	widget_notify(ldi_handle_t lh, ldi_ev_cookie_t foo_cookie, void *arg,
	    void *ev_data)
	{

		ASSERT(strcmp(ldi_ev_get_type(foo_cookie), LDI_EV_FOO) == 0);

		/* Map imported minors to exported minor */
		widget_map(lh, &minor, &spec_type);

		/*
		 * Call ldi_ev_notify() to propagate events to our consumers.
		 * This *must* happen before we check if widget should block
		 * foo
		 */
		if (ldi_ev_notify(dip, minor, spec_type, foo_cookie, ev_data)
		    != LDI_EV_SUCCESS)
			return (LDI_EV_FAILURE);

		/*
		 * Next, check if we can allow the foo event
		 */
		if (widget_release(lh, LDI_EV_FOO) == WIDGET_SUCCESS) {
			return (LDI_EV_SUCCESS);
		}

		/*
		 * We cannot permit the foo event. The first layer that detects
		 * failure i.e. us, must generate finalize events for *our*
		 * consumers
		 */
		ldi_ev_finalize(dip, minor, spec_type, LDI_EV_FAILURE,
		    foo_cookie, ev_data);

		return (LDI_EV_FAILURE);
	}

RETURN VALUES
	LDI_EV_SUCCESS
		Consumers up the software stack permit state change

	LDI_EV_FAILURE
		Consumers are blocking the state change

CONTEXT
	The ldi_ev_notify() function can be  called  from user and kernel
	contexts only.

SEE ALSO
	ldi_ev_get_cookie(9F), ldi_ev_register_callbacks(9F),
	ldi_ev_remove_callbacks(9F)

-------------------------------------------------------------------------------
  	  
Kernel Functions for Drivers			ldi_ev_finalize(9F)

NAME
       ldi_ev_finalize - propagate disposition of a state change event

SYNOPSIS
	#include <sys/sunldi.h>

  	void ldi_ev_finalize(dev_info_t *dip, minor_t minor, int spec_type,
  	    int ldi_result, ldi_ev_cookie_t cookie, void *ev_data);

INTERFACE LEVEL
	Solaris DDI specific (Solaris DDI).

PARAMETERS
  	dev_info_t *dip
  		The devinfo node of the layered consumer exporting the minor
  
  	minor_t minor
  		The minor number of the exported minor

	int spec_type
		The type of minor (S_IFCHR or S_IFBLK)

	int ldi_result
		The final disposition of the state change 

	ldi_ev_cookie_t cookie
		An opaque event cookie for the event type returned
		by a previous call to ldi_ev_get_cookie(9F)

	void *ev_data
		Event specific data

DESCRIPTION
	The ldi_ev_finalize() function propagates the final disposition
	of an event up the software stack. It may result in two actions:

		1. Invocation of "finalize" LDI callback handlers registered by
		layered drivers up the software stack
		
		2. Device contract "negotiation end" (CT_EV_NEGEND) events
		generated on minors exported to userland

	Note that the event propagated up the software stack may be
	different from the event received by the layered driver invoking
	ldi_ev_finalize(). For example, a volume manager may receive
	an "offline" event on one of its LDI opened disks, but may choose
	to propagate a "degraded" event on minors it exports to userland.
	The event cookie argument to ldi_ev_finalize() may thus be
	different from the event cookie currently possessed by the layered
	driver. If that is the case, the layered driver must generate
	another event cookie via a new ldi_ev_get_cookie() call.

RETURN VALUES
	None

CONTEXT
	The ldi_ev_finalize() function can be  called  from user and kernel
	contexts only.

EXAMPLE
	Invoking ldi_ev_finalize(9F) from widget's finalize callback

	static void
	widget_finalize(ldi_handle_t lh, ldi_ev_cookie_t foo_cookie,
	    int ldi_result, void *arg, void *ev_data)
	{
		/* dip: widget's own devinfo node, assumed available here */
		minor_t minor;
		int spec_type;

		ASSERT(strcmp(ldi_ev_get_type(foo_cookie), LDI_EV_FOO) == 0);

		/* Map imported minor to exported minors */
		widget_map(lh, &minor, &spec_type);

		if (ldi_result == LDI_EV_SUCCESS) {
			ldi_ev_finalize(dip, minor, spec_type,
			    LDI_EV_SUCCESS, foo_cookie, ev_data);
			/* Done; do not fall through to the failure path */
			return;
		}

		/*
		 * The event foo failed. Reconfigure yourself
		 * *before* propagating
		 */
		widget_reconfigure(lh, LDI_EV_FOO, REACQUIRE);

		ldi_ev_finalize(dip, minor, spec_type, LDI_EV_FAILURE,
		    foo_cookie, ev_data);
	}

SEE ALSO
	ldi_ev_get_cookie(9F), ldi_ev_register_callbacks(9F),
	ldi_ev_remove_callbacks(9F)

--------------------------------------------------------------------------------
Kernel Functions for Drivers			ldi_ev_get_type(9F)

NAME
       ldi_ev_get_type - Get event name string from event cookie

SYNOPSIS
	#include <sys/sunldi.h>

	char *ldi_ev_get_type(ldi_ev_cookie_t cookie);

INTERFACE LEVEL
	Solaris DDI specific (Solaris DDI).

PARAMETERS
	ldi_ev_cookie_t cookie
		An opaque event cookie for the event type returned
		by a previous call to ldi_ev_get_cookie(9F)

DESCRIPTION
	The ldi_ev_get_type() function returns the event string
	represented by the LDI event cookie "cookie".

RETURN VALUES
	On success returns the event string represented by cookie, else
	returns NULL.
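
	Because a NULL return is possible, a caller that compares the
	returned string should check for NULL first. The following is a
	minimal userland sketch of that pattern, not kernel code: the event
	name strings, the ev_kind enumeration and classify_event() are
	hypothetical names used only for illustration, and the evname
	argument stands in for the value returned by ldi_ev_get_type().

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical event name strings, for illustration only. */
#define	EV_OFFLINE	"offline"
#define	EV_DEGRADED	"degraded"

enum ev_kind { EV_KIND_UNKNOWN, EV_KIND_OFFLINE, EV_KIND_DEGRADED };

/*
 * Classify an event name string. A NULL name (the failure return)
 * maps to EV_KIND_UNKNOWN instead of being passed to strcmp().
 */
static enum ev_kind
classify_event(const char *evname)
{
	if (evname == NULL)
		return (EV_KIND_UNKNOWN);
	if (strcmp(evname, EV_OFFLINE) == 0)
		return (EV_KIND_OFFLINE);
	if (strcmp(evname, EV_DEGRADED) == 0)
		return (EV_KIND_DEGRADED);
	return (EV_KIND_UNKNOWN);
}
```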

CONTEXT
	The ldi_ev_get_type() function can be called from user and kernel
	contexts only.

SEE ALSO
	ldi_ev_get_cookie(9F), ldi_ev_register_callbacks(9F),
	ldi_ev_remove_callbacks(9F)

--------------------------------------------------------------------------------
Kernel Functions for Drivers			ldi_ev_remove_callbacks(9F)

NAME
       ldi_ev_remove_callbacks - Remove all callbacks for a given callback ID

SYNOPSIS
	#include <sys/sunldi.h>

	void ldi_ev_remove_callbacks(ldi_ev_callback_id_t id);
        
INTERFACE LEVEL
	Solaris DDI specific (Solaris DDI).

PARAMETERS
	ldi_ev_callback_id_t id
		An opaque data structure returned on successful calls
		to ldi_ev_register_callbacks(9F)

DESCRIPTION
	The ldi_ev_remove_callbacks() function unregisters any callbacks
	registered via ldi_ev_register_callbacks(9F). Once this function
	returns, the callback ID is no longer valid.

	Note that the finalize and notify callbacks exist independently
	of the LDI handle and are not automatically removed when the
	LDI handle is closed. It is up to the layered driver to remove these
	callbacks via ldi_ev_remove_callbacks() when the callbacks are no
	longer needed. The LDI framework may panic the system if the
	entity registering the callback (a dev_t, dip or module) no longer
	exists on the system and the callbacks have not been unregistered.

RETURN VALUES
	None

CONTEXT
	The ldi_ev_remove_callbacks() function can be called from user
	and kernel contexts only.

SEE ALSO
	ldi_ev_get_cookie(9F), ldi_ev_register_callbacks(9F)

------------------------------------------------------------------------------
NAME    request_offline, notify_online, notify_remove

SYNOPSIS
        #include <librcm.h>

        int prefixrequest_offline(rcm_handle_t *handle, char *rsrcname,
            pid_t pid, uint_t flag, char **reason, rcm_info_t **dependent_info);

        int prefixnotify_online(rcm_handle_t *handle, char *rsrcname,
            pid_t pid, uint_t flag, rcm_info_t **dependent_info);              

        int prefixnotify_remove(rcm_handle_t *handle, char *rsrcname,
            pid_t pid, uint_t flag, rcm_info_t **dependent_info);              

ARGUMENTS

        handle  handle provided by RCM daemon

        rsrcname name of resource

        pid     process pid further identifies DR client                       

        flag    For prefixrequest_offline(), may contain the following bit fields.

                RCM_QUERY       check if resource can be offlined; do not      
                                perform operation, exclusive with RCM_FORCE.   
                RCM_FORCE       request is urgent

  |		prefixrequest_offline may also contain the following flag
  |
  |  		RCM_RETIRE_REQUEST
  |				called in the context of I/O retire. Apply
  |				constraints to ensure that only non-critical
  |				devices can be offlined. If non-critical,
  |				release the resource (i.e. the device) so
  |				that retire can be successful.
  | 
  |		prefixnotify_online/remove may also contain the following flag
  | 
  | 		RCM_RETIRE_NOTIFY
  |				Perform any I/O retire related cleanup
  |				actions required in the online or remove
  |				entry points.
  |
        reason  pointer to string describing reason of refusal

        dependent_info
                info related to dependent resources

DESCRIPTION

        prefixrequest_offline() is invoked when a request comes in to offline
        the resource. The module may refuse to release the resource by returning
        RCM_FAILURE and updating reason to point to a dynamically allocated
        buffer containing a string describing the reason of refusal. The memory
        associated with reason is managed by the RCM daemon.

  |	If the RCM_RETIRE_REQUEST flag is set, the call is in the context
  |	of an I/O retire operation. The RCM module must check if the device
  |	is a critical device. If it is, the module should return RCM_FAILURE;
  |	it does not need to update reason. If the device is non-critical,
  |	the module should release the resource and return RCM_SUCCESS.

        If the client exports higher level resources which depend on rsrcname,
        prefixrequest_offline() should propagate the request to the dependents
        by calling rcm_request_*(). If the call returns RCM_CONFLICT or
        RCM_FAILURE, prefixrequest_offline() must pass the return code back to
        the RCM daemon and pass the info field from rcm_request_*() back as
        dependent_info.

        If RCM_QUERY is specified, the client should return RCM_SUCCESS if
        neither rsrcname nor its dependents are critical system resources.
        RCM_CONFLICT must be returned otherwise. In either case, the client
        should not act on the actual resource.

        If RCM_FORCE is specified, the module should make extra efforts to
        release the resource, such as using the force option to unmount a file
        system.

        prefixnotify_online() is invoked when a previous request to remove the
  |     resource is canceled or the I/O retire fails. The DR client can access
	the resource and proceed with normal operation. The online notification 
	should be passed to higher level resources by calling
	rcm_notify_online().

  |	If the RCM_RETIRE_NOTIFY flag is set, prefixnotify_online()
  |	should perform any cleanup or reacquisition needed if it chooses
  |	to start using the device again.

        prefixnotify_remove() is invoked when a resource is removed from the
  |     system or in the case of I/O retire, has been retired. This
	notification is always preceded by a prefixrequest_offline()
	invocation. Registration on rsrcname is discarded by the RCM daemon
	after prefixnotify_remove() returns.

  |	If the RCM_RETIRE_NOTIFY flag is set, prefixnotify_remove() should
  |	perform any cleanup now that the resource has been retired.

RETURN VALUES

        RCM_SUCCESS must be returned on success.

        RCM_CONFLICT should be returned if one or more rcm_request_offline()
  |	calls return RCM_CONFLICT. RCM_CONFLICT should not be returned if
  |	the RCM_RETIRE_REQUEST flag is set. Use RCM_FAILURE instead.

        RCM_FAILURE should be returned if the client or any of the dependent
        resources cannot be suspended and none of the dependent resources has a
        DR operation conflict.
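
        The flag and return value rules above can be summarized as a small
        decision function. This is a minimal userland sketch under stated
        assumptions, not a module implementation: the numeric values are
        stand-ins (the real definitions come from <librcm.h>),
        offline_decision() is a hypothetical helper, a plain offline request
        is simplified to refuse only critical resources, and the handling of
        RCM_QUERY combined with RCM_RETIRE_REQUEST is an assumption that
        this document does not spell out.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in values for illustration; real ones come from <librcm.h>. */
#define	RCM_SUCCESS		0
#define	RCM_FAILURE		(-1)
#define	RCM_CONFLICT		(-2)
#define	RCM_QUERY		0x1
#define	RCM_FORCE		0x2
#define	RCM_RETIRE_REQUEST	0x4

/*
 * Decide the outcome of an offline request for one resource.
 * is_critical is nonzero if the resource or a dependent is critical.
 * *released is set to 1 only when the resource would actually be
 * released; a query never releases anything.
 */
static int
offline_decision(unsigned int flag, int is_critical, int *released)
{
	int query = (flag & RCM_QUERY) != 0;
	int retire = (flag & RCM_RETIRE_REQUEST) != 0;

	assert(released != NULL);
	*released = 0;

	if (is_critical) {
		/* A retire request never reports RCM_CONFLICT. */
		if (query && !retire)
			return (RCM_CONFLICT);
		return (RCM_FAILURE);
	}

	/* Non-critical: succeed, but act only if this is not a query. */
	if (!query)
		*released = 1;
	return (RCM_SUCCESS);
}
```

        A real module would also fill in reason on a non-retire refusal and
        propagate the request to dependents via rcm_request_*().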

------------------------------------------------------------------------------