#
# Copyright 2008 Sun Microsystems, Inc.  All rights reserved.
# Use is subject to license terms.
#
# ident	"@(#)portfolio	1.3	08/08/18 SMI"

1. Introduction

1.1 Portfolio Name

	SCSI Disk Device-as-Detector Diagnosis (phase3)

1.2 Portfolio Authors

	Chris Horne, Liu Ti, Xiao Li, David Zhang

1.3 Submission Date

	Mon Aug 25

1.4 Project Team Aliases

	scsifma-bj@sun.com

1.5 Interest List

	scsifma-bj@sun.com

1.6 List of Reviewers

	Eric.Schrock@Sun.Com
	Stephen.Hanson@sun.com

2. Portfolio description

This portfolio describes the third phase of FMA integration for SCSI
disks. The first phase addressed topology [1], the second phase built
infrastructure [2], this phase (third) addresses 'device-as-detector'
diagnosis, and a future phase will address transport diagnosis and
health monitoring.

Section 2.1 of this portfolio provides an overview of how the disk
driver and fmd(1M) coordinate fault diagnosis and response. The
overview covers:

    o The derivation of an ereport detector 'device-path'.

    o The driver rules and fmd(1M) topology implications related to
      having a 'devid' in an ereport detector.

    o Design considerations and structure of proposed ereport classes,
      including class-specific payload.

    o The definition of the common 'driver-assessment' ereport payload
      property, its values, and its role in Eversholt propagation
      rules.

    o How ereport telemetry relates to general driver messaging, the
      implementation of a structured log, and current driver
      scsi_log() and /var/adm/messages use.

Sections 3-11 of this portfolio follow the standard portfolio
template, and provide specific implementation information - often in
the form of references and links to additional material. Many of the
standard portfolio topics are covered in the overview below.

2.1 Overview

The term 'device-as-detector' (the name of this case) refers to a SCSI
device operating as an FMA error detector. The device may be an
internal disk, or it may be a device located somewhere in an external
enclosure.
In both cases, the device detects problems and reports them using T10
standards-defined SCSI transport and protocol [19]. The Solaris
endpoint for this SCSI standards-defined telemetry is the leaf
(disk/tape) driver (sd(7D)).

NOTE: In this overview we use the term leaf driver, instead of disk
driver, because we expect all leaf drivers to use the approach
outlined. Delivery will be limited to the disk leaf driver.

The leaf driver is responsible for converting SCSI protocol defined
telemetry into FMA ereport form. The leaf driver, and not the SCSA
framework, must fill this role because the framework does not
understand state beyond a single scsi_pkt(9S) and does not understand
device-specific behavior.

In addition to a leaf driver's detector-oriented role, the driver also
performs its own low-level error handling. This low-level error
handling initiates and coordinates command-level retry and recovery
procedures within the driver itself. These procedures are carried out
independently of any direct fmd(1M) control, but all ereports
generated during this process have a 'driver-assessment' property. The
'driver-assessment' property allows Eversholt device-as-detector rules
to track 'driver-assessment', effectively turning the low-level error
handling results into a low-level diagnosis. The 'driver-assessment'
property is in some respects like an in-band 'service impact' [18].

The following 'driver-assessment' values are used:

    fatal:	The driver has failed an operation which, in the
		absence of a fault, should have succeeded.

    retry:	The driver will retry a failed operation which, in the
		absence of a fault, should have succeeded.

    recovered:	A retry was successful.

    fail:	The driver has failed an operation for reasons
		unrelated to hardware.

    info:	The driver encountered sense data, but the operation
		was successful.

The term 'detector' refers to something that detects a problem.
However, in the context of driver generated ereports, 'detector' is
the name of an nvlist embedded in all ereports (this can be seen by
looking at the error log using 'fmdump -ev'). The ereport detector
'device-path' property describes the physical transport hardware that
encountered a problem. For mpxio operation, the path_instance is used
to construct the detector 'device-path' [2].

When the driver can guarantee the identity of a device, the ereport
detector should contain a 'devid' property. An ereport with a 'devid'
is considered a device-as-detector event. All fmd(1M) diagnosis-engine
processing covered by this portfolio is device-as-detector oriented.
The same ereport without a 'devid' is considered a transport-detector
event. The fmd(1M) diagnosis-engine responsible for processing
transport-detector ereports is future work [4].

The driver must pay close attention to whether the ereport detector
should have a 'devid' property or not. The driver writer should focus
on the accuracy of this choice, not the topology and diagnosis-engine
ramifications described below. When the driver can't guarantee the
identity of the device, the detector should not have a 'devid'. The
driver's choice may be influenced by knowledge of transport
addressing: providing an accurate answer for a transport that
addresses the receptacle where a device is located (@target,lun) is
more complex than providing an accurate answer for a transport that
addresses the device directly (@wWWN,lun).

Device-as-detector events are processed by the Eversholt
diagnosis-engine. The Eversholt term for topology is 'config', and
config information is obtained from libtopo snapshots. Basic storage
topology was defined by [1], and [2] enhanced Eversholt in two ways
for dealing with storage:

    o The Eversholt language was enhanced by adding a
      'discard_if_config_unknown' ereport property.
    o The Eversholt diagnosis-engine front end was enhanced to match
      topology based on the detector 'devid', and to silently discard
      telemetry that does not match topology/config for ereport
      classes defined as 'discard_if_config_unknown'.

For device-as-detector events, propagation rules are tied directly to
the common 'driver-assessment' property provided by the leaf driver.
This allows user-land fmd(1M) operation to track low-level driver
diagnosis. This approach keeps existing low-level driver diagnosis
procedures largely intact, yet allows them to trigger more
sophisticated fmd(1M) agent activity like io-retire [10].

For device-as-detector ereports with a 'devid' topology match in the
Eversholt front end, the Eversholt rules delivered by this portfolio
will cause an ereport with a 'driver-assessment' property value of
'fatal' to trigger a fault event of the same class. Stated another
way, our .esc rules generate faults when an ereport has a 'devid', the
'devid' matches topology, and the 'driver-assessment' is 'fatal'.
Ereports with a 'devid' topology match and a non-'fatal'
'driver-assessment' generate upsets. Like many Eversholt consumers, we
expect upsets to be discarded. We also expect fmd(1M) to treat fault
events with the same FRU and ASRU as an existing fault as duplicates.
Raw ereport telemetry is always available from the error log via
'fmdump -ev', even when a discard occurs.

Agent activity for storage is not new: selected platforms have
supported internal fmd(1M) generated 'ereport.io.scsi.disk'
'.predictive-failure', '.self-test-failure', and '.over-temperature'
ereports (and associated faults) for a long time.

For ereport and fault classes, we chose to define a minimal set of
event classes based on available payload information and the FRU/ASRU
orientation of that information. We intentionally decided not to
define classes based on interpretation of payload information. The
structure of new classes is shown in the diagram below.
To keep things simple, the same class structure is used for all types
of events. While other types of SCSI devices, like tape, are expected
to use a similar class structure, we have a '.disk' device-type level
in the proposed class structure to allow for knowledge articles
specific to a device-type, and to allow for future flexibility in
areas like disk-vs-tape SERD.

Eversholt rules map ereports of class '.merr' and '.derr' to faults
when they have a 'devid', the 'devid' matches topology, and the
'driver-assessment' property value is 'fatal'. A '.uderr' will map to
a fault/defect at some future date, depending on how gap 4 is
resolved. A '.recovered' ereport will always have a
'driver-assessment' of 'recovered'. Ereports of class '.tran' never
have a 'devid', and will be discarded by the Eversholt
diagnosis-engine front end. All other ereports that end up in
Eversholt (match topology) map to upsets - with a structured error log
view of device behavior available in the 'fmdump -e' error log.

                                 io.scsi
                                    |
                                  .cmd   [driver-assessment,op-code,cdb,
                                    |     pkt-reason,pkt-state,pkt-stats]
                                  .disk
                                    |
       +----------------------------+----------------------------+
       |                            |                            |
  .recovered                      .dev                         .tran
                              [stat-code]                        :
                                    |                        {future}
                +--------------------+--------------------+
                |                    |                    |
              .rqs                .serr               .uderr++
     [key*,asc*,ascq*,sense-data]                    [info,value]
                |
           +---------+
           |         |
         .merr+    .derr+
         [lba*]

    :LEGEND:
	.cmd		scsi_command
	.derr		device_error
	.dev		have_info_from_device
	.disk		disk_related
	.merr		media_error
	.recovered	recovered
	.rqs		request_sense
	.serr		scsi_status_error
	.tran		transport_error
	.uderr		unexpected_data_error

	[payload_property[,payload_property]*]
	+	maps to 'device-as-detector' fault
	++	future fault/defect, see gap 4
	*	property promoted into fault

Our choice of simple class names means that the fault names do not
directly provide a detailed interpretation of the fault. Instead, we
promote specific ereport properties into the fault events (like SCSI
T10 standard defined key/asc/ascq properties).
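As a sketch of how such a fatal-to-fault propagation might be written
in Eversholt (illustrative only: the event names follow the class
diagram above, but the FITrate, FRU/ASRU bindings, and exact
constraints of the delivered disk.esc are omitted or assumed):

```
/*
 * Illustrative sketch, not the delivered disk.esc: a '.merr' ereport
 * whose 'driver-assessment' payload value is 'fatal' propagates to a
 * fault of the same class. FITrate, FRU/ASRU declarations, and the
 * discard_if_config_unknown handling are omitted here.
 */
event ereport.io.scsi.cmd.disk.dev.rqs.merr@bay/disk;
event fault.io.scsi.cmd.disk.dev.rqs.merr@bay/disk;

prop fault.io.scsi.cmd.disk.dev.rqs.merr@bay/disk (0)->
    ereport.io.scsi.cmd.disk.dev.rqs.merr@bay/disk
    { payloadprop("driver-assessment") == "fatal" };
```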
Promoted properties, and their values, are available as "hc-specific"
members of the fault FMRI, and can be viewed with 'fmdump -V'.

All ereports generated will show up in the error log and can be
displayed and filtered using 'fmdump -e'. The fmdump command supports
a number of different filtering mechanisms ([15], [2]). A future
project could provide human readable error log output - and
reconstruct the scsi_log()-like messages currently in
/var/adm/messages. The 'recovered' ereport is generated so that each
'retry' sequence has a resolution of either 'recovered' or 'fault' in
'fmdump -e' output of the error log.

The project is introducing one private sd(7D)/ssd(7D) driver.conf(4)
property to control reporting of FMA telemetry via /var/adm/messages:

    'fm-scsi-log'

	The default value is 0; set this to 1 to enable FMA telemetry
	logging via scsi_log(9F) messages captured in the
	/var/adm/messages file.

The 'fm-scsi-log' property is supported to mitigate risk due to
unforeseen dependencies on /var/adm/messages. Any use of this tunable
should be considered a short-term fix. This tunable may be removed,
without notice, at a future date.

The generation of ereports can be disabled via the standard
driver.conf(4) facility mentioned in ddi_fm_init(9F). In addition,
there is a system(4) global variable called 'scsi_fm_capable' that
provides the default value when the 'fm-capable' property is
undefined. The default value of 'scsi_fm_capable' is
DDI_FM_EREPORT_CAPABLE.

From an ereport rate perspective, we have the conflicting goals of
providing complete error log information and of providing reliable
fault diagnosis. These goals conflict because the first wants to
capture complete retry sequences, and the second wants to ensure that
ereports leading to faults are never dropped. The driver already
implements various forms of delay-before-retry.
Also, in situations where upper level software tells the driver that
another valid copy of the data exists on a different device, one
failing retry sequence results in failure of all active IO to the
device (without additional retry/delay): both SVM and ZFS use
B_FAILFAST. These mechanisms will help reduce the ereport rate, and
reduce the chance of dropping ereports. If we experience problems with
dropped ereports, there are two things we can do to help:

    o Enhance the framework and SCSI ereport post code so that the
      posting code can assign a priority to an ereport: high priority
      ereports are less likely to be dropped. For what we are doing, a
      'driver-assessment' of 'fatal' would be high priority because we
      know it can generate fault events.

    o Limit the rate at which transport-detector events (i.e. events
      without devids) are generated (dropping events that exceed the
      maximum rate).

The highest ereport rate is expected to occur with transport-detector
events. An example would be a 'switch' failure under heavy load to
lots of disks (no B_FAILFAST). A single switch fault can affect many
initiator ports and disks. The future diagnosis of transport-detector
events ([4]) is expected to issue active probes to determine fault
location - an initial event is needed to trigger the generation of
active probes, but dropping some of the initial transport-detector
events should not affect the final diagnosis.

3. Fault Boundary Analysis (FBA)

3.1 For systems, subsystems, components or services that make up this
    portfolio, list all resources that will be diagnosed and all the
    ASRUs and FRUs (see RAS glossary for definitions) associated with
    each diagnosis in which the resource may be a suspect.

	See [1] and [2]

3.2 Diagrams or a description of the faults that may be present in the
    subsystem.
    A suitable format for this information is an Eversholt Fault Tree
    (see http://eversholt.central) that describes the ASRU and FRU
    boundaries, the faults that can be present within those
    boundaries, and the error propagation telemetry for those faults.

	See overview.
	Ereports: See [99] 'ereport' file
	Fault Tree: See [99] 'disk.esc' file
	Event Registry Changes: See [99] 'report.html' file

4. Diagnosis Strategy

4.1 Provide a diagnosis philosophy document or a pointer to a
    portfolio that describes the algorithm used to diagnose the faults
    described in Section 3.2 and the reasons for using said
    strategy(ies).

	See overview.

4.2 If your fault management activity (error handling, diagnosis or
    recovery) spans multiple fault manager regions, explain how each
    activity is coordinated between regions. For example, a Service
    Processor and Solaris domain may need to coordinate common error
    telemetry for diagnosis or provide interfaces to effect recovery
    operations.

	N/A

5. Error Handling Strategy

5.1 How are errors handled? Include a description of the immediate
    error reactions taken to capture error state and keep the system
    available without compromising the integrity of the rest of the
    system or user data. In the case of a device driver being
    hardened, describe the recovery/retry behavior, if any.

	See overview.
	Demo of use: See [99] 'demo' file.

5.2 What new error report (ereport) events will be defined and
    registered with the SMI event registry? Include all FMA Protocol
    ereport specifications.
    Provide a pointer to your ercheck output.

	New ereports: See [99] 'ereport' file
	Output of ercheck: See [99] 'report.html' file

5.3 If you are *not* using a reference fault manager (fmd(1M)) on your
    system, how are you persisting ereports and communicating them to
    Sun Services?

	N/A

5.4 For more complex system portfolios (like Niagara2), provide a
    comprehensive error handling philosophy document that describes
    how errors are handled by all components involved in error
    handling (including Service Processors, LDOMs, etc.) [As an
    example, for sun4v platforms this may include specs for
    reset/config, POST, hypervisor, Solaris, and service processor
    software components.]

	N/A

6. Recovery/Reaction

6.1 Are you introducing any new recovery agent(s)? If so, please
    provide a description of the recovery agent(s).

	N/A

6.2 What existing fma modules will be used in response to your faults?

	syslog-msgs [15]
	io-retire [10]

6.3 Are you modifying any existing (Section 6.2) recovery agents? If
    so, please indicate the agents below, with a brief description of
    how they will be modified.

	N/A

6.4 Describe any immediate (e.g. offlining) and long-term (e.g.
    black-listing) recovery.

	N/A

6.5 Provide pointers to dictionary/po entries and knowledge articles.

	See [99] 'DISK.dict' and 'DISK.po' files.

7. FRUID Implementation

7.1 Complete this section if you're submitting a portfolio for a
    platform.

	N/A

8. Test

8.1 Provide a pointer to your test plan(s) and specification(s). Make
    sure to list all FMA functionalities that are/are not covered by
    the test plan(s) and specification(s).

	See [99] 'testplan_ut.txt' and 'testplan_ut_result.txt' files.

	The sd driver supports a scsi_pkt(9S) fault injection mechanism
	at the front end of sdintr(). Our tools and test scripts use
	this mechanism. The demos in [20] are also generated using this
	mechanism. See [21] for a description.

	During development, we also developed a dtrace 'inject' fault
	injection mechanism.
	An 'inject' was similar to the dtrace 'breakpoint', but with
	the D program providing a string with kmdb commands to execute
	to automate fault injection. It was not considered a
	deliverable approach, so we switched to the method above.

8.2 Explain the risks associated with the test gaps, if any.

	1) Testing can't be run on all types of HBAs - we will however
	   ensure coverage on auto_request_sense and
	   non-auto_request_sense HBAs.

	2) There is a testing hole when the system is issuing SCSI
	   commands during HBA driven device enumeration - before a
	   disk device node is even created. Testing will cover
	   sdattach but will not cover HBA attach or transport
	   enumeration.

9. Gaps

9.1 List any gaps that prevent a full FMA feature set. This includes
    but is not limited to insufficient error detectors, error
    reporting, and software infrastructure.

	1) FRUID_for_disk: disk faults lack Sun part-number information
	   http://monaco.sfbay.sun.com/detail.jsf?cr=6740012

	2) ereport priority: detector indicates priority
	   http://monaco.sfbay.sun.com/detail.jsf?cr=6740013

	3) disk service-impact mapping: devinfo state and cfgadm output
	   http://monaco.sfbay.sun.com/detail.jsf?cr=6740014

	4) FRU for firmware:
	   http://monaco.sfbay.sun.com/detail.jsf?cr=6740015

	A) Topology: Device-as-detector diagnosis depends on
	   understanding topology, and the system does not know how to
	   represent the topology of all disks in the system.

	B) Health-Monitor: Implement phase 4 [4].

	C) Transport-DE: Implement phase 4 [4].

	D) fmdump: Human readable 'fmdump -e' output that looks like
	   current scsi_log() messages in /var/adm/messages.

	E) media: '.merr' faults have ASRUs at the device level instead
	   of at the block level; agents don't know how to handle block
	   level faults.

	F) SERD: SERD engine for recovered errors.

	G) iSCSI: 'device-path' does not map to the NIC hardware used.

9.2 Provide a risk assessment of the gaps listed in Section 9.1.
    Describe the customer and/or service impact if said gaps are not
    addressed.

	1) FRUID_for_disk: Delay and confusion in obtaining the proper
	   Sun qualified replacement part, increased potential for use
	   of a non-qualified replacement.

	2) ereport priority: Faults not being processed correctly.

	3) disk service-impact mapping: The customer sees poor
	   integration across the various Solaris utilities that
	   process system state.

	4) FRU for firmware: Mandated replacement of components that
	   are unlikely to resolve the problem.

	A) Topology: When topology is unknown, ereports associated with
	   fatal conditions will not produce faults.

	B) Health-Monitor: Faults on components that are not being
	   accessed may remain unexposed for extended periods of time,
	   and are often exposed when they are needed the most (standby
	   path, hot spare).

	C) Transport-DE: Ereports associated with fatal conditions will
	   not produce faults.

	D) fmdump: Customers must look at raw 'fmdump -e' output
	   instead of the more familiar scsi_log() representation of
	   the same information.

	E) media: We may be faulting entire devices in situations where
	   a finer-grained lba/partition approach is more appropriate.

	F) SERD: We may continue to use marginal/suspect hardware.

	G) iSCSI: Impact is minimal for device-as-detector ereports
	   (this phase [3]); for the next phase [4], transport-detector
	   faults will not be able to identify specific NIC ports.

9.3 List future projects/get-well plans to address the gaps listed in
    Section 9.1. Provide target date and/or release information as to
    when these gaps will be addressed.

	1) FRUID_for_disk: CR filed, considered high priority, schedule
	   planning is needed.

	2) ereport priority: CR filed, schedule planning is needed.

	3) disk service-impact mapping: CR filed, schedule planning is
	   needed.

	4) FRU for firmware: CR filed, schedule planning is needed.

	A) Topology: One possibility is to populate disks that lack
	   topology in an "unknown enclosure". More planning is needed.
	B-C) Health-Monitor, Transport-DE: While some prototype work
	   has been done, after putback of this phase (three), we
	   intend to start working on a more detailed schedule for
	   phase 4. This may include breaking up the phase 4 goals into
	   smaller sub-phases that are delivered independently.

	D-G) At this point in time we don't have any plans to address
	   these issues.

10. Dependencies

10.1 List all project and other portfolio dependencies to fully
     realize the targeted FMA feature set for this portfolio. A
     portfolio may have dependencies on infrastructure projects. For
     example, the "Sun4u PCI hostbridge" and "PCI-X" projects have a
     dependency on the events/ereports defined within the "PCI Local
     Bus" portfolio.

	See [0].

11. References

11.1 Provide pointers to all documents referenced in previous sections
     (for example, list pointers to error handling and diagnosis
     philosophy documents, test plans, etc.)

	See [99] for pointers to project specific information.

UMBRELLA:

[ 0] Umbrella for Disk FMA: Unified Disk FMA
     http://wikihome.sfbay/fma-portfolio/Wiki.jsp?page=2007.015.UnifiedDisk

STORAGE FMA:

[ 1] Phase1: Topology: Generic Topology for Internal Disks
     http://wikihome.sfbay/fma-portfolio/Wiki.jsp?page=2007.016.DiskTopology
     http://sac.sfbay/PSARC/2007/388
     http://www.opensolaris.org/os/community/arc/caselog/2007/388

[ 2] Phase2: Infrastructure: Multiplexed I/O Enhancements to Support FMA
     http://wikihome.sfbay.sun.com/fma-portfolio/Wiki.jsp?page=2008.004.MPXIO
     http://sac.sfbay/PSARC/2008/077
     http://www.opensolaris.org/os/community/arc/caselog/2008/077

[ 3] Phase3: Device-as-Detector (THIS CASE)
     [99] 'portfolio' file
     http://wikihome.sfbay.sun.com/fma-portfolio/Wiki.jsp?page=2008.032.SCSIP3
     http://sac.sfbay/PSARC/2008/XXX...TBS
     http://www.opensolaris.org/os/community/arc/caselog/2008/XXX...TBS

[ 4] Phase4: Transport Diagnosis (Future)
     ...future

RELATED_THUMPER_WORK:

[ 5] FMA 2006/012: Sun Fire X4500 Disk Failures: Phase I
     http://fma.eng/documents/engineering/portfolios/2006/012.thumper
     http://sac.eng/PSARC/2006/322
     http://www.opensolaris.org/os/community/arc/caselog/2006/322

[ 6] "Generic Disk Monitoring (sfx4500 phase 2)" portfolio
     http://fma.eng/documents/engineering/portfolios/2007/007.Generic-disk-monitoring-sfx4500-p2
     http://sac.eng/Archives/CaseLog/arc/PSARC/2007/202
     http://www.opensolaris.org/os/community/arc/caselog/2007/202

RELATED_ZFS_WORK:

[ 7] FMA 2005/019 ZFS FMA Phase 0
     http://fma.eng/documents/engineering/portfolios/2005/019.zfs

[ 8] FMA 2006/005 ZFS FMA Phase 1
     http://fma.eng/documents/engineering/portfolios/2006/005.zfs-phase1
     http://sac.eng/PSARC/2006/139
     http://www.opensolaris.org/os/community/arc/caselog/2006/139

[ 9] FMA 2007/006 ZFS FMA Phase 2
     http://fma.eng/documents/engineering/portfolios/2007/006.ZFS-P2
     http://sac.eng/PSARC/2007/283/
     http://www.opensolaris.org/os/community/arc/caselog/2007/283

IORETIRE:

[10] FMA 2007/004 Solaris I/O Retire Agent
     http://fma.eng/documents/engineering/portfolios/2007/004.IO_Retireagent
     http://sac.eng/PSARC/2007/290
     http://www.opensolaris.org/os/community/arc/caselog/2007/290

MISC:

[11] Improved Disk-Drive Failure Warnings
     http://charlotte.ucsd.edu/users/elkan/ieeereliability.pdf

[12] Dev scheme specification - Section 8.4.3
     http://fma.eng/documents/engineering/protocol_whtppr.pdf

[13] EVERSHOLT: Eversholt Diagnosis Technology
     http://sac.eng/PSARC/2003/428

[14] EVERSHOLT: Eversholt Language Manual (Version 1.5 10/04/06)
     http://eversholt.central/docs/language/

[15] FMD: Solaris Fault Management Daemon
     http://sac.sfbay/PSARC/2003/089/

[16] OLD: FMA 2006/013 SCSI FMA Phase 1 (Withdrawn)
     http://fma.eng/documents/engineering/portfolios/2006/013.scsi-phase1

[18] "service impact" information
     ddi_fm_ereport_post(9F)
     http://fma.sfbay/documents/engineering/fmaioprm/chap3-9.html
     http://sac.eng/PSARC/2007/290
     http://sac.eng/PSARC/2002/288

[19] T10 SCSI Standards
     http://t10.org

PROJECT:

[20] Project details
     http://fogbroom.prc/bjroot/users/yz203490/FMA_phase3

[21] Project details: fault injection
     http://fogbroom.prc/bjroot/users/xiaoli/onnv-fma3/unit_test/README
     http://fogbroom.prc/bjroot/users/xiaoli/onnv-fma3/unit_test/

[99] SCSI Disk Device-as-Detector Diagnosis (phase3)
     http://wikihome.sfbay.sun.com/fma-portfolio/Wiki.jsp?page=2008.032.SCSIP3