#
# Copyright 2007 Sun Microsystems, Inc.  All rights reserved.
# Use is subject to license terms.
#
# ident	"%Z%%M%	%I%	%E% SMI"
#

1. Introduction

1.1 Portfolio Name

    Disk enumeration for Sun Fire X4200 and X4200 M2

1.2 Portfolio Authors

    Carla Mowers, David Zhang

1.3 Submission Date

    08/02/2007

1.4 Project Team Aliases

    Eric.Schrock@sun.com, Chris.Horne@sun.com,
    Carla.Mowers@sun.com, David.Zhang@sun.com

1.5 Interest List

    sanmas@sun.com

1.6 List of Reviewers

    Eric.Schrock@Sun.Com

2. Portfolio Description

    This portfolio describes the first phase of FMA integration for
    disks in the X4200 and X4200 M2 chassis. It follows the design
    described in FMA 2007/016 (Generic Disk Topology) and presents a
    similar topology for the four internal disks.

    This case does not define any LED capabilities for the drives. The
    'ok2rm' LED is not physically connected to anything on these
    chassis. While it would be possible to expose only the 'fault' LED,
    the plan is to wait for future phases of FMA 2007/015 (Unified Disk
    FMA), which move this functionality into libtopo in a more generic
    fashion.

    This portfolio allows SMART data to be extracted from the disks and
    diagnosed by the disk-transport module introduced in FMA 2007/007
    (Generic Disk Monitoring).

3. Fault Boundary Analysis (FBA)

3.1 For systems, subsystems, components or services that make up this
    portfolio, list all resources that will be diagnosed and all the
    ASRUs and FRUs (see RAS glossary for definitions) associated with
    each diagnosis in which the resource may be a suspect.

    Please refer to section 11.3 for background on the enumeration of a
    generic SCSI device. The X4200 uses this generic model to generate
    a new platform-specific XML file describing the internal storage of
    an X4200. The only change made was to the perl script that
    generates the product-specific XML file.
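    As a rough illustration of what such a platform-specific topology
    map looks like, the fragment below follows the general shape of
    existing hc-scheme topology map files (topology/range/node/propgroup
    elements). The ranges, property values, and the single expanded node
    shown here are illustrative assumptions for a four-bay chassis, not
    the contents of the delivered X4200 file.

    ```xml
    <!-- Hypothetical sketch of a platform hc topology map; node names,
         ranges, and property values are illustrative only. -->
    <topology name='bay' scheme='hc'>
        <range name='bay' min='0' max='3'>
            <node instance='0'>
                <propgroup name='protocol' version='1'
                    name-stability='Private' data-stability='Private'>
                    <propval name='label' type='string' value='HDD_0' />
                </propgroup>
                <dependents grouping='children'>
                    <range name='disk' min='0' max='0'>
                        <enum-method name='disk' version='1' />
                    </range>
                </dependents>
            </node>
            <!-- nodes for bay instances 1-3 follow the same pattern -->
        </range>
    </topology>
    ```

    Each bay node carries a label property (matching the physical
    'HDD_n' silkscreen) and delegates enumeration of the disk beneath
    it to the generic disk enumerator.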
    The underlying resource diagnosed is the same as that described in
    FMA 2007/007, and the same underlying methodology will be used:

    Resource:
        hc://:product-id=Sun-Fire-X4200-Server:chassis-id=0648AM0378:
        server-id=dmg4200b:serial=3110SYWX 3LB0SYWX:
        part=SEAGATE-ST973401LSUN72G:revision=0556/bay=0/disk=0

    ASRU:
        dev:///:devid=id1,sd@SSEAGATE_ST973401LSUN72G_3110SYWX____________3LB0SYWX
        //pci@0,0/pci1022,7450@2/pci1000,3060@3/sd@0,0

    FRU:
        hc://:product-id=Sun-Fire-X4200-Server:chassis-id=0648AM0378:
        server-id=dmg4200b:serial=3110SYWX 3LB0SYWX:
        part=SEAGATE-ST973401LSUN72G:revision=0556/bay=0/disk=0

    Label: HDD_0

3.2 Diagrams or a description of the faults that may be present in the
    subsystem. A suitable format for this information is an Eversholt
    Fault Tree (see http://eversholt.central) that describes the ASRU
    and FRU boundaries, the faults that can be present within those
    boundaries and the error propagation telemetry for those faults.

    See 11.3 <2007/007 Generic Disk Monitoring (sfx4500 phase 2)> for a
    description of the faults.

4. Diagnosis Strategy

4.1 Provide a diagnosis philosophy document or a pointer to a portfolio
    that describes the algorithm used to diagnose the faults described
    in Section 3.2 and the reasons for using said strategy(/ies).

    See 11.3 <2007/007 Generic Disk Monitoring (sfx4500 phase 2)>

4.2 If your fault management activity (error handling, diagnosis or
    recovery) spans multiple fault manager regions, explain how each
    activity is coordinated between regions. For example, a Service
    Processor and Solaris domain may need to coordinate common error
    telemetry for diagnosis or provide interfaces to effect recovery
    operations.

    N/A

5. Error Handling Strategy

5.1 How are errors handled? Include a description of the immediate
    error reactions taken to capture error state and keep the system
    available without compromising the integrity of the rest of the
    system or user data. In the case of a device driver being hardened,
    describe the recovery/retry behavior, if any.
    See 11.3 <2007/007 Generic Disk Monitoring (sfx4500 phase 2)>

5.2 What new error report (ereport) events will be defined and
    registered with the SMI event registry? Include all FMA Protocol
    ereport specifications. Provide a pointer to your ercheck output.

    N/A

5.3 If you are *not* using a reference fault manager (fmd(1M)) on your
    system, how are you persisting ereports and communicating them to
    Sun Services?

    N/A

5.4 For more complex system portfolios (like Niagara2), provide a
    comprehensive error handling philosophy document that describes how
    errors are handled by all components involved in error handling
    (including Service Processors, LDOMs, etc.) [As an example, for
    sun4v platforms this may include specs for reset/config, POST,
    hypervisor, Solaris, and service processor software components.]

    N/A

6. Recovery/Reaction

6.1 Are you introducing any new recovery agent(s)? If so, please
    provide a description of the recovery agent(s).

    N/A

6.2 What existing fma modules will be used in response to your faults?

    See 11.4 <2007/004 IO Retire Agents>

6.3 Are you modifying any existing (Section 6.2) recovery agents? If
    so, please indicate the agents below, with a brief description of
    how they will be modified.

    N/A

6.4 Describe any immediate (e.g. offlining) and long-term (e.g.
    black-listing) recovery.

    N/A

6.5 Provide pointers to dictionary/po entries and knowledge articles.

    N/A

7. FRUID Implementation

7.1 Complete this section if you're submitting a portfolio for a
    platform.

    N/A

8. Test

8.1 Provide a pointer to your test plan(s) and specification(s). Make
    sure to list all FMA functionalities that are/are not covered by
    the test plan(s) and specification(s).

    N/A

8.2 Explain the risks associated with the test gaps, if any.

    N/A

9. Gaps

9.1 List any gaps that prevent a full FMA feature set. This includes
    but is not limited to insufficient error detectors, error
    reporting, and software infrastructure.
    This portfolio is only part of the long-term disk diagnosis
    strategy outlined in 2007/016. As such, it does not seek to address
    any of the known gaps outlined in that portfolio, namely LED
    management, SCSI transport diagnosis, and unified ZFS diagnosis.

9.2 Provide a risk assessment of the gaps listed in Section 9.1.
    Describe the customer and/or service impact if said gaps are not
    addressed.

    See 11.2 <2007/015 Unified Disk Diagnosis>

9.3 List future projects/get-well plans to address the gaps listed in
    Section 9.1. Provide target date and/or release information as to
    when these gaps will be addressed.

    See 11.2 <2007/015 Unified Disk Diagnosis>

10. Dependencies

10.1 List all project and other portfolio dependencies to fully realize
     the targeted FMA feature set for this portfolio. A portfolio may
     have dependencies on infrastructure projects. For example, the
     "Sun4u PCI hostbridge" and "PCI-X" projects have a dependency on
     the events/ereports defined within the "PCI Local Bus" portfolio.

    This portfolio has the following dependencies:

        2007/016 Generic Topology for Internal Disks (sfx4500 phase 3)
        2007/015 Unified Disk Diagnosis
        2007/007 Generic Disk Monitoring (sfx4500 phase 2)
        2006/012 Sun Fire X4500 Disk Failures: Phase I
        2007/004 IO Retire Agent

11. References

11.1 Provide pointers to all documents referenced in previous sections
     (for example, list pointers to error handling and diagnosis
     philosophy documents, test plans, etc.)
    [1] "Sun Fire X4500 Disk Failure: Phase I" portfolio:
        http://fma.eng/documents/engineering/portfolios/2006/012.sfx4500-disk

    [2] "Unified Disk Diagnosis" portfolio:
        http://fma.eng/documents/engineering/portfolios/2007/015.Unified-Disk-FMA

    [3] "Generic Disk Monitoring (sfx4500 phase 2)" portfolio:
        http://fma.eng/documents/engineering/portfolios/2007/007.Generic-disk-monitoring-sfx4500-p2

    [4] "IO Retire Agent (2007.004.IO_Retireagent)" portfolio:
        http://fma.eng/documents/engineering/portfolios/2007/004.IO_Retireagent/

    [5] Project workspaces:
        /net/anthrax.central/export/ws/cth/onnv-sanfma
        /net/anthrax.central/export/ws/cth/events-sanfma
        /net/boora.central/brmnas/yz203490/ws_fma