1. Introduction

1.1 Portfolio Name

    Intel 5400 chipset Memory Controller Hub

1.2 Portfolio Author

    Adrian Frost

1.3 Submission Date

    11 March 2008

1.4 Project Team Aliases

    Adrian.Frost@sun.com

1.5 Interest List

    Fadi.Salem@Sun.COM, Foz.Saeed@Sun.COM, Sridhar.Yedunuthula@Sun.COM,
    Kim.V.Tran@Sun.COM, fma-core@sun.com

1.6 List of Reviewers

    List any individuals/groups that have reviewed and/or approved this
    portfolio. It is recommended that the portfolio be pre-reviewed by
    groups such as Service, RAS review committees, Quality Engineering,
    etc.

    Reviewer   Group   Version    Date   Comments
                       Reviewed          (Approved/Rejected/Other)
    --------   -----   --------   ----   -------------------------

2. Portfolio Description

    The 5400 Memory Controller Hub is an enhancement to the 5000 series.
    It supports a faster front-side bus, lower-latency I/O, and larger
    memory DIMMs.

3. Fault Boundary Analysis (FBA)

3.1 For systems, subsystems, components or services that make up this
    portfolio, list all resources that will be diagnosed and all the
    ASRUs and FRUs (see RAS glossary for definitions) associated with
    each diagnosis in which the resource may be a suspect.

    This is the same as the 5000 series; see section 3.1 of FMA
    portfolio 2007.022.Intel:
    http://wikihome.sfbay.sun.com/fma-portfolio/Wiki.jsp?page=2007.022.Intel

3.2 Diagrams or a description of the faults that may be present in the
    subsystem. A suitable format for this information is an Eversholt
    Fault Tree (see http://eversholt.central) that describes the ASRU
    and FRU boundaries, the faults that can be present within those
    boundaries and the error propagation telemetry for those faults.

    The topology of the 5400 is the same as the 5000 series. There is
    one new fault, fault.cpu.intel.nb.otf, and two new ereports,
    ereport.cpu.intel.nb.otf and ereport.cpu.intel.nb.spd. All three
    are raised against existing topology nodes.
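    Since section 3.2 names an Eversholt fault tree as the suitable
    format, the new events and their propagation could be sketched
    roughly as below. This is an illustrative sketch only, not the
    shipped rules: the motherboard path, the N@T counts, and the use of
    an upset.discard event to drop the spd ereport are all assumptions
    made for the example.

    ```
    /* Illustrative sketch -- paths and counts are assumptions. */

    event ereport.cpu.intel.nb.otf@motherboard;
    event ereport.cpu.intel.nb.spd@motherboard;
    event fault.cpu.intel.nb.otf@motherboard;

    /* An otf ereport is always diagnosed to the otf fault. */
    prop fault.cpu.intel.nb.otf@motherboard (1)->
        ereport.cpu.intel.nb.otf@motherboard;

    /* spd is recoverable and expected only at system initialization,
     * so its ereport is discarded rather than diagnosed. */
    event upset.discard@motherboard;
    prop upset.discard@motherboard (1)->
        ereport.cpu.intel.nb.spd@motherboard;
    ```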
    http://hyper.sfbay.sun.com/net/hyper/tank/ws/af/intel5400.portfolio/fmtopo-p
    http://hyper.sfbay.sun.com/net/hyper/tank/ws/af/intel5400.portfolio/ercheck.html

4. Diagnosis Strategy

4.1 Provide a diagnosis philosophy document or a pointer to a portfolio
    that describes the algorithm used to diagnose the faults described
    in Section 3.2 and the reasons for using said strategy/strategies.

    There are two new ereports. ereport.cpu.intel.nb.otf will always be
    diagnosed to fault.cpu.intel.nb.otf. ereport.cpu.intel.nb.spd is a
    recoverable error that should only occur at system initialization;
    that ereport is discarded in Eversholt.

4.2 If your fault management activity (error handling, diagnosis or
    recovery) spans multiple fault manager regions, explain how each
    activity is coordinated between regions. For example, a Service
    Processor and Solaris domain may need to coordinate common error
    telemetry for diagnosis or provide interfaces to effect recovery
    operations.

    N/A

5. Error Handling Strategy

5.1 How are errors handled? Include a description of the immediate
    error reactions taken to capture error state and keep the system
    available without compromising the integrity of the rest of the
    system or user data. In the case of a device driver being hardened,
    describe the recovery/retry behavior, if any.

    See FMA portfolio 2007.022.Intel.

5.2 What new error report (ereport) events will be defined and
    registered with the SMI event registry? Include all FMA Protocol
    ereport specifications. Provide a pointer to your ercheck output.

    The payload of some other ereports is extended to include extra
    error registers.
    http://hyper.sfbay.sun.com/net/hyper/tank/ws/af/intel5400.portfolio/ercheck.html

5.3 If you are *not* using a reference fault manager (fmd(1M)) on your
    system, how are you persisting ereports and communicating them to
    Sun Services?
    N/A

5.4 For more complex system portfolios (like Niagara2), provide a
    comprehensive error handling philosophy document that describes how
    errors are handled by all components involved in error handling
    (including Service Processors, LDOMs, etc.). [As an example, for
    sun4v platforms this may include specs for reset/config, POST,
    hypervisor, Solaris, and service processor software components.]

    N/A

6. Recovery/Reaction

6.1 Are you introducing any new recovery agent(s)? If so, please
    provide a description of the recovery agent(s).

    N/A

6.2 What existing FMA modules will be used in response to your faults?

    [ X ] cpumem-retire

6.3 Are you modifying any existing (Section 6.2) recovery agents? If
    so, please indicate the agents below, with a brief description of
    how they will be modified.

    N/A

6.4 Describe any immediate (e.g. offlining) and long-term (e.g.
    black-listing) recovery.

    N/A

6.5 Provide pointers to dictionary/po entries and knowledge articles.

    http://hyper.sfbay.sun.com/net/hyper/tank/ws/af/intel5400.portfolio/INTEL.dict
    http://hyper.sfbay.sun.com/net/hyper/tank/ws/af/intel5400.portfolio/INTEL.po

7. FRUID Implementation

7.1 Complete this section if you're submitting a portfolio for a
    platform. (Refer to http://webhome.sfbay/FRUID/ for additional
    information on FRU ID requirements and reference material.)

7.1.1 Summarize the platform's level of conformance to the policies
    described in "The Policies and Best Practices for the Recording of
    FMA Status and Event Data in FRUID Storage Devices". [Refer to
    http://fma.eng.sun.com/developer/psh_tech/psh-tech.html for a copy
    of this document.]

7.1.2 Indicate which FRUs listed in Section 3.1 comply with the
    policies & best practices and which FRUs do not.

7.1.3 Provide a link to the document describing the component map for
    each FRU. An example can be found in Appendix C of the FRUID Common
    Dynamic Data Definition Version 1.2.3.
    (Refer to http://fruid.sfbay/externalspecs/fruiddyn1)

7.1.4 Provide a link to the document describing what platform-specific
    event information, if any, will be recorded in the "diagdata" field
    of the Status_EventsR record for each message id.

8. Test

8.1 Provide a pointer to your test plan(s) and specification(s). Make
    sure to list all FMA functionalities that are/are not covered by
    the test plan(s) and specification(s).

    Testing will be carried out using software error injection on an
    Intel Stoakley system and a prototype Sun Venus. There will be
    regression testing on systems with the 5000 series and 7300
    Northbridge.

8.2 Explain the risks associated with the test gaps, if any.

    N/A

9. Gaps

9.1 List any gaps that prevent a full FMA feature set. This includes
    but is not limited to insufficient error detectors, error
    reporting, and software infrastructure.

    N/A

9.2 Provide a risk assessment of the gaps listed in Section 9.1.
    Describe the customer and/or service impact if said gaps are not
    addressed.

    N/A

9.3 List future projects/get-well plans to address the gaps listed in
    Section 9.1. Provide target date and/or release information as to
    when these gaps will be addressed.

    N/A

10. Dependencies

10.1 List all project and other portfolio dependencies to fully
    realize the targeted FMA feature set for this portfolio. A
    portfolio may have dependencies on infrastructure projects. For
    example, the "Sun4u PCI hostbridge" and "PCI-X" projects have a
    dependency on the events/ereports defined within the "PCI Local
    Bus" portfolio.

    N/A

11. References

11.1 Provide pointers to all documents referenced in previous sections
    (for example, list pointers to error handling and diagnosis
    philosophy documents, test plans, etc.)

    http://download.intel.com/design/chipsets/datashts/318610.pdf