SCSI SAS 1275 Binding

Title:          
                                Version 1.0

1  Introduction

1.1  Overview and References

This document describes the application of Open Firmware to the SCSI-3 protocol
as implemented on Serial Attached SCSI (SAS).

2  References and Definitions

2.1  References

[1] IEEE Standard 1275-1994, IEEE Standard for Boot (Initialization and
    Configuration) Firmware: Core Requirements and Practices
[2] Device Support Extensions to IEEE 1275-1994, Revision 1.0
[3] FWARC 2005/751, SAS WWID determined from system MAC address
[4] FWARC 2006/035, LSI SAS Controller Methods for Manufacturing and Service
[5] PCI Bus Binding to: IEEE Std 1275-1994 Standard for Boot Firmware Rev. 2.1
      http://noho.eng/1275/bindings/pci/pci2_1.pdf
[6] FWARC 2003/637, PCI Express Bus Binding to IEEE 1275
[7] SCSI Architecture Model-4  
      http://t10.org/ftp/t10/drafts/sam4/sam4r13.pdf
[8] Serial Attached SCSI 1.1 (SAS 1.1)
      http://t10.org/ftp/t10/drafts/sas1/sas1r10.pdf
[9] SCSI ATA Translation
      http://t10.org/ftp/t10/drafts/sat/sat-r09.pdf
[10] SCSI-3 Parallel Bindings
      http://noho.eng/1275/practice/spi/spi1_0.ps

2.2  Definition of Terms

bus node: an Open Firmware device node that represents a bus controller.  In
          cases where a node represents the interface, or "bridge", between
	  one bus and another, the node is both a bus node relative to the
          bus it controls and a child node of its parent bus.  Note that an
	  Open Firmware device node is not in itself a physical hardware
	  device, rather it is a software abstraction that describes a
          hardware device.

logical unit: A target resident entity that implements a device model and
              executes SCSI commands sent by an application client.

LUN:	Logical unit number.

phy:	The part of a device used to connect to other devices.  One end
	of a point-to-point SAS link.

port:	An entity at one end of a SCSI-3 initiator/target nexus.

target port: The port of a SCSI-3 target device.

SASAddress:  64-bit globally unique identifier used for addressing over
	     SAS fabrics.  Its format is defined by the SAS [8] standard.

SAS expander:  See SAS switch.

SAS switch:  A device that extends the SAS fabric and allows a single
    	     SAS phy to communicate with multiple target devices.  SAS
	     switches may be cascaded.


3  Bus Characteristics

3.1  Physical Address Formats and Representations

3.1.1  Physical Address Formats

3.1.1.1  Numerical Representation

The numerical representation of an address for a "scsi-sas" device type
consists of four cells encoded as follows.  Bit #0 refers to the least
significant bit.

                Bit #   33222222 22221111 11111100 00000000
                        10987654 32109876 54321098 76543210

        sas.hi cell:    ssssssss ssssssss ssssssss ssssssss
        sas.lo cell:    ssssssss ssssssss ssssssss ssssssss
        lun.hi cell:    llllllll llllllll llllllll llllllll
        lun.lo cell:    llllllll llllllll llllllll llllllll

where:
        ss..ss          64-bit unsigned number, the SASAddress
        ll..ll          64-bit unsigned number, the logical unit number (LUN)
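
The cell layout above can be illustrated with a short sketch.  It is written
in Python purely for illustration (a real implementation would be Forth
code in the controller's package); the function names to_cells and
from_cells are hypothetical and are not defined by this binding.

```python
MASK32 = 0xFFFFFFFF  # one 32-bit cell


def to_cells(sas_address, lun):
    """Split a 64-bit SASAddress and a 64-bit LUN into four 32-bit cells.

    Returns (sas_hi, sas_lo, lun_hi, lun_lo), matching the order in which
    the cells are listed in the layout above.
    """
    if not (0 <= sas_address < (1 << 64)) or not (0 <= lun < (1 << 64)):
        raise ValueError("SASAddress and LUN must fit in 64 bits")
    return (sas_address >> 32, sas_address & MASK32, lun >> 32, lun & MASK32)


def from_cells(sas_hi, sas_lo, lun_hi, lun_lo):
    """Rebuild the 64-bit SASAddress and LUN from their four cells."""
    return ((sas_hi << 32) | sas_lo, (lun_hi << 32) | lun_lo)


cells = to_cells(0x5000C5000A1B2C3D, 1)
print([hex(c) for c in cells])
```

The SASAddress used here is an arbitrary example value, not one taken from
any real device.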


3.1.1.2  Text Representation

SAS controllers support three textual address representations.  The canonical
text representation of the address is of the SASAddress form, and specifies
the SASAddress of the device's target SAS port.  The encode-unit method
generates only the SASAddress form, never the phy number form or the SATA
identity form.

SATA devices in a SAS fabric do not have a permanently assigned SAS address.
If the fabric is reconfigured and SATA devices are assigned new addresses as
a result, the boot path may point to the wrong device.  To mitigate this
problem, SATA devices may be addressed using a synthetic, globally unique
SATA identity form.  This form is only for SATA devices and is never used to
address a SAS device.

SAS controllers optionally support a third addressing form, the phy number
form, in the decode-unit method.  This form specifies the number of the phy
used to communicate with the child device.  It is valid only for devices
that are directly attached to the SAS controller, and can be used for both
SAS and SATA devices.  If a SAS expander is connected to a phy, multiple
target devices may be accessible through that one phy, so a phy number
address is ambiguous.  If an open request specifies a phy address and a SAS
expander is connected to that phy, the open request should fail.


3.1.1.2.1  SASAddress Representation

The text representation of a SASAddress is of the following form:

        wNNNNNNNNNNNNNNNN[,LLLLLLLLLLLLLLLL]

where:

        w               is the letter 'w'

        NNNNNNNNNNNNNNNN is an ASCII hexadecimal number in the range
                        0...FFFFFFFFFFFFFFFF that specifies the SASAddress
			of the device's target SAS port.

        LLLLLLLLLLLLLLLL is an ASCII hexadecimal number in the range
                        0...FFFFFFFFFFFFFFFF specifying a LUN.  This portion 
			of the address is optional and may be omitted if zero.

Conversion of hexadecimal numbers from text representation to numerical
representation shall be case-insensitive and leading zeros shall be permitted
but not required.

Conversion from numerical representation to text representation shall use the
lower case forms of the hexadecimal digits in the range a...f, suppressing
leading zeros.

The correspondence between the text representation and numerical
representation is as follows:

        wNNNNNNNNNNNNNNNN,LLLLLLLLLLLLLLLL

          corresponds to a Node name with numerical value:
                ss...ss is a binary encoding of NNNNNNNNNNNNNNNN
                ll...ll is a binary encoding of LLLLLLLLLLLLLLLL
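
The conversion rules above can be sketched as follows.  This is an
illustrative Python model, not the Forth implementation; the names
parse_sas_unit and format_sas_unit are hypothetical.  Two choices in the
sketch are assumptions rather than requirements of this binding: the leading
'w' is accepted in lower case only, and a zero LUN is omitted when
generating text.

```python
import string

_HEX_DIGITS = set(string.hexdigits)


def _hex64(field):
    """Parse a case-insensitive hex field; leading zeros are allowed."""
    if not field or not set(field) <= _HEX_DIGITS:
        raise ValueError(f"not a hexadecimal number: {field!r}")
    value = int(field, 16)
    if value >= (1 << 64):
        raise ValueError(f"value does not fit in 64 bits: {field!r}")
    return value


def parse_sas_unit(text):
    """Text form 'wN...[,L...]' -> (sas_address, lun)."""
    if not text.startswith("w"):
        raise ValueError("SASAddress form must start with 'w'")
    addr_text, comma, lun_text = text[1:].partition(",")
    sas_address = _hex64(addr_text)
    lun = _hex64(lun_text) if comma else 0  # an omitted LUN means zero
    return sas_address, lun


def format_sas_unit(sas_address, lun=0):
    """(sas_address, lun) -> lower-case text with leading zeros suppressed."""
    text = f"w{sas_address:x}"
    if lun:  # assumption: a zero LUN is left out of the generated text
        text += f",{lun:x}"
    return text


print(format_sas_unit(*parse_sas_unit("w5000C5000A1B2C3D,0001")))
```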


3.1.1.2.2  Phy Number Representation

The text representation of a phy number address is of the following form:

        PP[,LLLLLLLLLLLLLLLL]

where:

        PP               is an ASCII hexadecimal number in the range
                         of 0...FF specifying the host adapter's phy
			 number.

        LLLLLLLLLLLLLLLL is an ASCII hexadecimal number in the range
                         of 0...FFFFFFFFFFFFFFFF specifying a LUN.
                         This portion of the address is
                         optional and may be omitted if zero.

Conversion of hexadecimal numbers from text representation to numerical
representation shall be case-insensitive and leading zeros shall be permitted
but not required.

The phy number is internally converted to the SASAddress of the directly
attached device when generating the unit address.  If a SAS expander is
directly attached to that phy, this operation will fail, and decode-unit
should throw an exception.

Note: There is no numerical representation of the phy number format.
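
The phy number lookup described above might be modeled as in the following
Python sketch.  Everything in it is hypothetical: the table of attached
devices, its 'end-device' and 'expander' labels, and the function name
resolve_phy are illustration only, since how a controller discovers what is
attached to each phy is implementation specific.

```python
def resolve_phy(phy, attached):
    """Return the SASAddress of the device directly attached to 'phy'.

    'attached' is a hypothetical mapping of phy number to a pair
    (kind, sas_address), where kind is 'end-device' or 'expander'.
    A LookupError stands in for the exception thrown by decode-unit.
    """
    if not 0 <= phy <= 0xFF:
        raise ValueError("phy number must be in the range 0...FF")
    entry = attached.get(phy)
    if entry is None:
        raise LookupError(f"no device attached to phy {phy:x}")
    kind, sas_address = entry
    if kind == "expander":
        # Several targets may sit behind an expander, so the phy number
        # does not identify a single device.
        raise LookupError(f"phy {phy:x} is connected to a SAS expander")
    return sas_address


# Example table with made-up addresses.
attached = {
    0: ("end-device", 0x5000C5000A1B2C3D),
    1: ("expander", 0x500605B000000001),
}
print(hex(resolve_phy(0, attached)))
```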


3.1.1.2.3  SATA Identity Representation

SATA drives on a SAS fabric have a SASAddress assigned to them by the device
they are connected to.  Unlike SAS disks which have a SASAddress permanently
associated with them, this address can change for a SATA disk if it is ever
relocated within the fabric.  To provide similar behavior to SAS disks, a
special addressing form is provided for SATA disks.  

SATA disks can be addressed by the SCSI VPD information available from the
SCSI inquiry page 83 logical unit name (data bytes 4 through 11 specified 
in SAT[9] section 10.3.4).

The text representation of a SATA identity is of the following form:

        sSSSSSSSSSSSSSSSS[,LLLLLLLLLLLLLLLL]

where:

        SSSSSSSSSSSSSSSS is an ASCII hexadecimal number in the range
                         of 0...FFFFFFFFFFFFFFFF specifying the device's
			 INQUIRY page 83 logical unit name (data bytes 4-11).

        LLLLLLLLLLLLLLLL is an ASCII hexadecimal number in the range
                         of 0...FFFFFFFFFFFFFFFF specifying a LUN.
                         This portion of the address is
                         optional and may be omitted if zero.

Conversion of hexadecimal numbers from text representation to numerical
representation shall be case-insensitive and leading zeros shall be permitted
but not required.

The SAS adapter driver converts the SATA identity to the current SASAddress
of the SATA device and uses that for the device's unit address.

Note: There is no numerical representation of the SATA identity format.
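
Because neither 'w' nor 's' is a hexadecimal digit, the three text forms can
be told apart by their first character.  The Python sketch below illustrates
that classification step only; the function name classify_unit and the form
labels it returns are hypothetical, and accepting the prefix letters in
lower case only is an assumption.  Converting a phy number or SATA identity
to a SASAddress is left out, since it depends on the controller.

```python
import string

_HEX_DIGITS = set(string.hexdigits)


def _hex(field, limit):
    """Parse a case-insensitive hex field and check it against 'limit'."""
    if not field or not set(field) <= _HEX_DIGITS:
        raise ValueError(f"not a hexadecimal number: {field!r}")
    value = int(field, 16)
    if value > limit:
        raise ValueError(f"value out of range: {field!r}")
    return value


def classify_unit(text):
    """Return (form, value, lun) for any of the three text forms."""
    head, comma, lun_text = text.partition(",")
    lun = _hex(lun_text, 0xFFFFFFFFFFFFFFFF) if comma else 0
    if head.startswith("w"):
        return ("sasaddress", _hex(head[1:], 0xFFFFFFFFFFFFFFFF), lun)
    if head.startswith("s"):
        return ("sata-identity", _hex(head[1:], 0xFFFFFFFFFFFFFFFF), lun)
    return ("phy", _hex(head, 0xFF), lun)


for unit in ("w5000c5000a1b2c3d,2", "s5000cca012345678", "3"):
    print(classify_unit(unit))
```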


4  Bus Nodes

4.1  Properties

Since a SAS controller is not a root nexus and can be attached to many
different bus types, the controller node needs to provide any properties or
methods defined by its parent bus node.  This document defines properties and
methods that are specific to SAS controller nodes and their children.

4.1.1  Open Firmware-defined Properties for Bus Nodes

The following standard properties, as defined in Open Firmware [1], have special
meaning or interpretation for SAS:

"name"
        Type:  Prop-encoded-string
        Value: "scsi"

"device_type"
        Type:  Prop-encoded-string
        Value: "scsi-sas"

"#address-cells"
        Type: Prop-encoded-integer
        Contents: Standard property name to define package's address format
        Value: 4

"compatible"
        Type:  Prop-encoded-array
	Contents: This is defined by bus or device specific bindings.
		  Typically this would be the IEEE 1275 PCI Bus Bindings [5].


4.2  Methods

4.2.1  Open Firmware-defined Methods for Bus Nodes

A Standard Package implementing the "scsi-sas" device type shall implement the
following standard methods as defined in Open Firmware [1], with physical
address representations as specified in 3.1 of this standard:

    open ( -- okay? )
        Prepare this device for subsequent use.

        Typical behavior is to allocate any special resource requirements it
        needs, map the device into virtual address space, initialize the
        device and perform a brief "sanity test" to ensure that the device
        appears to be working correctly.

        Return true if this open method was successful, false if not.

        When a device's open method is called, that device's parent has
        already been opened (and so on, up to the root node, which has no
        parent), so this open method can call its parent's methods, for
        instance to create mappings within the parent's address space.

    close  ( -- )
        Close this previously opened device.

        Restore the device (which has been previously opened) to its
        "not-in-use" state. Typical behavior is to turn off the device, unmap
        it, and deallocate any resources that were allocated by open.

        Note: When closing an instance chain, a particular instance's close
        method is executed before its parent instances are closed, so the
        parent's methods can still be used during the execution of close.

    decode-unit  ( addr len -- lun.lo lun.hi sas.lo sas.hi )
        Convert text representation of address to numerical representation.

        The text representation can be of the SASAddress form, the phy
        number form, or the SATA identity form.

        Convert unit-string (addr, len), the text string representation, to
        lun.dbl SASAddress.dbl, the numerical representation of a physical
        address within the address space defined by this device node.  If the
        conversion from phy number to SASAddress fails because there is no
        target device directly connected to that phy, decode-unit should throw
        an exception.

        decode-unit is a static method.

    encode-unit  ( lun.lo lun.hi sas.lo sas.hi -- addr len )
        Convert numerical representation of address to text representation.

        The text representation this method creates is always of the
        SASAddress form.

        Convert the ( lun.dbl,SASAddress.dbl ) numerical representation to
        unit-string ( addr,len ) textual representation of the address within
        the address space defined by this device node.

        encode-unit is a static method.

    dma-alloc ( size -- virt )
        Allocate a memory region for later use.

        Allocate 'size' bytes of memory, contiguous within the
        direct-memory-access address space of the device bus, suitable for
        direct memory access by a "bus master" device.  Return the virtual
        address 'virt'. That virtual address is suitable for CPU access to the
        allocated region, but, in general, dma-map-in must be used to convert
        it to an address suitable for direct memory access by the bus-master
        device.

        Allocate the memory according to the most stringent alignment
        requirements for the bus.

        See also: dma-map-in, dma-free

        If the requested operation cannot be performed, a throw shall be
        called with an appropriate error message, as with abort".

        NOTE: Out-of-memory conditions may be detected and handled properly in
        the code with ['] dma-alloc catch.

    dma-free ( virt size -- )
        Free memory allocated with dma-alloc.

        Free 'size' bytes of memory at virtual address 'virt', previously
        allocated by the dma-alloc method.

    dma-map-in      ( virt size cacheable? -- devaddr )
        Convert virtual address to device bus DMA address.

        Convert the virtual address range <virt, size>, previously allocated
        by the dma-alloc method, into an address suitable for DMA on the
        device bus. Return this address 'devaddr'.

        dma-map-in can also be used to map application-supplied data buffers
        for DMA use, if possible on the bus.

        If the flag cacheable? is nonzero, the caller wishes to make use of
        caches for the DMA buffer if they are available.

        Immediately after dma-map-in has been executed, the contents of the
        address range as seen by the processor (the processor's "view") is the
        same as the contents as seen by the device that performs the DMA (the
        device's "view"). After the DMA device has performed DMA or the
        processor has performed a write to the range in question, the contents
        of the address range as seen by the processor (the processor's "view")
        is not necessarily the same as the contents as seen by the device that
        performs the DMA (the device's "view"). The two views can be made
        consistent by executing dma-map-out.

        If the requested operation cannot be performed, a throw shall be
        called with an appropriate error message, as with abort".

        NOTE: Out-of-memory conditions may be detected and handled properly in
        the code with ['] dma-map-in catch.

    dma-map-out     ( virt devaddr size -- )
        Free DMA mapping set up with dma-map-in.

        Free the DMA mapping specified by <virt devaddr size>, previously
        created with the dma-map-in method.

        This will also have the effect of flushing all caches associated with
        that mapping.

4.2.2  Bus-specific Methods for Bus Nodes

A package implementing the "scsi-sas" device type may implement the following
optional bus-specific methods:

    max-transfer    ( -- max-len )
        Return size of largest possible transfer.

        Return the size (max-len) in bytes of the largest single transfer that
        this device can perform, rounded down to a multiple of block-size.

    set-address ( lun.lo lun.hi sas.lo sas.hi -- )
        Set the SCSI target port SASAddress and logical unit to which
        subsequent commands apply.

        NOTE: A phy number is converted to the corresponding SASAddress.

    set-timeout ( msecs -- )
        Sets the maximum length of time in milliseconds that the driver will
        wait for completion of a command.  The default value of zero means to
        wait indefinitely.  A hardware error result is reported for a command
        that times out.

    show-children ( -- )
        Searches the SAS fabric for attached targets and their associated
        logical units.  Displays the information that the SCSI inquiry reports
        for those devices.

    diagnose ( -- error-code | 0 )

        Performs a simple self-test for a generic SAS target device.

        Perform an SCSI "test-unit-ready" command on the currently selected
        target and unit (see set-address).  If that fails, display a message
        indicating the details of the failure and return a non-zero error
        code.
      
        Otherwise, perform a SCSI "send-diagnostic" command, returning zero if
        it succeeds or a non-zero error code if it fails.

    show-sas-wwid  ( -- )

        The word show-sas-wwid prints the 64-bit base SASAddress of
        the SAS controller in human-readable format.  The precise
        format of the display is left to the implementation [4].

    execute-command ( buf-addr buf-len dir cmd-addr cmd-len --
                      hw-err? | statbyte 0 )
    
        Executes the SCSI command, which is stored in memory at cmd-addr 
        and whose length is cmd-len. Dir is true if the data transfer phase
        of the SCSI command will transfer data from the device to memory, 
        and false otherwise. buf-addr is the address of the memory buffer 
        to be used for the data transfer phase, and buf-len is the expected 
        maximum length of the data transfer phase. The memory buffer must be 
        contained within a DMA-accessible region that was returned by a
        previous execution of dma-alloc. If buf-len is zero, indicating that 
        the command is not expected to have a data transfer phase, both
        buf-addr and dir are ignored. Hw-err?, the returned hardware error
        status, is nonzero if the command could not be executed at all
        (perhaps due to the device not responding to the selection attempt). 
        If hw-err? is zero, statbyte is the status byte returned by the 
        status phase of the command.

    retry-command ( buf-addr buf-len dir cmd-addr cmd-len #retries -- 
                    0 | hw-err? stat | sensebuf 0 stat )
 
        Executes a SCSI command, automatically retrying under certain 
        conditions. retry-command is similar to execute-command except that
        retry-command automatically retries under certain failure conditions 
        and automatically executes the "request sense" SCSI command as 
        necessary.  #retries is the maximum number of times that the command 
        will be retried; if #retries is -1, the command will be retried 
        indefinitely.  retry-command returns 0 if the command eventually 
        succeeds. Otherwise, it returns the status byte returned by the last 
        attempted command on top of the stack (-1 if the command failed due 
        to a hardware error). The second number on the stack (hw-err?) 
        indicates whether or not the extended sense information is available.
        If hw-err? is zero, the third number on the stack (sensebuf) is the 
        address of a memory buffer containing the extended sense information 
        returned by the "request sense" command that was executed after the 
        last attempt to execute the desired command. The criteria for whether 
        or not to retry the command are as follows:
     
          a) If the requested number of retries have already been performed, 
             do not retry.
          b) If the failure is due to a hardware error, do not retry.
          c) If the failure was due to a "device busy" condition reported in 
             the status byte, retry.
          d) Otherwise, execute the "get extended status" command and attempt 
             to determine whether or not the failure could be retried based 
             on the data in the returned sense buffer, as follows:

            1) Unknown error class (not 7) is not retryable.
            2) Filemark is not retryable.
            3) End of media is not retryable.
            4) Illegal length indicator is not retryable.
            5) sense key = No Sense is retryable.
            6) sense key = Recoverable error is retryable.
            7) sense key = Not Ready is retryable.
            8) sense key = Unit Attention is retryable.
            9) Transaction aborted due to Incoming SCSI Bus reset is retryable.
            10) Otherwise, the error is not retryable.
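
The retry criteria above can be sketched in Python as follows.  This is an
illustration under stated assumptions, not the method's implementation: it
assumes fixed-format sense data (error class in bits 6..4 of byte 0; the
Filemark, End-of-Medium and ILI bits and the 4-bit sense key in byte 2), a
BUSY status byte of 08h, and a hypothetical bus_reset flag standing in for
however the controller reports criterion 9.  The function names are
hypothetical as well.

```python
STATUS_BUSY = 0x08  # assumed SCSI BUSY status byte

# No Sense, Recovered Error, Not Ready, Unit Attention
RETRYABLE_SENSE_KEYS = {0x0, 0x1, 0x2, 0x6}


def sense_is_retryable(sense, bus_reset=False):
    """Apply criteria 1) to 10) to fixed-format sense data."""
    if sense is None or len(sense) < 3:
        return False
    if (sense[0] >> 4) & 0x7 != 7:       # 1) unknown error class
        return False
    flags = sense[2]
    if flags & 0x80:                      # 2) Filemark
        return False
    if flags & 0x40:                      # 3) End of media
        return False
    if flags & 0x20:                      # 4) Illegal length indicator
        return False
    if (flags & 0x0F) in RETRYABLE_SENSE_KEYS:   # 5) to 8)
        return True
    return bus_reset                      # 9), otherwise 10)


def should_retry(retries_done, max_retries, hw_error, status,
                 sense=None, bus_reset=False):
    """Apply criteria a) to d); max_retries of -1 means retry forever."""
    if max_retries != -1 and retries_done >= max_retries:   # a)
        return False
    if hw_error:                                             # b)
        return False
    if status == STATUS_BUSY:                                # c)
        return True
    return sense_is_retryable(sense, bus_reset)              # d)


unit_attention = bytes([0x70, 0x00, 0x06]) + bytes(15)
print(should_retry(0, 3, False, 0x02, unit_attention))
```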

    no-data-command ( cmd-addr -- error? )

        Executes a simple SCSI command, automatically retrying under certain 
        conditions.  cmd-addr is the address of a 6-byte command buffer 
        containing an SCSI command that does not have a data transfer
        phase. Executes the command, retrying indefinitely with the same 
        retry criteria as retry-command.  error? is nonzero if an error 
        occurred, zero otherwise.

        NOTE: no-data-command is a convenience function. It provides no 
        capabilities that are not present in retry-command, but for those 
        commands that meet its restrictions, it is easier to use.

    short-data-command ( data-len cmd-addr cmd-len -- error? | data-adr 0 )

        Executes a simple SCSI command, automatically retrying under certain 
        conditions.  cmd-addr is the address and cmd-len the length of a
        command buffer containing an SCSI command whose data transfer
        phase is expected to transfer less than 256 bytes in an incoming 
        direction. data-len is the expected length (1..255) of the data
        transfer. Executes the command, retrying indefinitely with the same 
        retry criteria as retry-command.  error? is nonzero if an error 
        occurred, zero otherwise. If error? is zero, data-adr is the address 
        of a buffer containing the data transferred by the execution of the 
        command.

        NOTE: short-data-command is a convenience function, eliminating the 
        need for allocating a DMA buffer. It is primarily intended for use 
        with "informational" SCSI commands like "read block limits" and
        "inquiry".


5  Child Node Properties and Methods

Child nodes shall implement the standard Open Firmware properties
corresponding to the device type.  The child nodes of SAS controllers do not
have any "reg" property.  SAS controllers support the attachment of many
different types of devices specified by the SCSI [7] and SAS [8] standards.
Open Firmware will only generate child nodes for disks and CDROM/DVD drives.
CDROM/DVD drives will be categorized as disk devices.

5.1     Disk Devices

5.1.1   Properties

"name"
        Type:  Prop-encoded-string
        Value: "disk"

"device_type"
        Type:  Prop-encoded-string
        Value: "block"

"compatible"
        Type: Prop-encoded-string
        Value: "sd"


5.1.2  Methods

Child nodes shall implement the standard Open Firmware methods as modified by
the Open Firmware Recommended Practice, Device Support Extensions [2].
Devices that can be used as boot devices shall be of type "block" and shall
define the following methods:

The following methods are required by IEEE 1275 [1] to use the "disk-label"
package:

    open ( -- okay? )
        Prepare this device for subsequent use.

	Typical behavior is to allocate any special resource requirements it
        needs, map the device into virtual address space, initialize the
        device and perform a brief "sanity test" to ensure that the device
        appears to be working correctly.

        Return true if this open method was successful, false if not.

	When a device's open method is called, that device's parent has
	already been opened (and so on, up to the root node, which has no
	parent), so this open method can call its parent's methods, for
	instance to create mappings within the parent's address space.

    close ( -- )
        Close this previously opened device.

	Restore the device (which has been previously opened) to its 
	"not-in-use" state. Typical behavior is to turn off the device,
	unmap it, and deallocate any resources that were allocated by open.

	Note: When closing an instance chain, a particular instance's close
	method is executed before its parent instances are closed, so the
	parent's methods can still be used during the execution of close.

    load ( addr -- len )
	Load a client program from device to memory.
	
	Load a client program from the device into memory beginning at address
	addr, returning len, the size in bytes of the program that was loaded.

	If the device can contain several such programs, the
        instance-arguments (as returned by my-args) can be used in a
        device-dependent manner to select the particular program.

	Usage Restriction: The package containing the load method must be open 
	before the load method is executed.

    offset ( d.rel -- d.abs ) 
	Convert partition-relative disk position to absolute position.

	This is a method of the disk label support package. d.rel is a double-
	number disk position, expressed as the number of bytes from the
        beginning of the partition that was specified in the arguments when
        the support package was opened. d.abs is the corresponding 
        double-number disk position, expressed as the number of bytes from
        the beginning of the disk. If no partition was specified when the
        support package was opened, a system-dependent default partition is
        used. If the disk label support package does not support disk
        partitioning, d.abs is equal to d.rel.


The following methods are required by IEEE 1275 [1] to use the "deblocker"
package:

    read ( addr len -- actual )
        Read device into memory buffer; return actual byte count.

        Read at most len bytes from the device into the memory buffer
        beginning at addr. Return actual, the number of bytes actually
        read. If actual is zero or negative, the read operation did not
        succeed.
        Some standard device types impose additional requirements on their
        read methods; see the descriptions of various device types
        (e.g., "network" ) for more information.
        For some devices, the seek method sets the position for the next read.

    write ( addr len -- actual )
        Write memory buffer to device; return actual byte count.

        Write len bytes to the device from the memory buffer beginning at 
        addr. Return actual, the number of bytes actually written. If actual
        is less than len, the write did not succeed.
        For some devices, the seek method sets the position for the next write.

    seek ( pos.lo pos.hi -- status )
        Set device position for next read or write.

        Set the device position at which the next read or write will take
        place. The position is specified by a pair of numbers pos.lo pos.hi,
        whose interpretation depends on the device type. Return -1 if the
        operation fails and either zero or one if it succeeds.

        NOTE- The return value one (1) is meant as a concession to existing
        practice. Programs that use the seek method should treat either of
        the status values 0 or 1 as an indication of success.

    block-size ( -- block-len )
	Return "granularity" for accesses to this device.

	Return the "granularity" in bytes for accesses to this device.
        Perform all transfers to the device in multiples of this size. A
        returned value of 1 signifies that arbitrary transfer sizes are
        supported (up to the maximum specified by max-transfer).

    max-transfer ( -- max-len )
	Return size of largest possible transfer.

	Return the size in bytes of the largest single transfer that this
        device can perform, rounded down to a multiple of block-size.
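
The rounding rule for max-transfer is simple arithmetic, sketched below in
Python.  The names max_transfer and split_transfer, and the idea of a raw
hardware limit passed in as a parameter, are hypothetical; the second
function only illustrates how a caller might divide a larger request into
pieces that respect the limit.

```python
def max_transfer(raw_limit, block_size):
    """Round the largest possible transfer down to a block-size multiple."""
    if block_size <= 0:
        raise ValueError("block size must be positive")
    return raw_limit - (raw_limit % block_size)


def split_transfer(total_len, max_len):
    """Yield (offset, length) pieces, none larger than max_len."""
    if max_len <= 0:
        raise ValueError("maximum transfer length must be positive")
    offset = 0
    while offset < total_len:
        length = min(max_len, total_len - offset)
        yield offset, length
        offset += length


limit = max_transfer(65535, 512)
print(limit, list(split_transfer(150000, limit)))
```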

    read-blocks ( addr block# #blocks -- #read )
	Read #blocks, starting at block#, from device into memory.

	Read #blocks records of length block-size bytes from the device
        (starting at device block block#) into memory (starting at addr).
        Return #read, the number of blocks actually read.
	If the device is not capable of random access (e.g., a sequential
        access tape device), block# is ignored.

    write-blocks ( addr block# #blocks -- #written )
	Write #blocks from memory into device, starting at block#.

	Write #blocks records of length block-size bytes from memory
        (starting at addr) to the device (starting at device block block#).
        Return #written, the number of blocks actually written.
	If the device is not capable of random access (e.g., a sequential
        access tape device), block# is ignored.


The following methods were defined in a proposal titled: "Additional
requirements for SCSI devices":

The "disk-label" standard support package and packages of device
type "block" and "byte" shall implement the following method:

    size ( -- d )
        Return as a double number "d", the number of bytes of storage
        associated with the device or instance.  If the size cannot be
        determined, return the double number -1.


Packages of device type "block" and "byte" shall implement the
following method:

    #blocks ( -- u )
        Return as an unsigned number "u", the number of blocks of
        storage associated with the device or instance, where "block" is a
        unit of storage consisting of the number of bytes returned by the
        package's "block-size" method.  If the size cannot be determined, or
        if the number of blocks exceeds the range of an unsigned number,
        return the maximum unsigned integer (which because of the Open
        Firmware's assumption of two's complement arithmetic is equivalent to
        the signed number -1).
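
The saturation rule for #blocks can be sketched as follows.  This Python
model is illustrative only; the function names are hypothetical, passing
None for an unknown size is a convention of the sketch, and the 32-bit cell
width is an assumption (a 64-bit implementation would use cell_bits=64).

```python
def num_blocks(size_bytes, block_size, cell_bits=32):
    """Return the block count, or the maximum unsigned value if unknown
    or too large to represent in one cell."""
    max_unsigned = (1 << cell_bits) - 1
    if size_bytes is None:           # size cannot be determined
        return max_unsigned
    return min(size_bytes // block_size, max_unsigned)


def as_signed(value, cell_bits=32):
    """Reinterpret an unsigned cell as a two's complement signed number."""
    if value >= 1 << (cell_bits - 1):
        return value - (1 << cell_bits)
    return value


print(num_blocks(1 << 30, 512), as_signed(num_blocks(None, 512)))
```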

The "disk-label" standard support package and packages of device
type "block" shall implement the following methods:

    offset-low ( -- u )
        Return the least significant cell of the double number denoting the
        beginning offset of the disk partition that was specified when the
        "disk-label" support package was opened.  In general, the offset is
        obtained by executing the offset method of the "disk-label" support
        package with an argument of zero.  It is permissible for the disk
        package to execute the "disk-label" support package's offset method
        once after opening that support package, storing the result for later use.

    offset-high ( -- u )
        Return the most significant cell of the double number denoting the
        beginning offset of the disk partition that was specified when the
        "disk-label" support package was opened.  In general, the offset is
        obtained by executing the offset method of the "disk-label" support
        package with an argument of zero.  It is permissible for the disk 
        package to execute the "disk-label" support package's offset method
        once after opening that support package, storing the result for later use.
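
The relationship between offset, offset-low and offset-high can be modeled
as in the Python sketch below.  Both classes are hypothetical stand-ins (one
for the "disk-label" support package, one for a disk package that uses it),
and the 32-bit cell width is an assumption.

```python
class DiskLabel:
    """Stand-in for the "disk-label" support package."""

    def __init__(self, partition_start):
        self.partition_start = partition_start  # bytes from start of disk

    def offset(self, d_rel):
        """Partition-relative byte position -> absolute byte position."""
        return self.partition_start + d_rel


class DiskPackage:
    """Stand-in for a disk package that caches the partition offset."""

    def __init__(self, label, cell_bits=32):
        self._cell_bits = cell_bits
        self._mask = (1 << cell_bits) - 1
        # offset is executed once with an argument of zero and the result
        # is stored for later use.
        self._base = label.offset(0)

    def offset_low(self):
        return self._base & self._mask

    def offset_high(self):
        return (self._base >> self._cell_bits) & self._mask


disk = DiskPackage(DiskLabel(partition_start=0x100000200))
print(hex(disk.offset_low()), hex(disk.offset_high()))
```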