SCSI SAS 1275 Binding
Version 1.0

1 Introduction

1.1 Overview and References

This document describes the application of Open Firmware to the SCSI-3 protocol as implemented on Serial Attached SCSI (SAS).

2 References and Definitions

2.1 References

[1] IEEE Standard 1275-1994, IEEE Standard for Boot (Initialization and Configuration) Firmware: Core Requirements and Practices
[2] Device Support Extensions to IEEE 1275-1994, Revision 1.0
[3] FWARC 2005/751, SAS WWID determined from system MAC address
[4] FWARC 2006/035, LSI SAS Controller Methods for Manufacturing and Service
[5] PCI Bus Binding to: IEEE Std 1275-1994 Standard for Boot Firmware, Rev. 2.1
    http://noho.eng/1275/bindings/pci/pci2_1.pdf
[6] FWARC 2003/637, PCI Express Bus Binding to IEEE 1275
[7] SCSI Architecture Model-4
    http://t10.org/ftp/t10/drafts/sam4/sam4r13.pdf
[8] Serial Attached SCSI 1.1 (SAS 1.1)
    http://t10.org/ftp/t10/drafts/sas1/sas1r10.pdf
[9] SCSI ATA Translation
    http://t10.org/ftp/t10/drafts/sat/sat-r09.pdf
[10] SCSI-3 Parallel Bindings
    http://noho.eng/1275/practice/spi/spi1_0.ps

2.2 Definition of Terms

bus node: An Open Firmware device node that represents a bus controller. In cases where a node represents the interface, or "bridge", between one bus and another, the node is both a bus node relative to the bus it controls and a child node of its parent bus. Note that an Open Firmware device node is not in itself a physical hardware device; rather, it is a software abstraction that describes a hardware device.

logical unit: A target-resident entity that implements a device model and executes SCSI commands sent by an application client.

LUN: Logical unit number.

phy: The part of a device used to connect to other devices. One end of a point-to-point SAS link.

port: An entity at one end of a SCSI-3 initiator/target nexus.

target port: The port of a SCSI-3 target device.

SASAddress: 64-bit globally unique identifier used for addressing over SAS fabrics.
Its format is defined by the SAS [8] standard.

SAS expander: See SAS switch.

SAS switch: A device that extends the SAS fabric and allows a single SAS phy to communicate with multiple target devices. SAS switches may be cascaded.

3 Bus Characteristics

3.1 Physical Address Formats and Representations

3.1.1 Physical Address Formats

3.1.1.1 Numerical Representation

The numerical representation of an address for a "scsi-sas" device type consists of four cells encoded as follows. Bit #0 refers to the least significant bit.

    Bit #          33222222 22221111 11111100 00000000
                   10987654 32109876 54321098 76543210

    sas.hi cell:   ssssssss ssssssss ssssssss ssssssss
    sas.lo cell:   ssssssss ssssssss ssssssss ssssssss
    lun.hi cell:   llllllll llllllll llllllll llllllll
    lun.lo cell:   llllllll llllllll llllllll llllllll

where:

    ss..ss   64-bit unsigned number SASAddress
    ll..ll   64-bit unsigned number logical unit

3.1.1.2 Text Representation

SAS controllers support three textual address representations. The canonical text representation of the address is the SASAddress form, which specifies the SASAddress of the device's target SAS port. The encode-unit method generates only the SASAddress form, never the other forms (the phy number form or the SATA identity form).

SATA devices in a SAS fabric do not have a permanently assigned SAS address. If the fabric is reconfigured and SATA devices are renamed as a result, the boot path may point to the wrong device. To mitigate this problem, SATA devices may be addressed using a synthetic, globally unique SATA identity form. This form is only for SATA devices and will never be used to address a SAS device.

As a third, optional alternative, the decode-unit method of a SAS controller accepts the phy number form. This form specifies the number of the phy used to communicate with the child device. This form is only valid for devices that are directly attached to the SAS controller, and can be used for both SAS and SATA devices.
If a SAS expander is connected to a phy, multiple target devices may be accessible through that one phy, so a phy number address is ambiguous. If an open request specifies a phy address and a SAS expander is connected to that phy, the open request should fail.

3.1.1.2.1 SASAddress Representation

The text representation of a SASAddress is of the following form:

    wNNNNNNNNNNNNNNNN[,LLLLLLLLLLLLLLLL]

where:

    w is the letter 'w'.
    NNNNNNNNNNNNNNNN is an ASCII hexadecimal number in the range 0...FFFFFFFFFFFFFFFF that specifies the SASAddress of the device's target SAS port.
    LLLLLLLLLLLLLLLL is an ASCII hexadecimal number in the range 0...FFFFFFFFFFFFFFFF specifying a LUN. This portion of the address is optional and may be omitted if zero.

Conversion of hexadecimal numbers from text representation to numerical representation shall be case-insensitive, and leading zeros shall be permitted but not required. Conversion from numerical representation to text representation shall use the lower case forms of the hexadecimal digits in the range a...f, suppressing leading zeros.

The correspondence between the text representation and numerical representation is as follows:

    wNNNNNNNNNNNNNNNN,LLLLLLLLLLLLLLLL

corresponds to a Node name with numerical value:

    ss...ss is a binary encoding of NNNNNNNNNNNNNNNN
    ll...ll is a binary encoding of LLLLLLLLLLLLLLLL

3.1.1.2.2 Phy Number Representation

The text representation of a phy number address is of the following form:

    PP[,LLLLLLLLLLLLLLLL]

where:

    PP is an ASCII hexadecimal number in the range 0...FF specifying the host adapter's phy number.
    LLLLLLLLLLLLLLLL is an ASCII hexadecimal number in the range 0...FFFFFFFFFFFFFFFF specifying a LUN. This portion of the address is optional and may be omitted if zero.

Conversion of hexadecimal numbers from text representation to numerical representation shall be case-insensitive, and leading zeros shall be permitted but not required.
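
As an informal illustration of the SASAddress form in 3.1.1.2.1, the following Python sketch converts between the text representation and the four 32-bit cells of 3.1.1.1. It is not part of the binding: the function names, the tuple order (sas.hi, sas.lo, lun.hi, lun.lo, following the cell table rather than the stack order of decode-unit), and the choice to omit a zero LUN on output are assumptions made for the example. The phy number and SATA identity forms are not handled.

```python
import string

MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def _parse_hex64(field):
    """Parse a case-insensitive hex field; leading zeros are allowed."""
    if not field or any(c not in string.hexdigits for c in field):
        raise ValueError("not a hexadecimal number: %r" % field)
    value = int(field, 16)
    if value > MASK64:
        raise ValueError("value exceeds 64 bits: %r" % field)
    return value


def decode_sas_unit(text):
    """Text 'wN...[,L...]' -> (sas_hi, sas_lo, lun_hi, lun_lo) cells."""
    if not text.startswith("w"):
        raise ValueError("not a SASAddress-form unit address: %r" % text)
    sas_text, sep, lun_text = text[1:].partition(",")
    sas = _parse_hex64(sas_text)
    # The LUN portion is optional and defaults to zero when absent.
    lun = _parse_hex64(lun_text) if sep else 0
    return (sas >> 32, sas & MASK32, lun >> 32, lun & MASK32)


def encode_sas_unit(sas_hi, sas_lo, lun_hi, lun_lo):
    """Cells -> text, lower-case hex digits, leading zeros suppressed."""
    sas = ((sas_hi & MASK32) << 32) | (sas_lo & MASK32)
    lun = ((lun_hi & MASK32) << 32) | (lun_lo & MASK32)
    text = "w%x" % sas
    if lun:  # assumption: a zero LUN is omitted from the output
        text += ",%x" % lun
    return text
```

For instance, decode_sas_unit("w5000C500A1B2C3D4,1") yields the cells (0x5000C500, 0xA1B2C3D4, 0, 1), and encoding those cells gives back the lower-case text "w5000c500a1b2c3d4,1".
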
The phy number is internally converted to the SASAddress of the directly attached device when generating the unit address. If a SAS expander is directly attached to that phy, this operation will fail, and decode-unit should throw an exception.

Note: There is no numerical representation of the phy number format.

3.1.1.2.3 SATA Identity Representation

SATA drives on a SAS fabric have a SASAddress assigned to them by the device they are connected to. Unlike SAS disks, which have a SASAddress permanently associated with them, this address can change for a SATA disk if it is ever relocated within the fabric. To provide similar behavior to SAS disks, a special addressing form is provided for SATA disks. SATA disks can be addressed by the SCSI VPD information available from the SCSI inquiry page 83 logical unit name (data bytes 4 through 11 specified in SAT [9] section 10.3.4).

The text representation of a SATA identity is of the following form:

    sSSSSSSSSSSSSSSSS[,LLLLLLLLLLLLLLLL]

where:

    SSSSSSSSSSSSSSSS is an ASCII hexadecimal number in the range 0...FFFFFFFFFFFFFFFF specifying the device's INQUIRY page 83 logical unit name (data bytes 4-11).
    LLLLLLLLLLLLLLLL is an ASCII hexadecimal number in the range 0...FFFFFFFFFFFFFFFF specifying a LUN. This portion of the address is optional and may be omitted if zero.

Conversion of hexadecimal numbers from text representation to numerical representation shall be case-insensitive, and leading zeros shall be permitted but not required.

The SAS adapter driver converts the SATA identifier to the current SASAddress of the SATA device and uses that for the device's unit address.

Note: There is no numerical representation of the SATA identity format.

4 Bus Nodes

4.1 Properties

Since a SAS controller is not a root nexus and can be attached to many different bus types, the controller node needs to provide any properties or methods defined by its parent bus node.
This document defines properties and methods that are specific to SAS controller nodes and their children.

4.1.1 Open Firmware-defined Properties for Bus Nodes

The following standard properties, as defined in Open Firmware [1], have special meaning or interpretation for SAS:

"name"
    Type: Prop-encoded-string
    Value: "scsi"

"device_type"
    Type: Prop-encoded-string
    Value: "scsi-sas"

"#address-cells"
    Type: Prop-encoded-integer
    Contents: Standard property name to define package's address format
    Value: 4

"compatible"
    Type: Prop-encoded-array
    Contents: This is defined by bus or device specific bindings. Typically this would be the IEEE 1275 PCI Bus Bindings [5].

4.2 Methods

4.2.1 Open Firmware-defined Methods for Bus Nodes

A Standard Package implementing the "scsi-sas" device type shall implement the following standard methods as defined in Open Firmware [1], with physical address representations as specified in 3.1 of this standard:

open ( -- okay? )
Prepare this device for subsequent use. Typical behavior is to allocate any special resources it needs, map the device into virtual address space, initialize the device, and perform a brief "sanity test" to ensure that the device appears to be working correctly. Return true if this open method was successful, false if not.
When a device's open method is called, that device's parent has already been opened (and so on, up to the root node, which has no parent), so this open method can call its parent's methods, for instance to create mappings within the parent's address space.

close ( -- )
Close this previously opened device. Restore the device to its "not-in-use" state. Typical behavior is to turn off the device, unmap it, and deallocate any resources that were allocated by open.
Note: When closing an instance chain, a particular instance's close method is executed before its parent instances are closed, so the parent's methods can still be used during the execution of close.
decode-unit ( addr len -- lun.lo lun.hi sas.lo sas.hi )
Convert text representation of address to numerical representation. The text representation can be of the SASAddress form, the phy number form, or the SATA identity form.
Convert unit-string (addr, len), the text string representation, to lun.dbl SASAddress.dbl, the numerical representation of a physical address within the address space defined by this device node. If the conversion from phy number to SASAddress fails because there is no target device directly connected to that phy, decode-unit should throw an exception.
decode-unit is a static method.

encode-unit ( lun.lo lun.hi sas.lo sas.hi -- addr len )
Convert numerical representation of address to text representation. The text representation this method creates is always of the SASAddress form.
Convert the ( lun.dbl, SASAddress.dbl ) numerical representation to unit-string ( addr, len ), the textual representation of the address within the address space defined by this device node.
encode-unit is a static method.

dma-alloc ( size -- virt )
Allocate a memory region for later use.
Allocate 'size' bytes of memory, contiguous within the direct-memory-access address space of the device bus, suitable for direct memory access by a "bus master" device. Return the virtual address 'virt'. That virtual address is suitable for CPU access to the allocated region but, in general, dma-map-in must be used to convert it to an address suitable for direct memory access by the bus-master device. Allocate the memory according to the most stringent alignment requirements for the bus.
See also: dma-map-in, dma-free
If the requested operation cannot be performed, a throw shall be called with an appropriate error message, as with abort".
NOTE: Out-of-memory conditions may be detected and handled properly in the code with ['] dma-alloc catch.

dma-free ( virt size -- )
Free memory allocated with dma-alloc.
Free 'size' bytes of memory at virtual address 'virt', previously allocated by the dma-alloc method.

dma-map-in ( virt size cacheable? -- devaddr )
Convert virtual address to device bus DMA address.
Convert the virtual address range (virt, size), previously allocated by the dma-alloc method, into an address suitable for DMA on the device bus. Return this address 'devaddr'. dma-map-in can also be used to map application-supplied data buffers for DMA use, if possible on the bus.
If the flag cacheable? is nonzero, the caller wishes to make use of caches for the DMA buffer if they are available.
Immediately after dma-map-in has been executed, the contents of the address range as seen by the processor (the processor's "view") is the same as the contents as seen by the device that performs the DMA (the device's "view"). After the DMA device has performed DMA, or the processor has performed a write to the range in question, the processor's view is not necessarily the same as the device's view. The two views can be made consistent by executing dma-map-out.
If the requested operation cannot be performed, a throw shall be called with an appropriate error message, as with abort".
NOTE: Out-of-memory conditions may be detected and handled properly in the code with ['] dma-map-in catch.

dma-map-out ( virt devaddr size -- )
Free DMA mapping set up with dma-map-in.
Free the DMA mapping specified by (virt, devaddr, size), previously created with the dma-map-in method. This also has the effect of flushing all caches associated with that mapping.

4.2.2 Bus-specific Methods for Bus Nodes

A package implementing the "scsi-sas" device type may implement the following optional bus-specific methods:

max-transfer ( -- max-len )
Return size of largest possible transfer.
Return the size (max-len) in bytes of the largest single transfer that this device can perform, rounded down to a multiple of block-size.

set-address ( lun.lo lun.hi sas.lo sas.hi -- )
Set the SCSI target port SASAddress and logical unit to which subsequent commands apply.
NOTE: A phy number is converted to a SASAddress.

set-timeout ( msecs -- )
Set the maximum length of time, in milliseconds, that the driver will wait for completion of a command. The default value of zero means to wait indefinitely. A hardware error result is reported for a command that times out.

show-children ( -- )
Search the SAS fabric for attached targets and their associated logical units. Display the information that the SCSI inquiry reports for those devices.

diagnose ( -- error-code | 0 )
Perform a simple self-test for a generic SAS target device.
Perform a SCSI "test-unit-ready" command on the currently selected target and unit (see set-address). If that fails, display a message indicating the details of the failure and return a non-zero error code. Otherwise, perform a SCSI "send-diagnostic" command, returning zero if it succeeds or a non-zero error code if it fails.

show-sas-wwid ( -- )
Print the 64-bit base SASAddress of the SAS controller in human-readable format. The precise format of the display is left to the implementation [4].

execute-command ( buf-addr buf-len dir cmd-addr cmd-len -- hw-err? | statbyte 0 )
Execute the SCSI command that is stored in memory at cmd-addr and whose length is cmd-len. dir is true if the data transfer phase of the SCSI command will transfer data from the device to memory, and false otherwise. buf-addr is the address of the memory buffer to be used for the data transfer phase, and buf-len is the expected maximum length of the data transfer phase. The memory buffer must be contained within a DMA-accessible region that was returned by a previous execution of dma-alloc.
If buf-len is zero, indicating that the command is not expected to have a data transfer phase, both buf-addr and dir are ignored.
hw-err?, the returned hardware error status, is nonzero if the command could not be executed at all (perhaps due to the device not responding to the selection attempt). If hw-err? is zero, statbyte is the status byte returned by the status phase of the command.

retry-command ( buf-addr buf-len dir cmd-addr cmd-len #retries -- 0 | hw-err? stat | sensebuf 0 stat )
Execute a SCSI command, automatically retrying under certain conditions.
retry-command is similar to execute-command except that retry-command automatically retries under certain failure conditions and automatically executes the "request sense" SCSI command as necessary. #retries is the maximum number of times that the command will be retried; if #retries is -1, the command will be retried indefinitely.
retry-command returns 0 if the command eventually succeeds. Otherwise, it returns the status byte returned by the last attempted command on top of the stack (-1 if the command failed due to a hardware error). The second number on the stack (hw-err?) indicates whether or not the extended sense information is available. If hw-err? is zero, the third number on the stack (sensebuf) is the address of a memory buffer containing the extended sense information returned by the "request sense" command that was executed after the last attempt to execute the desired command.
The criteria for whether or not to retry the command are as follows:
    a) If the requested number of retries have already been performed, do not retry.
    b) If the failure is due to a hardware error, do not retry.
    c) If the failure was due to a "device busy" condition reported in the status byte, retry.
    d) Otherwise, execute the "get extended status" command and attempt to determine whether or not the failure could be retried based on the data in the returned sense buffer, as follows:
        1) Unknown error class (not 7) is not retryable.
        2) Filemark is not retryable.
        3) End of media is not retryable.
        4) Illegal length indicator is not retryable.
        5) Sense key = No Sense is retryable.
        6) Sense key = Recoverable error is retryable.
        7) Sense key = Not Ready is retryable.
        8) Sense key = Unit Attention is retryable.
        9) Transaction aborted due to incoming SCSI bus reset is retryable.
        10) Otherwise, the error is not retryable.

no-data-command ( cmd-addr -- error? )
Execute a simple SCSI command, automatically retrying under certain conditions.
cmd-addr is the address of a 6-byte command buffer containing a SCSI command that does not have a data transfer phase. Execute the command, retrying indefinitely with the same retry criteria as retry-command. error? is nonzero if an error occurred, zero otherwise.
NOTE: no-data-command is a convenience function. It provides no capabilities that are not present in retry-command, but for those commands that meet its restrictions, it is easier to use.

short-data-command ( data-len cmd-addr cmd-len -- error? | data-adr 0 )
Execute a simple SCSI command, automatically retrying under certain conditions.
cmd-addr is the address and cmd-len the length of a command buffer containing a SCSI command whose data transfer phase is expected to transfer less than 256 bytes in an incoming direction. data-len is the expected length (1..255) of the data transfer. Execute the command, retrying indefinitely with the same retry criteria as retry-command. error? is nonzero if an error occurred, zero otherwise. If error? is zero, data-adr is the address of a buffer containing the data transferred by the execution of the command.
NOTE: short-data-command is a convenience function, eliminating the need for allocating a DMA buffer.
It is primarily intended for use with "informational" SCSI commands like "read block limits" and "inquiry".

5 Child Node Properties and Methods

Child nodes shall implement the standard Open Firmware properties corresponding to the device type. The child nodes of SAS controllers do not have any "reg" property.

SAS controllers support the attachment of many different types of devices specified by the SCSI [7] and SAS [8] standards. Open Firmware will only generate child nodes for disks and CDROM/DVD drives. CDROM/DVD drives will be categorized as disk devices.

5.1 Disk Devices

5.1.1 Properties

"name"
    Type: Prop-encoded-string
    Value: "disk"

"device_type"
    Type: Prop-encoded-string
    Value: "block"

"compatible"
    Type: Prop-encoded-string
    Value: "sd"

5.1.2 Methods

Child nodes shall implement the standard Open Firmware methods as modified by the Open Firmware Recommended Practice, Device Support Extensions [2]. Devices that can be used as boot devices shall be of type "block" and shall define the following methods.

The following methods are required by IEEE 1275 [1] to use the "disk-label" package:

open ( -- okay? )
Prepare this device for subsequent use. Typical behavior is to allocate any special resources it needs, map the device into virtual address space, initialize the device, and perform a brief "sanity test" to ensure that the device appears to be working correctly. Return true if this open method was successful, false if not.
When a device's open method is called, that device's parent has already been opened (and so on, up to the root node, which has no parent), so this open method can call its parent's methods, for instance to create mappings within the parent's address space.

close ( -- )
Close this previously opened device. Restore the device to its "not-in-use" state. Typical behavior is to turn off the device, unmap it, and deallocate any resources that were allocated by open.
Note: When closing an instance chain, a particular instance's close method is executed before its parent instances are closed, so the parent's methods can still be used during the execution of close.

load ( addr -- len )
Load a client program from device to memory.
Load a client program from the device into memory beginning at address addr, returning len, the size in bytes of the program that was loaded. If the device can contain several such programs, the instance-arguments (as returned by my-args) can be used in a device-dependent manner to select the particular program.
Usage Restriction: The package containing the load method must be open before the load method is executed.

offset ( d.rel -- d.abs )
Convert partition-relative disk position to absolute position.
This is a method of the disk label support package. d.rel is a double-number disk position, expressed as the number of bytes from the beginning of the partition that was specified in the arguments when the support package was opened. d.abs is the corresponding double-number disk position, expressed as the number of bytes from the beginning of the disk. If no partition was specified when the support package was opened, a system-dependent default partition is used. If the disk label support package does not support disk partitioning, d.abs is equal to d.rel.

The following methods are required by IEEE 1275 [1] to use the "deblocker" package:

read ( addr len -- actual )
Read device into memory buffer; return actual byte count.
Read at most len bytes from the device into the memory buffer beginning at addr. Return actual, the number of bytes actually read. If actual is zero or negative, the read operation did not succeed.
Some standard device types impose additional requirements on their read methods; see the descriptions of various device types (e.g., "network") for more information. For some devices, the seek method sets the position for the next read.
write ( addr len -- actual )
Write memory buffer to device; return actual byte count.
Write len bytes to the device from the memory buffer beginning at addr. Return actual, the number of bytes actually written. If actual is less than len, the write did not succeed. For some devices, the seek method sets the position for the next write.

seek ( pos.lo pos.hi -- status )
Set device position for next read or write.
Set the device position at which the next read or write will take place. The position is specified by a pair of numbers pos.lo pos.hi, whose interpretation depends on the device type. Return -1 if the operation fails and either zero or one if it succeeds.
NOTE: The return value one (1) is meant as a concession to existing practice. Programs that use the seek method should treat either of the status values 0 or 1 as an indication of success.

block-size ( -- block-len )
Return "granularity" for accesses to this device.
Return the "granularity" in bytes for accesses to this device. Perform all transfers to the device in multiples of this size. A returned value of 1 signifies that arbitrary transfer sizes are supported (up to the maximum specified by max-transfer).

max-transfer ( -- max-len )
Return size of largest possible transfer.
Return the size in bytes of the largest single transfer that this device can perform, rounded down to a multiple of block-size.

read-blocks ( addr block# #blocks -- #read )
Read #blocks, starting at block#, from device into memory.
Read #blocks records of length block-size bytes from the device (starting at device block block#) into memory (starting at addr). Return #read, the number of blocks actually read. If the device is not capable of random access (e.g., a sequential access tape device), block# is ignored.

write-blocks ( addr block# #blocks -- #written )
Write #blocks from memory into device, starting at block#.
Write #blocks records of length block-size bytes from memory (starting at addr) to the device (starting at device block block#). Return #written, the number of blocks actually written. If the device is not capable of random access (e.g., a sequential access tape device), block# is ignored.

The following methods were defined in a proposal titled "Additional requirements for SCSI devices":

The "disk-label" standard support package and packages of device type "block" and "byte" shall implement the following method:

size ( -- d )
Return as a double number "d" the number of bytes of storage associated with the device or instance. If the size cannot be determined, return the double number -1.

Packages of device type "block" and "byte" shall implement the following method:

#blocks ( -- u )
Return as an unsigned number "u" the number of blocks of storage associated with the device or instance, where "block" is a unit of storage consisting of the number of bytes returned by the package's "block-size" method. If the size cannot be determined, or if the number of blocks exceeds the range of an unsigned number, return the maximum unsigned integer (which, because of Open Firmware's assumption of two's complement arithmetic, is equivalent to the signed number -1).

The "disk-label" standard support package and packages of device type "block" shall implement the following methods:

offset-low ( -- u )
Return the least significant cell of the double number denoting the beginning offset of the disk partition that was specified when the "disk-label" support package was opened. In general, the offset is obtained by executing the offset method of the "disk-label" support package with an argument of zero. It is permissible for the disk package to execute the "disk-label" support package's offset method once after opening that support package, storing the result for later use.
offset-high ( -- u )
Return the most significant cell of the double number denoting the beginning offset of the disk partition that was specified when the "disk-label" support package was opened. In general, the offset is obtained by executing the offset method of the "disk-label" support package with an argument of zero. It is permissible for the disk package to execute the "disk-label" support package's offset method once after opening that support package, storing the result for later use.
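
The cell arithmetic behind size, #blocks, offset-low, and offset-high can be illustrated with a short Python sketch. It is informal and not part of the binding: the function names are invented for the example, and a 32-bit cell is assumed, matching the cell layout shown in 3.1.1.1.

```python
CELL_BITS = 32                      # assumption: 32-bit cells
CELL_MASK = (1 << CELL_BITS) - 1    # maximum unsigned cell value


def split_double(d):
    """Split a double number into (low_cell, high_cell).

    A value of -1 ("size cannot be determined") yields all-ones cells,
    as two's complement arithmetic would.
    """
    return d & CELL_MASK, (d >> CELL_BITS) & CELL_MASK


def offset_low(partition_offset):
    """Least significant cell of the partition's starting byte offset."""
    return split_double(partition_offset)[0]


def offset_high(partition_offset):
    """Most significant cell of the partition's starting byte offset."""
    return split_double(partition_offset)[1]


def num_blocks(size_bytes, block_size):
    """Block count as one unsigned cell.

    Returns the maximum unsigned cell when the size is unknown (None)
    or when the count does not fit in a single cell.
    """
    if size_bytes is None or block_size <= 0:
        return CELL_MASK
    return min(size_bytes // block_size, CELL_MASK)
```

Under these assumptions, a partition starting at byte offset 0x100000200 has offset-low 0x200 and offset-high 1, and a 1024-byte device with a block-size of 512 reports 2 blocks.
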