Find out about object-based storage device (OSD) support in the Solaris OS.
This article focuses on object-based storage device (OSD) support in the Solaris OS. An overview of OSD in general is provided by a previous Sun Developer Network (SDN) article. In the Solaris OS, the original motivation to support OSD was to build a petascale high-performance computing (HPC) system for the Defense Advanced Research Projects Agency (DARPA). The High Productivity Computing Systems (HPCS) solicitation specified a requirement for an object-based file system. Object storage, and a driver that understands object semantics, are essential to meeting the high I/O bandwidth and scalability requirements. Additionally, Sun endorses a standards-based approach, and OSD is an approved T10 standard. For background information about the ANSI T10 SCSI OSD Version 1 command set extensions, which add object-based semantics, refer to the T10/1355-D project standard. The Storage Network Industry Association (SNIA) OSD Technical Work Group is working on OSD-2, a further extension of this command set under project T10/1729-D. Apart from several clarifications to OSD-1, OSD-2 adds several interesting features.
OSD-2 enhancements include 64-bit CDB and attribute-list alignment, enhanced security, read-past-end-of-object, setting attributes without a data buffer, and range-based flush.

Object storage plays a critical role in storage and access technology for future computing. The data access pattern of modern-day computing is closely tied to the attributes associated with data, and block-based file systems bear the heavy burden of applying extra software and hardware resources to retrieve data based on its attributes. OSD, being a SCSI-based protocol, provides an excellent level of abstraction between a file system and its storage devices. OSD can be implemented on existing infrastructure without the complex problems of migrating data to different hardware or a different protocol.

High-performance computing (HPC) is becoming a major requirement for the financial and scientific industries. OSD, in combination with emerging transports such as iSCSI Extensions for RDMA (iSER), can be of tremendous value to such computing platforms.

Before discussing how object-based storage can be supported in the Solaris OS, let us consider why object-based storage is important and how object storage file systems differ from traditional, block-based file systems. The motivation behind implementing OSD in the Solaris OS is to provide a standards-based, scalable, high-performing file system. The following diagram shows the difference between a traditional file system stack and an object-based file system stack.
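The object abstraction described above can be sketched as a toy model: data is addressed by a (partition, object) pair plus an offset and length, and attributes travel with the object rather than being reconstructed by host software. This is purely illustrative and is not the T10 interface; all names here are invented for the sketch.

```python
class ToyObjectStore:
    """Toy model of OSD addressing: (partition_id, object_id) plus offset,
    with per-object attributes. Illustrative only, not the T10 interface."""

    def __init__(self):
        self.data = {}    # (partition_id, object_id) -> bytearray of contents
        self.attrs = {}   # (partition_id, object_id) -> dict of attributes

    def create(self, partition_id, object_id):
        key = (partition_id, object_id)
        self.data[key] = bytearray()
        self.attrs[key] = {}

    def write(self, partition_id, object_id, offset, payload):
        buf = self.data[(partition_id, object_id)]
        if len(buf) < offset + len(payload):
            # Objects grow on demand; unwritten bytes read back as zero.
            buf.extend(b"\x00" * (offset + len(payload) - len(buf)))
        buf[offset:offset + len(payload)] = payload

    def read(self, partition_id, object_id, offset, length):
        return bytes(self.data[(partition_id, object_id)][offset:offset + length])

    def set_attr(self, partition_id, object_id, name, value):
        self.attrs[(partition_id, object_id)][name] = value

    def get_attr(self, partition_id, object_id, name):
        return self.attrs[(partition_id, object_id)][name]
```

A block device, by contrast, exposes only a flat array of numbered blocks; any attribute lookup must be synthesized by the file system above it.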
Traditional file systems have two basic components: a user component and a storage component.

The file system user component is responsible for presenting the file abstraction to applications: the directory hierarchy, naming, and access control.

The file system storage component is responsible for mapping files onto the storage media: block allocation and free-space management.
On average, the storage component represents 90% of the file system workload. Implementing OSD in the Solaris OS means moving the traditional file system storage component into the OSD itself. Doing so allows the storage device to manage its own space allocation and data layout, relieving the host and improving scalability.
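The effect of moving the storage component into the device can be sketched as follows: the host issues object-level writes and never sees block addresses, while the target allocates blocks from its own private free list. The allocator and names below are invented for illustration.

```python
BLOCK_SIZE = 512

class ToyOsdTarget:
    """Illustrative sketch of device-side allocation: the storage
    component (block allocation, free-space management) lives in the
    target, invisible to the host."""

    def __init__(self, total_blocks):
        self.free_blocks = list(range(total_blocks))  # device-private free list
        self.extents = {}                             # object_id -> allocated blocks

    def create(self, object_id):
        self.extents[object_id] = []

    def write(self, object_id, offset, payload):
        # Grow the object's allocation until it covers offset + len(payload).
        needed = -(-(offset + len(payload)) // BLOCK_SIZE)   # ceil division
        while len(self.extents[object_id]) < needed:
            self.extents[object_id].append(self.free_blocks.pop(0))

    def blocks_of(self, object_id):
        return list(self.extents[object_id])
```

With a block device, this mapping logic would instead run on the host (or the metadata server), which is precisely the workload that OSD offloads.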
Traditional block-based drivers address storage as a flat array of blocks; an OSD driver instead operates on objects. The basic command set for OSD is simple. (For more information, see the OpenSolaris web page for OSD.) The underlying protocol is SCSI, and with a few changes for object support, it can run on any transport that is supported by the Solaris OS, such as Fibre Channel, parallel SCSI, iSCSI, or InfiniBand (IB). There are two sections to be discussed in the overall architecture: the initiator and the target.
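Because OSD rides on SCSI, its commands are carried in variable-length CDBs. The sketch below builds such a CDB. The opcode 0x7F (VARIABLE LENGTH CDB) and the byte-7/bytes-8-9 layout (additional CDB length, service action) are standard SCSI; the 200-byte OSD-1 CDB length and the example service-action value are placeholders drawn from the T10/1355-D layout, and the real encodings should be taken from the standard.

```python
import struct

VARIABLE_LENGTH_CDB = 0x7F       # standard SCSI variable-length CDB opcode
OSD_CDB_LEN = 200                # OSD-1 CDB length assumed for this sketch
EXAMPLE_SERVICE_ACTION = 0x8802  # placeholder; real values are in T10/1355-D

def build_osd_cdb(service_action, service_fields=b""):
    """Sketch of assembling an OSD command descriptor block."""
    cdb = bytearray(OSD_CDB_LEN)
    cdb[0] = VARIABLE_LENGTH_CDB                     # byte 0: operation code
    cdb[7] = OSD_CDB_LEN - 8                         # byte 7: additional CDB length
    struct.pack_into(">H", cdb, 8, service_action)   # bytes 8-9: service action
    cdb[10:10 + len(service_fields)] = service_fields  # service-specific fields
    return bytes(cdb)
```

Because the object semantics live entirely inside the CDB, the same command can travel over any SCSI transport the Solaris OS supports.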
The OSD OpenSolaris web page contains the source for the new SCSI OSD driver. The following diagram shows how OSD is implemented in the initiator stack.
There are several transport technologies that could provide connectivity between object file system initiators and OSD targets. Sending OSD commands using iSCSI over a TCP/IP network is a convenient approach, similar to using the SCSI RDMA Protocol (SRP) or SCSI over Fibre Channel (FCP). iSCSI Extensions for RDMA (iSER) is a powerful transport subsystem that replaces TCP/IP data movement with RDMA operations; in combination with OSD, it allows high performance and improved scalability for Solaris-supported object file systems. The storage devices in the previous diagram could be Solaris target implementations, so many CPU-intensive operations could be handled by Solaris servers instead of a low-end OSD target device. The key components needed for Solaris OSD initiator support are as follows:
The OSD OpenSolaris code contains the Solaris iSCSI target code changes that support the OSD command set. The target object QFS changes (an object interface with no namespace) are unavailable at this time. The underlying device also needs to understand OSD semantics in order to process the commands sent to it; the Solaris iSCSI target, in this example, supports the OSD command set. Even though the disk is traditionally a block device, object storage devices are by definition functionally rich and autonomous, so a few required features must be handled in the iSCSI target emulator to make end-to-end OSD functionality work. In physical object storage devices, such features are implemented in the disk and controller firmware. The following diagram shows where the OSD commands are handled by the Solaris iSCSI target.
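At its core, the change needed in a target emulator is a dispatch step: variable-length CDBs carrying OSD service actions are routed to an object handler instead of the legacy block-I/O path. The sketch below illustrates that routing; the handler names and return values are invented for illustration and do not reflect the Solaris target's actual entry points.

```python
import struct

VARIABLE_LENGTH_CDB = 0x7F  # standard SCSI variable-length CDB opcode

# Hypothetical handlers -- the real Solaris iSCSI target's APIs differ.
def handle_object_command(service_action, cdb):
    return "object:0x%04x" % service_action

def handle_block_command(cdb):
    return "block"

def dispatch(cdb):
    """Route an incoming CDB to the object path or the block path."""
    if cdb[0] == VARIABLE_LENGTH_CDB:
        # Bytes 8-9 of a variable-length CDB hold the service action.
        (service_action,) = struct.unpack(">H", cdb[8:10])
        return handle_object_command(service_action, cdb)
    return handle_block_command(cdb)
```

Everything behind `handle_object_command` (object creation, attribute handling, space management) is what a physical OSD would implement in disk and controller firmware.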
Note the changes shown in the diagram.
HPC, as a consumer of Solaris OSD, is the future direction for this technology. Currently, Shared QFS supports large enterprises, grid deployments, and HPC. With Shared QFS and OSD there are distinct advantages in terms of scalability and performance.

Storage allocation on the target node increases the horizontal scalability of parallel file systems, whereas traditional block allocation on the metadata server limits scaling. With OSD-enabled Shared QFS, space allocation moves to the storage nodes and is done in parallel, so bandwidth scales up as capacity increases.

Currently, block-based Shared QFS supports a single metadata server (MDS) for storage allocation and naming services. With storage allocation moving down to the storage nodes in object-based Shared QFS, the primary responsibility of the MDS becomes naming services. Though Shared QFS supports a single MDS at this time, plans are in place to support multiple MDS nodes. Archiving support provided by the Storage Archive Manager (SAM) is already available in the block space.

The ultimate goal of Shared QFS with OSD is to support thousands of compute nodes on a high-performing storage platform that can effectively use the features that provide value for both the storage and the computing subsystems. Shared QFS, as a parallel file system, has a major role to play in a multi-node cluster environment. To support high data rates in an HPC deployment, much depends even on a single compute node's I/O transfer sizes and transport capabilities. To take maximum advantage of the object data path, the Solaris OS needs to support some key features, most important among them large I/O transfer sizes and high-bandwidth transport capabilities.
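The scaling argument above can be made concrete with a small sketch: when a file is striped across N object storage targets, each target allocates space for its own stripes independently, so N allocators work in parallel instead of one MDS serializing every allocation. The round-robin placement policy and function names below are illustrative, not Shared QFS's actual layout algorithm.

```python
def place_stripes(file_size, stripe_size, n_targets):
    """Round-robin each stripe of a file onto a storage target.

    Returns a list of (target, local_index) pairs. Each target allocates
    space for its own stripes, so allocation proceeds in parallel and
    aggregate bandwidth grows as targets are added.
    """
    n_stripes = -(-file_size // stripe_size)   # ceil division
    per_target_count = [0] * n_targets
    placement = []
    for s in range(n_stripes):
        t = s % n_targets                      # round-robin across targets
        placement.append((t, per_target_count[t]))
        per_target_count[t] += 1
    return placement
```

In the block-based design, every one of those allocations would instead be a request to the single MDS, which is exactly the bottleneck that object-based Shared QFS removes.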
For more details, visit the OpenSolaris project pages and communities for OSD.

Ramana Srikanth is a software engineer in the Solaris Storage Group at Sun Microsystems Inc., developing iSCSI, Fibre Channel, MPxIO, and target device drivers for the NWS and OS/Net consolidations. Ramana has an M.S. from the University of Toledo in the U.S. and has been working for Sun since December 2001 in various test and development roles.