This information is Copyright 2008 Sun Microsystems

1. Introduction
    1.1. Project/Component Working Name:
        Solaris Fast Reboot
    1.2. Name of Document Author/Supplier:
        Sherry Q. Moore
    1.3. Date of This Document:
        4/10/2008
    1.4. Name of Major Document Customer(s)/Consumer(s):
    1.4.1. The PAC or CPT you expect to review your project:
        Solaris PAC
    1.4.2. The ARC(s) you expect to review your project:
        PSARC
    1.4.3. The Director/VP who is "Sponsoring" this project:
        William.Franklin@Sun.COM
    1.4.4. The name of your business unit:
        Solaris Core OS
    1.5. Email Aliases:
    1.5.1. Responsible Manager: Darrin.Johnson@sun.com
    1.5.2. Responsible Engineer: Sherry.Moore@sun.com
    1.5.3. Marketing Manager: Videhi.Mallela@Sun.COM
    1.5.4. Interest List: intel-core@sun.com, amd64-core@sun.com

2. Project Summary
    2.1. Project Description:
        The goal of the Solaris Fast Reboot project is to reboot to a
        new Solaris image without going through the firmware and boot
        loader, and to reduce the time from "reboot..." to the Solaris
        banner to several seconds.
    2.2. Risks and Assumptions:
        The Solaris Fast Reboot project assumes that device drivers can
        correctly quiesce their devices so that the devices do not
        corrupt the new OS image. In addition, the project assumes that
        the devices can operate correctly in the new OS environment
        without going through the reset and initialization performed by
        the firmware. We will be able to qualify Sun-produced platforms
        before their release.

3. Business Summary
    3.1. Problem Area:
        The Solaris boot and reboot path involves the following basic
        steps:

            On x86 systems:
                (Hardware reset) -> BIOS -> grub -> dboot -> kernel
            On SPARC systems:
                (Hardware reset) -> POST -> OBP -> dboot -> kernel

        As the number of CPUs and the amount of memory in computer
        systems increase, the time spent in the BIOS/POST phase to test
        and initialize hardware grows. Within the next 12 months we
        expect to see x86 systems with 1TB of memory, on which memory
        initialization alone will likely take over half an hour.
        It is therefore increasingly desirable to short-circuit the
        reboot path so that the firmware and boot loaders can be
        bypassed. To keep the project manageable while letting each
        phase deliver value on its own, the problem will be addressed in
        the following phases:

            Phase I:  Fast Reboot on x86 platforms.
            Phase II: Fast Reboot post panic on x86 platforms.
                      Fast Reboot (normal and post panic) on SPARC
                      platforms.

        Phase II will likely require some form of memory DR (dynamic
        reconfiguration), in which a minimal amount of clean memory is
        used to bring up the kernel while the rest of the memory is
        tested and brought in dynamically. This case delivers Phase I.
    3.2. Market/Requester:
        Sun produces large SPARC and x86 systems with many CPUs and large
        amounts of memory. Reducing downtime by eliminating the time
        spent in firmware will likely give Solaris a competitive
        advantage in the server market. It will also improve the
        developer experience, making Solaris a more attractive
        development platform.
    3.3. Business Justification:
        System boot/reboot time is system downtime. A SunFire 6800 with
        8 CPUs and 16GB of memory takes about 15 minutes to boot or
        reboot, 11 of which are spent in POST (Power-On Self Test). A
        dual-socket quad-core Intel Xeon system with 4GB of memory takes
        about 2 minutes to boot or reboot, 90 seconds of which are spent
        in the BIOS. In both cases roughly three quarters of the reboot
        time is spent in firmware, and that time will only grow with more
        CPUs and memory.

        The less time a system spends in the boot phase, the more useful
        work it can do. High availability is extremely important to
        many, if not all, of our customers. Shorter reboot times also
        reduce test turnaround time, thus improving developer
        productivity.
    3.4. Competitive Analysis:
        Linux has a feature called "kexec" in which the running kernel
        loads a new kernel and jumps to it directly. To avoid potential
        litigation, no "kexec" source code has been studied.
        The observed limitations of "kexec" are:
          - It supports only same-mode transitions (32-bit -> 32-bit or
            64-bit -> 64-bit).
          - It cannot handle systems whose physical memory addresses do
            not fit in an unsigned long, which limits the maximum amount
            of memory it can support to 4GB.
          - Many device drivers cannot restart correctly.
        Because of these limitations, the feature has never been widely
        adopted. Solaris, by contrast, has a clearly defined DDI
        interface to which device driver developers can adhere, so that
        hardware, whether Sun qualified or not, can be configured to
        support fast reboot.
    3.5. Opportunity Window/Exposure:
        Sun is aligning efforts with chip vendors and OEMs to position
        Solaris as the Unix of choice on their platforms. The Fast Reboot
        project will give our customers another reason to choose Solaris
        over our competitors' offerings.
    3.6. How will you know when you are done?:
        When we are able to fast reboot to a new Solaris image on
        Sun-manufactured x86 platforms within seconds.

4. Technical Description:
    4.1. Details:
        The code introduced by the Fast Reboot project will act as an
        "in-kernel" boot loader that loads the new kernel and boot
        archive into designated target locations, then switches to the
        new OS image. It consists of architecture-independent and
        architecture-dependent components.
          - Architecture-independent components
              1. Additional "-f" and "-e" flags to the reboot(1M) command.
              2. Loading and processing of the new kernel and boot
                 archive.
              3. Implementation of device driver quiesce.
          - Architecture-dependent components
              1. Obtaining the target location of the new kernel and boot
                 archive.
              2. Copying and switching to the new kernel.
    4.2. Bug/RFE Number(s):
        6714038 Fast Reboot support for x86 platforms
    4.3. In Scope:
    4.4. Out of Scope:
    4.5. Interfaces:
    4.5.1. New flags to reboot(1M) and uadmin(1M)
        An additional flag, "-f", is added to the reboot(1M) command.
        When the "-f" flag is specified, the fast reboot code will be
        executed.
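        The fast reboot path outlined in 4.1, together with the fallback
        behavior described in 4.5.1 and 4.5.2, can be sketched as plain
        control flow. The following is a minimal user-space simulation,
        not the kernel implementation: sim_env_t, load_new_image(),
        quiesce_all_devices() and do_reboot() are hypothetical names, and
        each stage's outcome is reduced to a flag. It assumes the new
        image is loaded before devices are quiesced, since reading the
        kernel and boot archive presumably still requires working disk
        drivers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Path the system ends up on after "reboot" or "reboot -f". */
typedef enum { PATH_REGULAR_REBOOT, PATH_FAST_REBOOT } reboot_path_t;

/*
 * Hypothetical simulated machine state: each flag stands in for the
 * outcome of one stage of the in-kernel boot loader.
 */
typedef struct {
    bool image_loads;     /* new kernel and boot archive fit in memory */
    bool devices_quiesce; /* every driver's devo_quiesce() succeeded */
} sim_env_t;

/* Stage 1 (hypothetical): load the new kernel and boot archive. */
static bool load_new_image(const sim_env_t *env)
{
    return env->image_loads;
}

/* Stage 2 (stand-in for quiesce_devices()): stop interrupts and DMA. */
static bool quiesce_all_devices(const sim_env_t *env)
{
    return env->devices_quiesce;
}

/*
 * Decide which reboot path is taken. Every failure on the fast path
 * degrades to the normal reset path instead of aborting the reboot.
 */
reboot_path_t do_reboot(const sim_env_t *env, bool fast_flag)
{
    assert(env != NULL);

    if (!fast_flag)
        return PATH_REGULAR_REBOOT;   /* plain reboot: firmware + boot loader */
    if (!load_new_image(env))
        return PATH_REGULAR_REBOOT;   /* e.g. out of memory */
    if (!quiesce_all_devices(env))
        return PATH_REGULAR_REBOOT;   /* a device could not be quiesced */

    /* Stage 3 would copy the image to its target location and jump to it. */
    return PATH_FAST_REBOOT;
}
```

        In this sketch a failure never surfaces as an error to the
        caller: a requested fast reboot always results in some reboot,
        with the firmware path as the safe default.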
        To fast reboot to a new kernel named "mykernel":

            # reboot -f -- '/platform/i86pc/mykernel/amd64/unix -k'

        To fast reboot with the same boot arguments as the previous
        boot:

            # reboot -f

        The boot archive is derived from the kernel argument. If any
        failure occurs in the fast reboot path, such as running out of
        memory, the normal reset path will be taken.

        To support the "-f" flag, a new uadmin function, AD_FASTREBOOT,
        is added to the existing function list. It is also recognized by
        commands that take these function numbers directly, such as
        uadmin(1M). For example,

            # uadmin 2 8

        will reset the system through the fast reboot path using the
        current boot arguments.

        Additionally, an alternate UFS boot disk or ZFS root pool can be
        specified for fast reboot:

            # reboot -f -- \
                '/dev/dsk/c0t0s3 /platform/i86pc/mykernel/amd64/unix -k'
            # reboot -f -- \
                'rootpool/root1 /platform/i86pc/mykernel/amd64/unix -k'

        or, if the disk has already been mounted (for example, when
        bfu'ing an alternate root):

            # reboot -f -- '/mnt/platform/i86pc/mykernel/amd64/unix -k'

        When the "-f" flag is specified, the user can optionally use the
        "-e" flag to specify an alternate boot environment:

            # reboot -f -e second_be_name

        The "-e" option depends on the Live Upgrade packages, in
        particular the "lumount" and "luumount" commands.
    4.5.2. New DDI quiesce(9E) function
        On the reboot path, before the hardware reset is issued, a new
        function, quiesce_devices(), will be called to invoke each
        driver's devo_quiesce() DDI entry point and quiesce its device.
        The fast reboot project relies on drivers correctly implementing
        the DDI devo_quiesce() function.

        All non-pseudo drivers, i.e., drivers with associated hardware,
        must support devo_quiesce(). If a device neither generates
        interrupts nor performs DMA, its driver can set the
        devo_quiesce() entry point to nulldev. If any non-pseudo driver
        has devo_quiesce() set to NULL or nodev, fast reboot will fall
        back to a regular reboot.

        The semantics of devo_quiesce() are:
            1. The device generates no more interrupts after quiesce().
            2. The device no longer modifies or accesses memory after
               quiesce().
            3. The device can be correctly configured by the driver's
               attach() routine without going through a system power
               cycle or BIOS reset and initialization.
            4. Devices that come with factory default settings must have
               those settings restored in quiesce(), to reach a state as
               close to "power-on" as possible. On devices such as HBAs,
               reaching this "power-on" state must not require spinning
               up and reprobing all internal drives.
        See the attached quiesce.man.txt for details.

        Implementations of quiesce(9E) should not fail, since a driver
        should always be able to shut off DMA and interrupts on its
        device. On debug kernels we will ASSERT that quiesce(9E) returns
        DDI_SUCCESS, to verify that newly introduced drivers, and
        modifications to existing drivers, remain Fast Reboot compliant.
        On non-debug kernels, if quiesce(9E) returns DDI_FAILURE, the
        regular reboot path will be taken.
    4.6. Doc Impact:
    4.6.1. Man pages for reboot(1M)
        The reboot(1M) man page will need to be updated to reflect the
        new options. Although additional options become available through
        uadmin(1M), its current man page is very limited and does not
        list all available options, so it may not need to be updated.
    4.6.2. Man pages for uadmin(2)
        The uadmin(2) man page needs to be updated to include the two new
        functions for A_SHUTDOWN:

            #define AD_FASTREBOOT         8  /* bypass BIOS and boot loader */
            #define AD_FASTREBOOT_DRYRUN  9  /* Fast reboot dry run */

    4.6.3. DDI man pages and documents
        A new man page for quiesce(9E) will be added. The DDI man pages,
        in particular dev_ops(9S), and related documents will need to be
        updated to reflect the semantics of the devo_quiesce() function.
    4.6.4. Device Driver's Guide
        The Device Driver's Guide will need to be updated to document the
        semantics of the DDI devo_quiesce() function.
    4.7. Admin/Config Impact:
        System administrators will have the option to take advantage of
        the fast reboot capability by invoking reboot(1M) with the "-f"
        and "-e" flags.
    4.8. HA Impact:
        None.
    4.9. I18N/L10N Impact:
        Localized man pages and documentation will need to be updated.
    4.10. Packaging & Delivery:
        ON packages.
    4.11. Security Impact:
        None.
    4.12. Dependencies:
    4.12.1. Drivers implementing the DDI devo_quiesce() interface
        If any driver for a device that does DMA or generates interrupts
        does not support the DDI devo_quiesce() entry point, fast reboot
        will fall back to a regular reboot. For fast reboot to work in
        HVM domains, PV drivers must implement devo_quiesce() as well.

5. Reference Documents:
        The SPARC Architecture Manual
        Intel 64 and IA-32 Architectures Software Developer's Manual
        PSARC 2006/525 New Solaris SPARC Boot Architecture
        PSARC 2004/454 Solaris Boot Architecture
        Design doc:
            http://jurassic-x4600/~sherrym/projects/fastboot/fastboot.txt
        Man pages:
            http://jurassic-x4600/~sherrym/projects/fastboot/reboot.man.txt
            http://jurassic-x4600/~sherrym/projects/fastboot/uadmin-2.man.txt
            http://jurassic-x4600/~sherrym/projects/fastboot/quiesce.man.txt
            http://jurassic-x4600/~sherrym/projects/fastboot/dev_ops.man.txt

6. Resources and Schedule:
    6.1. Projected Availability:
        Q2CY08
    6.2. Cost of Effort:
        2-4 people for 6 months (the number of people required may vary
        with their familiarity with the architecture).
    6.3. Cost of Capital Resources:
        We need access to the platforms on which we want to support Fast
        Reboot. Supporting a platform we do not already have will require
        acquiring it, which could result in additional capital expense.
    6.4. Product Approval Committee requested information:
    6.4.1. Consolidation or Component Name: OsNet
    6.4.3. Type of CPT Review and Approval expected: FastTrack
    6.4.4. Project Boundary Conditions:
        With the delivery for this case, we will qualify only
        Sun-manufactured platforms, and fast reboot will not be supported
        in the following environments:
          - xVM Solaris dom0 domains
          - xVM Solaris PV domU domains
          - non-global zones
    6.4.5. Is this a necessary project for OEM agreements: No
    6.4.6. Notes: None
    6.4.7. Target RTI Date/Release: 6/9/2008 onnv
    6.4.8. Target Code Design Review Date: 5/19/2008
    6.4.9. Update approval addition: None.
    6.5. ARC review type: FastTrack

7. Prototype Availability:
    7.1. Prototype Availability:
        Prototype is available (Q1CY08).
    7.2. Prototype Cost:
        1 person, 6 months
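        The devo_quiesce() rules in section 4.5.2 and the dependency in
        4.12.1 can be illustrated with a small user-space simulation. This
        is a sketch under stated assumptions, not DDI code: struct
        sim_dev, struct sim_regs, sim_quiesce(), fastreboot_capable() and
        quiesce_all() are hypothetical; sim_nulldev() and sim_nodev()
        stand in for the kernel's nulldev and nodev; and DDI_SUCCESS and
        DDI_FAILURE are local definitions. A real quiesce(9E) routine
        receives a dev_info_t pointer and programs actual device
        registers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for the DDI return codes (values are assumptions). */
#define DDI_SUCCESS  0
#define DDI_FAILURE  (-1)

/* Hypothetical register block of a simulated DMA-capable device. */
struct sim_regs {
    uint32_t intr_enable;  /* nonzero: device may raise interrupts */
    uint32_t dma_running;  /* nonzero: device may read/write memory */
    uint32_t config;       /* current configuration */
};

#define SIM_FACTORY_CONFIG 0x0u  /* assumed factory ("power-on") default */

struct sim_dev {
    const char *name;
    bool pseudo;                       /* pseudo driver: no hardware */
    struct sim_regs *regs;             /* NULL if nothing to program */
    int (*quiesce)(struct sim_dev *);  /* the devo_quiesce() slot */
};

/* Stand-ins for the kernel's nulldev and nodev routines. */
int sim_nulldev(struct sim_dev *d) { (void)d; return DDI_SUCCESS; }
int sim_nodev(struct sim_dev *d)   { (void)d; return DDI_FAILURE; }

/* Example quiesce routine for the simulated device (4.5.2 items 1, 2, 4). */
int sim_quiesce(struct sim_dev *d)
{
    struct sim_regs *r = d->regs;

    r->intr_enable = 0;              /* no more interrupts */
    r->dma_running = 0;              /* no more memory access */
    r->config = SIM_FACTORY_CONFIG;  /* restore factory defaults */
    return DDI_SUCCESS;              /* quiesce should never fail */
}

/*
 * Eligibility check: every non-pseudo driver needs a devo_quiesce()
 * entry point. nulldev is accepted; NULL or nodev means fast reboot
 * must fall back to a regular reboot.
 */
bool fastreboot_capable(const struct sim_dev *devs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (devs[i].pseudo)
            continue;
        if (devs[i].quiesce == NULL || devs[i].quiesce == sim_nodev)
            return false;
    }
    return true;
}

/*
 * Quiesce pass, assuming fastreboot_capable() already returned true.
 * Returns false if any driver reports DDI_FAILURE, in which case the
 * caller takes the regular reboot path. Builds compiled with -DDEBUG
 * mimic the debug-kernel ASSERT and abort instead.
 */
bool quiesce_all(struct sim_dev *devs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (devs[i].pseudo)
            continue;
        int rc = devs[i].quiesce(&devs[i]);
#ifdef DEBUG
        assert(rc == DDI_SUCCESS);
#endif
        if (rc != DDI_SUCCESS)
            return false;
    }
    return true;
}
```

        Keeping the eligibility check separate from the quiesce pass, in
        this sketch, lets a missing entry point be detected before any
        device is touched, while a runtime DDI_FAILURE is only seen during
        the pass itself.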