1. Introduction
   1.1. Project/Component Working Name:
        Solaris Fast Reboot
   1.2. Name of Document Author/Supplier:
        Sherry Q. Moore
   1.3. Date of This Document:
        5/29/2008

4. Technical Description

4.1 Introduction

Solaris has always strived to be the most reliable and available
operating system. Many technologies have been invented to achieve this
goal; notable ones include Dynamic Reconfiguration (DR), Fault
Management Architecture (FMA), SMF, and ZFS, to name a few. The
objective of all these projects is to keep systems up and running
correctly in the face of unexpected hardware and software failures,
with as little down time as possible.

System boot/reboot time is considered system down time. The less time
a system spends in the boot phase, the more useful work it can do.
High availability is extremely important to most, if not all, of our
customers. Shorter reboot time also reduces test turnaround time, and
thus improves developers' productivity.

4.2 Background

The Solaris boot and reboot path involves the following basic steps:

On x86 systems:
    (Hardware reset) -> BIOS -> grub -> dboot -> kernel

On SPARC systems:
    (Hardware reset) -> POST -> OBP -> dboot -> kernel

On x86 systems, upon startup or reset, the BIOS code performs hardware
testing and initialization, then jumps to the boot loader, grub. Grub
loads dboot, unix text, data and the boot archive into memory, then
calls dboot. dboot does the necessary initialization, such as building
the initial page tables or loading kernel text and data to a
different location, then jumps to the kernel.

As computer systems become more complex, the time they spend in the
BIOS/POST phase to test and initialize hardware gets longer. In the
next 12 months we expect to see x86 systems with 1TB of memory. Memory
initialization alone will take over half an hour. It becomes more and
more desirable to short-circuit the reboot path so that the firmware
and boot loaders can be bypassed.
The fast reboot code will act as an in-kernel boot loader that loads
the kernel into memory and switches to it. The new kernel, in the
context of this write-up, includes the dboot that gets tacked on at
build time.

The goal of the Solaris Fast Reboot project is to reach the login
prompt from "rebooting..." within seconds (assuming the boot archive
has been updated). The Solaris implementation will support systems
with arbitrarily large amounts of memory, and provide the flexibility
to reboot to 32-bit or 64-bit kernels.

4.3 Interface Table

INTERFACE                      COMMITMENT LEVEL  COMMENT
reboot -f (1M)                 Committed         To initiate a fast reboot.
reboot -e (1M)                 Committed         To fast reboot to a
                                                 different BE.
uadmin(2)                      Committed         Added AD_FASTREBOOT and
                                                 AD_FASTREBOOT_DRYRUN to
                                                 facilitate fast reboot.
quiesce(9E)                    Committed         To quiesce a device.
ddi_quiesce_not_needed(9F)     Committed         Returns DDI_SUCCESS.
                                                 No need to quiesce.
ddi_quiesce_not_supported(9F)  Sun Private       Returns DDI_FAILURE.
                                                 Quiesce needed but not
                                                 implemented.
dev_ops(9S)                    Committed         Added devo_quiesce ops
                                                 for quiescing devices.

6. Resources and Schedule
   6.4. Steering Committee requested information
        6.4.1. Consolidation C-team Name:
               ON
   6.5. ARC review type: FastTrack
   6.6. ARC Exposure: open

A. Man pages

   A.1 quiesce(9E): new man page
   A.2 reboot(1M)
   A.3 uadmin(2)
   A.4 dev_ops(9S)

A.1 Man page for quiesce(9E)

Man pages for ddi_quiesce_not_needed(9F) and
ddi_quiesce_not_supported(9F) will be links to the man page for
quiesce(9E).

Driver Entry Points                                        quiesce(9E)

NAME
     quiesce - quiesce a device

SYNOPSIS
     #include <sys/ddi.h>
     #include <sys/sunddi.h>

     int prefixquiesce(dev_info_t *dip);

     int ddi_quiesce_not_needed(dev_info_t *dip);

     int ddi_quiesce_not_supported(dev_info_t *dip);

INTERFACE LEVEL
     Solaris DDI specific (Solaris DDI)

PARAMETERS
     dip    A pointer to the device's dev_info structure.

DESCRIPTION
     The quiesce() function quiesces a device so that the device
     will no longer generate interrupts or modify or access memory.
     The driver should reset the device to a hardware state from
     which the device can be correctly configured by the driver's
     attach() routine without a system power cycle or being
     configured by the firmware. For devices with a defined reset
     state configuration, the driver should return the device to
     that state as part of the quiesce operation. One such use case
     is Fast Reboot, where the firmware is bypassed when booting to
     a new OS image.

     quiesce() is called only for an attached device instance, as
     one of the final operations of a reboot sequence, and no other
     thread can be active for this device. The system guarantees
     that no other driver entry point will be active or invoked
     while quiesce(9E) is invoked. The system also guarantees that
     no timeout or taskq will be invoked. The system is
     single-threaded and can be neither preempted nor interrupted;
     therefore the driver's quiesce() implementation must not use
     locks or timeouts, or rely on them being called. The driver
     must discard all outstanding I/O instead of waiting for
     completion. By the conclusion of the quiesce() operation, the
     driver must guarantee that the device will not generate
     further accesses to memory or interrupts.

     The only DDI interfaces that can be called by the quiesce()
     implementation are non-blocking functions, such as
     ddi_get*(9F) and ddi_put*(9F).

     If quiesce() determines that a particular instance of the
     device cannot be quiesced when requested because of some
     exceptional condition, quiesce() must return DDI_FAILURE. This
     should almost never happen.

     If a driver has previously implemented the obsolete reset()
     interface, its functionality must be merged into quiesce().
     The driver's reset() routine will no longer be called if an
     implementation of quiesce() is present.

     ddi_quiesce_not_needed() always returns DDI_SUCCESS. A driver
     can set its devo_quiesce device function to
     ddi_quiesce_not_needed() to indicate that the device it
     manages does not need to be quiesced.

     ddi_quiesce_not_supported() always returns DDI_FAILURE.
     A driver can set its devo_quiesce device function to
     ddi_quiesce_not_supported() to indicate that either the device
     cannot be quiesced, or quiesce() has not been implemented.

RETURN VALUES
     DDI_SUCCESS    For quiesce(), the device has been successfully
                    quiesced.

     DDI_FAILURE    The operation failed.

CONTEXT
     This function is called from kernel context only.

ATTRIBUTES
     See attributes(5) for descriptions of the following attributes:

      ___________________________________________________________
     |       ATTRIBUTE TYPE        |       ATTRIBUTE VALUE       |
     |_____________________________|_____________________________|
     | Interface Stability         |                             |
     |   quiesce(9E)               | Committed                   |
     |ddi_quiesce_not_needed(9F)   | Committed                   |
     |ddi_quiesce_not_supported(9F)| Sun Private                 |
     |_____________________________|_____________________________|

SEE ALSO
     attach(9E), detach(9E), ddi_add_intr(9F), ddi_map_regs(9F),
     pci_config_setup(9F), timeout(9F), reboot(1M), uadmin(1M),
     uadmin(2), ddi_quiesce_not_needed(9F),
     ddi_quiesce_not_supported(9F), dev_ops(9S)

NOTES
     When quiesce() is called, the system is single-threaded;
     therefore the driver's quiesce() implementation must not
     block. For example, the implementation must not create or tear
     down mappings, call FMA functions, or create or cancel
     callbacks.

A.2 Man page for reboot(1M)

System Administration Commands                              reboot(1M)

NAME
     reboot - restart the operating system

SYNOPSIS
     /usr/sbin/reboot [-dlnq] [-f [dryrun] [-e BE]] [boot_arguments]      |

DESCRIPTION
     The reboot utility restarts the kernel. The kernel is loaded
     into memory by the PROM monitor, which transfers control to
     the loaded kernel.

     On x86 systems, when the -f flag is specified, the running           |
     kernel loads the next kernel into memory, then transfers             |
     control to the loaded kernel.                                        |

     Although reboot can be run by the super-user at any time,
     shutdown(1M) is normally used first to warn all users logged
     in of the impending loss of service. See shutdown(1M) for
     details.
     The reboot utility performs a sync(1M) operation on the disks,
     and then a multi-user reboot is initiated. See init(1M) for
     details. On x86 systems, reboot may also update the boot
     archive as needed to ensure a successful reboot.

     The reboot utility normally logs the reboot to the system log
     daemon, syslogd(1M), and places a shutdown record in the login
     accounting file /var/adm/wtmpx. These actions are inhibited if
     the -n or -q options are present.

     Normally, the system reboots itself at power-up or after
     crashes.

OPTIONS
     The following options are supported:

     -d    Force a system crash dump before rebooting. See
           dumpadm(1M) for information on configuring system crash
           dumps.

     -e    If -f is present, reboot to the specified boot                 |
           environment.                                                   |
                                                                          |
                                                                          |
     -f    Fast reboot, bypassing the firmware and boot loader.           |
           The new kernel is loaded into memory by the running            |
           kernel, and control is transferred to the loaded               |
           kernel. If disk or kernel arguments are specified,             |
           they must be specified before other boot arguments.            |
                                                                          |
           When -f is present, reboot(1M) accepts a special               |
           argument, "dryrun", to check whether all the drivers           |
           in the system have implemented quiesce(9E).                    |
                                                                          |
           Currently only available on x86 systems.                       |
                                                                          |
           See Example 3 for details.                                     |

     -l    Suppress sending a message to the system log daemon,
           syslogd(1M), about who executed reboot.

     -n    Avoid calling sync(2) and do not log the reboot to
           syslogd(1M) or to /var/adm/wtmpx. The kernel still
           attempts to sync filesystems prior to reboot, except if
           the -d option is also present. If -d is used with -n,
           the kernel does not attempt to sync filesystems.

     -q    Quick. Reboot quickly and ungracefully, without shutting
           down running processes first.

OPERANDS
     The following operands are supported:

     boot_arguments    An optional boot_arguments specifies
                       arguments to the uadmin(2) function that are
                       passed to the boot program and kernel upon
                       restart. The form and list of arguments is
                       described in the boot(1M) and kernel(1M) man
                       pages.
                       If the arguments are specified, whitespace
                       between them is replaced by single spaces
                       unless the whitespace is quoted for the
                       shell. If the boot_arguments begin with a
                       hyphen, they must be preceded by the --
                       delimiter (two hyphens) to denote the end of
                       the reboot argument list.

EXAMPLES
     Example 1 Passing the -r and -v Arguments to boot

     In the following example, the delimiter -- (two hyphens) must
     be used to separate the options of reboot from the arguments
     of boot(1M).

       example# reboot -dl -- -rv

     Example 2 Rebooting Using a Specific Disk and Kernel

     The following example reboots using a specific disk and
     kernel.

       example# reboot disk1 kernel.test/unix

     Example 3 Fast reboot                                                |
                                                                          |
     Check if all the drivers on the system are fast reboot               |
     capable.                                                             |
                                                                          |
       example# reboot -f dryrun                                          |
                                                                          |
     Rebooting to another UFS root disk.                                  |
                                                                          |
       example# reboot -f -- '/dev/dsk/c1d0s0'                            |
                                                                          |
     Rebooting to another ZFS root pool.                                  |
                                                                          |
       example# reboot -f -- 'rootpool/root1'                             |
                                                                          |
     Rebooting to "mykernel" on the same disk with the "-k" option.       |
                                                                          |
       example# reboot -f -- '/platform/i86pc/mykernel/amd64/unix -k'     |
                                                                          |
     Rebooting to "mykernel" off another root disk mounted on /mnt.       |
                                                                          |
       example# reboot -f -- '/mnt/platform/i86pc/mykernel/amd64/unix -k' |
                                                                          |
     Rebooting to "/platform/i86pc/kernel/$ISADIR/unix" on another        |
     boot environment named "second_root".                                |
                                                                          |
       example# reboot -f -e second_root                                  |
                                                                          |
     Rebooting to the same kernel with "-kv" options.                     |
                                                                          |
       example# reboot -f -- '-kv'                                        |

FILES
     /var/adm/wtmpx    login accounting file

ATTRIBUTES
     See attributes(5) for descriptions of the following attributes:

      ___________________________________________________________
     |       ATTRIBUTE TYPE        |       ATTRIBUTE VALUE       |
     |_____________________________|_____________________________|
     | Availability                | SUNWcsu                     |
     |_____________________________|_____________________________|

SEE ALSO
     mdb(1), boot(1M), dumpadm(1M), fsck(1M), halt(1M), init(1M),
     kernel(1M), shutdown(1M), sync(1M), syslogd(1M), sync(2),
     uadmin(2), reboot(3C), attributes(5)

NOTES
     The reboot utility does not execute the scripts in
     /etc/rcnum.d or execute shutdown actions in inittab(4). To
     ensure a complete shutdown of system services, use
     shutdown(1M) or init(1M) to reboot a Solaris system.

A.3 Man page for uadmin(2)

System Calls                                                 uadmin(2)

NAME
     uadmin - administrative control

SYNOPSIS
     #include <sys/uadmin.h>

     int uadmin(int cmd, int fcn, uintptr_t mdep);

DESCRIPTION
     The uadmin() function provides control for basic
     administrative functions. This function is tightly coupled to
     the system administrative procedures and is not intended for
     general use. The argument mdep is provided for
     machine-dependent use and is not defined here. It should be
     initialized to NULL if not used.

     As specified by cmd, the following commands are available:

     A_SHUTDOWN    The system is shut down. All user processes are
                   killed, the buffer cache is flushed, and the
                   root file system is unmounted. The action to be
                   taken after the system has been shut down is
                   specified by fcn. The functions are generic; the
                   hardware capabilities vary on specific machines.

                   AD_HALT        Halt the processor(s).

                   AD_POWEROFF    Halt the processor(s) and turn
                                  off the power.

                   AD_BOOT        Reboot the system, using the
                                  kernel file.

                   AD_IBOOT       Interactive reboot; user is
                                  prompted for bootable program
                                  name.
                   AD_FASTREBOOT  Bypass BIOS and boot loader.            |
                                                                          |
                   AD_FASTREBOOT_DRYRUN                                   |
                                  Fast reboot dry run to check whether    |
                                  a system supports fast reboot.          |

     A_REBOOT      The system stops immediately without any further
                   processing. The action to be taken next is
                   specified by fcn as above.

     A_DUMP        The system is forced to panic immediately
                   without any further processing and a crash dump
                   is written to the dump device (see dumpadm(1M)).
                   The action to be taken next is specified by fcn,
                   as above.

     A_REMOUNT     The root file system is mounted again after
                   having been fixed. This should be used only
                   during the startup process.

     A_FREEZE      Suspend the whole system. The system state is
                   preserved in the state file. The following
                   subcommands, specified by fcn, are available.

                   AD_SUSPEND_TO_DISK
                       Save the system state to the state file.
                       This subcommand is equivalent to ACPI state
                       S4.

                   AD_CHECK_SUSPEND_TO_DISK
                       Check if your system supports suspend to
                       disk. Without performing a system
                       suspend/resume, this subcommand checks if
                       this feature is currently available on your
                       system.

                   AD_SUSPEND_TO_RAM
                       Save the system state to memory. This
                       subcommand is equivalent to ACPI state S3.

                   AD_CHECK_SUSPEND_TO_RAM
                       Check if your system supports suspend to
                       memory. Without performing a system
                       suspend/resume, this subcommand checks if
                       this feature is currently available on your
                       system.

                   The following subcommands, specified by fcn, are
                   obsolete and might be removed in a subsequent
                   release:

                   AD_COMPRESS
                       Save the system state to the state file with
                       compression of data. This subcommand has
                       been replaced by AD_SUSPEND_TO_DISK, which
                       should be used instead.

                   AD_CHECK
                       Check if your system supports suspend and
                       resume. Without performing a system
                       suspend/resume, this command checks if this
                       feature is currently available on your
                       system. This subcommand has been replaced by
                       AD_CHECK_SUSPEND_TO_DISK, which should be
                       used instead.

                   AD_FORCE
                       Force AD_COMPRESS even when threads of user
                       applications are not suspendable.
                       This subcommand should never be used, as it
                       might result in undefined behavior.

RETURN VALUES
     Upon successful completion, the value returned depends on cmd
     as follows:

     A_SHUTDOWN    Never returns.

     A_REBOOT      Never returns.

     A_FREEZE      0 upon resume.

     A_REMOUNT     0.

     Otherwise, -1 is returned and errno is set to indicate the
     error.

ERRORS
     The uadmin() function will fail if:

     EBUSY      Suspend is already in progress.

     EINVAL     The cmd argument is invalid.

     ENOMEM     Suspend/resume ran out of physical memory.

     ENOSPC     Suspend/resume could not allocate enough space on
                the root file system to store system information.

     ENOTSUP    Suspend/resume is not supported on this platform or
                the command specified by cmd is not allowed.

     ENXIO      Unable to successfully suspend system.

     EPERM      The {PRIV_SYS_CONFIG} privilege is not asserted in
                the effective set of the calling process.

ATTRIBUTES
     See attributes(5) for descriptions of the following attributes:

      ___________________________________________________________
     |       ATTRIBUTE TYPE        |       ATTRIBUTE VALUE       |
     |_____________________________|_____________________________|
     | Interface Stability         | See below.                  |
     |_____________________________|_____________________________|

     The A_FREEZE command and its subcommands are Committed.

SEE ALSO
     dumpadm(1M), kernel(1M), uadmin(1M), attributes(5),
     privileges(5)

A.4 Man page for dev_ops(9S)

Data Structures for Drivers                                dev_ops(9S)

NAME
     dev_ops - device operations structure

SYNOPSIS
     #include <sys/conf.h>
     #include <sys/devops.h>

INTERFACE LEVEL
     Solaris DDI specific (Solaris DDI).

DESCRIPTION
     dev_ops contains driver common fields and pointers to the
     bus_ops and cb_ops(9S).

     Following are the device functions provided in the device
     operations structure. All fields must be set at compile time.

     devo_rev         Driver build version. Set this to DEVO_REV.

     devo_refcnt      Driver reference count. Set this to 0.

     devo_getinfo     Get device driver information (see
                      getinfo(9E)).

     devo_identify    This entry point is obsolete. Set to nulldev.

     devo_probe       Probe device. See probe(9E).
     devo_attach      Attach driver to dev_info. See attach(9E).

     devo_detach      Detach/prepare driver to unload. See
                      detach(9E).

     devo_reset       Reset device. (Not supported in this
                      release.) Set this to nodev.

     devo_cb_ops      Pointer to cb_ops(9S) structure for leaf
                      drivers.

     devo_bus_ops     Pointer to bus operations structure for nexus
                      drivers. Set this to NULL if this is for a
                      leaf driver.

     devo_power       Power a device attached to system. See
                      power(9E).

     devo_quiesce     Quiesce a device attached to system. See            |
                      quiesce(9E). Can be set to                          |
                      ddi_quiesce_not_needed(9F) if the driver does       |
                      not need to implement quiesce, or set to            |
                      ddi_quiesce_not_supported(9F) if the driver         |
                      cannot quiesce the device to support fast reboot.   |

STRUCTURE MEMBERS
     int (*devo_rev);
     int (*devo_refcnt);
     int (*devo_getinfo)(dev_info_t *dip, ddi_info_cmd_t infocmd,
         void *arg, void **result);
     int (*devo_identify)(dev_info_t *dip);
     int (*devo_probe)(dev_info_t *dip);
     int (*devo_attach)(dev_info_t *dip, ddi_attach_cmd_t cmd);
     int (*devo_detach)(dev_info_t *dip, ddi_detach_cmd_t cmd);
     int (*devo_reset)(dev_info_t *dip, ddi_reset_cmd_t cmd);
     struct cb_ops *devo_cb_ops;
     struct bus_ops *devo_bus_ops;
     int (*devo_power)(dev_info_t *dip, int component, int level);
     int (*devo_quiesce)(dev_info_t *dip);                                |

SEE ALSO
     attach(9E), detach(9E), getinfo(9E), probe(9E), power(9E),           |
     quiesce(9E), nodev(9F), ddi_quiesce_not_needed(9F),                  |
     ddi_quiesce_not_supported(9F)                                        |
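
EXAMPLES
     The following is a minimal sketch, not taken from any existing
     driver, of how a driver might fill in devo_quiesce under the
     rules described in quiesce(9E). So that it compiles outside the
     kernel, it uses stand-ins that are assumptions of this example:
     a simplified dev_info_t, a reduced struct dev_ops_sketch holding
     only devo_quiesce, a hypothetical "xx" device with two
     registers, and sketch_quiesce_not_needed() in place of
     ddi_quiesce_not_needed(9F). A real driver would include the DDI
     headers and touch registers with ddi_put*(9F).

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Stand-ins (assumptions) so this sketch compiles outside the kernel.
 * A real driver gets these from <sys/ddi.h>, <sys/sunddi.h>,
 * <sys/conf.h> and <sys/devops.h>, where dev_info_t is opaque.
 */
#define	DDI_SUCCESS	0
#define	DDI_FAILURE	(-1)

/* Hypothetical register block of an imaginary "xx" device. */
typedef struct xx_regs {
	volatile uint32_t intr_mask;	/* nonzero: interrupts enabled */
	volatile uint32_t dma_ctl;	/* nonzero: DMA engine running */
} xx_regs_t;

/* Simplified dev_info; NULL regs models "registers not mapped". */
typedef struct dev_info {
	xx_regs_t *regs;
} dev_info_t;

/* Reduced dev_ops holding only the member this sketch exercises. */
struct dev_ops_sketch {
	int (*devo_quiesce)(dev_info_t *dip);
};

/*
 * quiesce(9E) sketch: no locks, no timeouts, no waiting for I/O.
 * Interrupts are masked and DMA is stopped with plain stores; a real
 * driver would use ddi_put32(9F) on mappings set up in attach(9E).
 */
int
xx_quiesce(dev_info_t *dip)
{
	if (dip == NULL || dip->regs == NULL)
		return (DDI_FAILURE);	/* cannot reach the device */

	dip->regs->intr_mask = 0;	/* no further interrupts */
	dip->regs->dma_ctl = 0;		/* discard outstanding I/O */
	return (DDI_SUCCESS);
}

/* Stand-in for ddi_quiesce_not_needed(9F): always succeeds. */
int
sketch_quiesce_not_needed(dev_info_t *dip)
{
	(void) dip;
	return (DDI_SUCCESS);
}

/* A device with DMA and interrupts installs its own routine... */
struct dev_ops_sketch xx_dev_ops = { xx_quiesce };

/* ...while a pseudo device with nothing to stop uses the stub. */
struct dev_ops_sketch pseudo_dev_ops = { sketch_quiesce_not_needed };
```

     In this sketch the quiesce routine returns DDI_FAILURE only when
     it cannot reach the device, which mirrors the "exceptional
     condition" case in quiesce(9E). Everything else is done with
     non-blocking register writes.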