Sun Microsystems
Systems Architecture Committee
_________________________________________________________________

Subject:       BrandZ Support for non-native zones
Submitted by:  Nils Niewejaar
File:          PSARC/2005/471/opinion.ms
Date:          July 28th, 2006
Committee:     Ed Gould (Opinion by Alan Hargreaves), James D Carlson,
               Glenn Skinner, William Sommerfeld, Gary Winiger.
Product Approval Committee: solaris-pac-opinion@sun.com

1. Summary

BrandZ is an extension of the zones infrastructure that allows the creation of zones that emulate non-native operating system environments, such as Linux. Future projects may build on this framework to provide other non-native operating environments.

2. Decision & Precedence Information

The project is approved as specified in reference [1]. The project may be delivered in a patch release of Solaris.

The project supersedes PSARC/2003/445: Janus: Linux binary compatibility for Solaris x86.

The project depends on PSARC/2006/440: BrandZ-aware Installer and may not be delivered before it.

3. Interfaces

The project exports the following interfaces.
 _______________________________________________________________________________
|                              Interfaces Exported                              |
|_______________________________|________________|______________________________|
|Interface                      |Classification  |Comments                      |
|_______________________________|________________|______________________________|
|Linux interfaces:              |External        |This category includes all of |
|System calls (structure,       |                |the Linux interfaces that the |
|semantics and calling          |                |lx brand emulates             |
|conventions)                   |                |                              |
|/dev (names and major/minor    |                |                              |
|#'s)                           |                |                              |
|/proc                          |                |                              |
|signal numbers                 |                |                              |
|error numbers                  |                |                              |
|_______________________________|________________|______________________________|
|AT_SUN_BRAND_BASE              |Project Private |Additional auxiliary vector   |
|AT_SUN_BRAND_LDDATA            |                |entries used to convey brand  |
|AT_SUN_BRAND_LDENTRY           |                |information to the Solaris    |
|AT_SUN_BRAND_BRANDNAME         |                |linker                        |
|AT_SUN_BRAND_PHDR              |                |                              |
|AT_SUN_BRAND_PHENT             |                |                              |
|AT_SUN_BRAND_PHNUM             |                |                              |
|AT_SUN_BRAND_ENTRY             |                |                              |
|_______________________________|________________|______________________________|
|config.xml                     |Project Private |Brand definition              |
|_______________________________|________________|______________________________|
|platform.xml                   |Project Private |Virtual platform definition   |
|_______________________________|________________|______________________________|
|struct modlbrand               |Consolidation   |Kernel/brand module linkage   |
|                               |Private         |interface                     |
|_______________________________|________________|______________________________|
|struct brand                   |Project Private |Kernel/brand operational      |
|                               |                |interface                     |
|_______________________________|________________|______________________________|
|struct brand_ops               |Project Private |Kernel/brand operational      |
|                               |                |interface                     |
|_______________________________|________________|______________________________|
|struct brand_mach_ops          |Project Private |Arch-specific kernel/brand    |
|                               |                |operational interface         |
|_______________________________|________________|______________________________|
|struct brand_attr              |Project Private |Userspace/kernel interface    |
|_______________________________|________________|______________________________|
|struct lx_brand_registration   |Project Private |Userspace/kernel interface    |
|_______________________________|________________|______________________________|
|rd_helper_ops_t                |Consolidation   |librtld_db.so helper plugin   |
|                               |Private         |interface                     |
|_______________________________|________________|______________________________|
|brand_open()                   |Project Private |libbrand.so.1, a new library  |
|brand_close()                  |                |for parsing the BrandZ .xml   |
|brand_is_native()              |                |files                         |
|brand_get_boot()               |                |                              |
|brand_get_halt()               |                |                              |
|brand_get_initname()           |                |                              |
|brand_get_install()            |                |                              |
|brand_get_modename()           |                |                              |
|brand_get_postclone()          |                |                              |
|brand_get_verify()             |                |                              |
|brand_platform_iter_gmounts()  |                |                              |
|brand_platform_iter_lmounts()  |                |                              |
|brand_platform_iter_devdir()   |                |                              |
|brand_platform_iter_link()     |                |                              |
|_______________________________|________________|______________________________|
|zonecfg_get_brand()            |Contracted      |Added to libzonecfg.so.1;     |
|zone_get_brand()               |Project Private |contract in reference [4]     |
|_______________________________|________________|______________________________|
|zonecfg(1M)                    |Evolving        |Added -B option               |
|_______________________________|________________|______________________________|
|zoneadm(1M)                    |Project Private |Added -f (force) option to    |
|                               |                |the mount and boot commands;  |
|                               |                |added "brand" column to the   |
|                               |                |verbose "list" output         |
|_______________________________|________________|______________________________|
|lockd(1M), statd(1M)           |Consolidation   |Added -P option to indicate   |
|                               |Private         |portmapper usage              |
|_______________________________|________________|______________________________|
|libnsl(3LIB)                   |Consolidation   |Added __use_portmapper() to   |
|                               |Private         |resurrect old portmapper      |
|                               |                |support                       |
|_______________________________|________________|______________________________|
|streamio(7I)                   |Evolving        |Added support for TIOCSCTTY,  |
|                               |                |TIOCNOTTY, TIOCSETLD and      |
|                               |                |TIOCGETLD                     |
|_______________________________|________________|______________________________|
|uucopy(2)                      |Evolving        |Added to libc.so.1; see       |
|                               |                |design doc section 3.5.2      |
|_______________________________|________________|______________________________|
|set_setcontext_enforcement(3C) |Consolidation   |Added to libc.so.1; see       |
|                               |Private         |design doc section 3.6.2      |
|_______________________________|________________|______________________________|
|setsigacthandler(3C)           |Consolidation   |Added to libc.so.1; see       |
|                               |Private         |design doc section 3.6.1      |
|_______________________________|________________|______________________________|
|lx-install(1M)                 |Evolving        |Invoked by zoneadm(1M), but   |
|                               |                |its options are user-visible  |
|_______________________________|________________|______________________________|
|lx-syscall(7D)                 |Evolving        |Linux system call DTrace      |
|                               |                |provider                      |
|_______________________________|________________|______________________________|
|lx_ptm(7D)                     |Project Private |Linux pty master driver       |
|_______________________________|________________|______________________________|
|ldlinux(7M)                    |Project Private |STREAMS module that provides  |
|                               |                |Linux termio(7I) semantics    |
|_______________________________|________________|______________________________|
|lx_afs(7D)                     |Project Private |Linux automounter support     |
|_______________________________|________________|______________________________|
|lx_audio(7D)                   |Project Private |Layered driver that converts  |
|                               |                |Linux audio semantics to      |
|                               |                |Solaris semantics             |
|_______________________________|________________|______________________________|

The project imports the following interfaces.
 ___________________________________________________________________________
|                            Interfaces Imported                            |
|________________________|_______________|__________________________________|
|Interface               |Classification |Comments                          |
|________________________|_______________|__________________________________|
|Linux syscall interface |External       |                                  |
|________________________|_______________|__________________________________|
|rpm2cpio(1M) CLI,       |External       |Used to install Red Hat software  |
|rpm CLI                 |               |                                  |
|________________________|_______________|__________________________________|
|Linux statd(1M) and     |External       |Used to support NFS locking       |
|lockd(1M) uid/gid #'s   |               |within lx branded zones           |
|________________________|_______________|__________________________________|
|glibc ABI:              |External       |Used to provide naming services   |
|  gethostbyname_r       |               |to the Solaris statd(1M) and      |
|  gethostbyaddr_r       |               |lockd(1M) daemons; see section    |
|  getservbyname_r       |               |3.8 of the design doc             |
|  getservbyport_r       |               |                                  |
|  openlog               |               |                                  |
|  syslog                |               |                                  |
|  closelog              |               |                                  |
|  __progname            |               |                                  |
|________________________|_______________|__________________________________|
|RHEL 3.x contents       |External       |/etc files, rc.d scripts, etc.,   |
|                        |               |which we modify at install        |
|                        |               |time                              |
|________________________|_______________|__________________________________|
|Linux ELF format        |External       |Object file format for Linux      |
|                        |               |binaries                          |
|________________________|_______________|__________________________________|

4. Opinion

4.1. The lx Brand

The word Linux is a trademark.
To avoid issues, the name lx is used to refer to Linux-branded zones. PSARC had concerns about the management of this namespace for future brands and future releases of Linux. As a result, references to specific releases of the Linux kernel were removed from the documentation, and the lx brand will not be associated with specific releases of a Linux kernel.

4.2. Executable Stacks

For compatibility, BrandZ must allow Linux applications to run with executable stacks, so those applications are vulnerable to any security holes that executable stacks open. However, because such applications run inside a zone, any damage would be confined to that BrandZ instance. A compromised zone cannot bring down the system, and can neither access nor damage applications or data in other zones.

Considered more generally, a BrandZ-hosted Linux environment will be subject to any security holes in the Linux user-space. However, it will not be vulnerable to any security holes that depend on Linux kernel support or Linux kernel bugs, since no Linux kernel is running. This would arguably make a BrandZ-hosted RHEL 3 environment more secure than a native RHEL 3 environment.

4.3. Truss, Apptrace and Dbx

Truss has been updated to recognize the new Solaris system calls; it has not been, and will not be, updated to understand and display the Linux system calls issued by the application. An lx-syscall DTrace provider makes that information available.

dbx does not currently work, but this appears to be a bug rather than a fundamental limitation of the design. This is still under investigation and is being tracked as:

    6445248 dbx cannot grok Linux processes

4.4. Live Upgrade and Packaging Tools

Live Upgrade does not run with zones. The packaging tool changes will go into the install gate, and the following bug has been filed and linked to this case:

    63242179 packaging tools need to be brand aware

PSARC/2006/440 has been submitted and approved for working with Live Upgrade.

4.5. Audio

The zones infrastructure currently has no notion of a device-specific attribute, which is needed to support systems with multiple audio devices. Adding such a capability would have required an extensive overhaul of how devices are configured and managed. Rather than redesign the core of the zones configuration tools simply to solve one Linux corner case, the project team chose to use the generic attributes mechanism to support audio devices.

4.6. Solaris Trusted Extensions

After discussions between the project teams for this case and for Trusted Extensions, it was determined that lx branded zones will not be supported on trusted systems where labels are active.

4.7. lxrun

PSARC/2006/441 has been submitted and approved to EOL lxrun.

4.8. Process Auditing

Processes running in an lx-branded zone do not have their Linux system calls audited. Otherwise, they are subject to all the standard auditing. For example, Linux process creation and exit events are captured as for any other process, and the Solaris system calls that the brand library uses to emulate the Linux system calls are subject to auditing. The only restriction is that the Solaris audit processing tools cannot run inside the Linux zone, so the audit records must be consumed by tools running in the global zone.

4.9. Signals to init

During inception, PSARC expressed a concern about how the lx init would deal with system-generated signals that it was not expecting. The project team has addressed this concern as follows.

With standard Solaris zones, the kernel and init agree on how to handle the death of init: the kernel restarts the process, and the resurrected init process uses a state file to pick up where its predecessor left off.

The Linux init is not prepared to handle this kind of restart. When it is restarted, it works its way through the entire boot process again.
This means that all the rc.d scripts are rerun, and we end up with multiple instances of services such as crond and syslogd.

Since init cannot simply ignore SIGSEGV, and since the Linux init is not prepared to handle a warm restart, the only action that will deliver a sensible result is to reboot the zone. Regardless of whether this is the expected behavior on a native Linux system, it is the behavior that will be implemented inside a Linux zone.

4.10. Delegated Administration of Solaris-specific capabilities

Linux-branded zones will always be second-class citizens in many ways. As our real goal is to increase Solaris adoption, using BrandZ as one part of a migration strategy, we view this as a feature rather than a bug. To address these specific issues:

ZFS delegation will not work within a Linux zone. Given sufficient customer interest, we could possibly support the ZFS utilities, but it would take a significant amount of engineering work and would violate our "one binary type per zone" model. Note that this in no way affects the ability to install and run a Linux zone on a ZFS filesystem.

Supporting network delegation is significantly more feasible. By emulating the ioctl()s needed to perform network configuration tasks, we should be able to support network delegation using Linux configuration tools. This would not be a trivial engineering effort, but it would certainly fit within the overall BrandZ model.

4.11. Impact on Zones Upgrade

The zones test suite, which is run as a regular part of the PIT suite, will be extended to include testing of lx-branded zones.

5. Minority Opinion(s)

None.

6. Advisory Information

None.

7. Appendices

7.1. Appendix A: Technical Changes Required

None.

7.2. Appendix B: Technical Changes Advised

None.

7.3. Appendix C: Reference Material

Unless stated otherwise, path names are relative to the case directory PSARC/2005/471.

1. Specification
   File: final.materials/design.pdf
   File: committment.materials/onepager
   File: committment.materials/what_works

2. 20 Questions
   File: final.materials/20_questions

3. Man Pages
   File: committment.materials/brand.dtd.1
   File: committment.materials/brands.5
   File: committment.materials/design.pdf
   File: committment.materials/lx.5
   File: committment.materials/zone_platform.dtd.1
   File: committment.materials/zoneadm.1m
   File: committment.materials/zonecfg.1m
   File: committment.materials/zones.5

4. Contract between Solaris Core Technologies and Solaris Install
   File: contract-01

PSARC/2005/471 Copyright 2006 Sun Microsystems