1. What specifically is the proposal that we are reviewing?

   An extension to Zones to provide the option that a zone's IP networking be completely separated from the IP networking in other zones, including the global zone.

   - What is the technical content of the project?

     There are six primary technical components:

     1. Introducing a model where a non-global zone can either share the IP stack with the global zone (which is the only possibility in S10), or have an exclusive IP stack to itself. A zone with an exclusive IP stack will need exclusive access to one or more network interfaces (these could be separate LANs like bge1, or separate VLANs like bge2000; any datalink interface with a separate name in /dev/ can be used). In this model, by design, there is no sharing: a zone with an exclusive stack has its own routing table, ARP table, IPsec security policy and security associations, IP Filter rules and state, etc.

     2. Extensions to zonecfg syntax to:
        - Allow specifying that a zone has an exclusive IP stack: 'set stacktype=exclusive'.
        - Support DHCP and/or IPv6 stateless address autoconfiguration for a non-global zone, which becomes possible with exclusive stacks.
        - Allow specifying static default routers for zones with exclusive stacks.
        - Add a new 'restrict' boolean property for the 'net' resource, which is passed to the kernel in the form of an exact match of the network interface name and the IP address. When set, this prevents a non-global zone that has an exclusive stack from "stealing" somebody else's IP address.

     3. Splitting the PRIV_SYS_NET_CONFIG privilege by introducing the new PRIV_SYS_IP_CONFIG privilege, which allows a subset of the operations allowed by PRIV_SYS_NET_CONFIG. A zone with an exclusive stack will be granted PRIV_SYS_IP_CONFIG and PRIV_NET_RAWACCESS.

     4. Changes to the networking SMF method scripts, which today skip sections of the script if "zonename != global", to not skip those sections if the non-global zone has an exclusive stack. This test requires a way for a script to ask "does this zone have an exclusive stack?". The current approach is to do this with a new option for zonename(1): zonename -t.

     5. Modifications to all the kernel modules that make up "IP" (such as /kernel/drv/ip and the IPsec modules), as well as all the kernel modules that perform function calls into IP (such as TCP and IP Filter), to allow multiple instances of all their writable global data.

     6. A "netstack" kernel framework (netstack.h/netstack.c) which isolates the TCP/IP kernel modules from the mapping between zones and stacks, and provides things like kstat_create_netstack(), analogous to kstat_create_zone().

     For further details on this, see the design document.

     Note that the proposal targets strict separation when separate LANs or VLANs are used for different zones. The Crossbow project will deliver Virtual NIC and vSwitch support which can be combined with stack instances to allow multiple zones connected to the same LAN/VLAN to have a different IP routing table.

     This proposal does not propose any changes at or below the datalink layer. Any datalink device which has a separate /dev entry for each instance or VLAN can be used.

   - Is this a new product, or a change to a pre-existing one? If it is a change, would you consider it a "major", "minor", or "micro" change?
     Micro, since there is no change to existing interfaces.

   - If your project is an evolution of a previous project, what changed from one version to another?
     This project evolves Zones.

   - What is the motivation for it, in general as well as specific terms?
     Enable customers that need to consolidate applications that today run on separate servers connected to separate LANs or VLANs, onto multiple zones on a single server.
     Customers already are under the impression that this should be possible since Sun has said that with Zones there is network isolation (when in fact all we do in S10 is to give a zone its own IP address, ensuring isolation in the IP addressing name space). Many customers are operating with workarounds to try to make their datacenter consolidation work using the current zones networking model and bugs.

2. Describe how your project changes the user experience, upon installation and during normal operation.

   There are no changes to installation.

   When a zone is configured the user can choose to specify that it have an exclusive IP stack. If the user chooses not to do this, the zone will be configured with a shared IP stack and the user will see no differences compared to S10.

   For an exclusive stack, the user specifies the basic network configuration (IP addresses) in zonecfg as before, but with the additional option to specify dynamic address configuration (DHCP or IPv6) and static default routes. The user also has the option to restrict a zone with an exclusive stack so that it can only use the IP address the zone has been assigned in zonecfg.

   As with S10, higher level networking configuration (name services etc.) is configured using the sysid tools whether the zone has a shared or exclusive stack.

3. What is its plan? Are there multiple delivery phases?

   We plan to integrate all six technical components as a single unit. The related Crossbow components may or may not be integrated at the same time, since there are no technical or architectural dependencies.

   - Has a design review been done?
     Not yet.

   - What is its current status?
     BFU archives that include the components we plan to integrate are currently being tested by the project team.

4. Are there related projects in Sun?

   As stated above, Crossbow will introduce VNICs and vSwitches, as well as network resource control, that can be used in conjunction with Stack Instances.
   Clearview Nemo Unification (PSARC/2005/132) will introduce separate names in /dev/ for each instance of the Cassini (and other legacy DLPI) drivers, which will enable assigning different instances of ce to different stacks.

   The NWAM project aims to automate Solaris network configuration, which might entail moving network configuration from files in /etc (like /etc/hostname.) which will provide better management abstractions, whether or not exclusive IP stacks are used.

5. How is the project delivered into the system?

   Through existing packages. Since some of the functionality (IP Filter, IPsec, and base crypto support) is currently delivered in hollow packages, there need to be some packaging changes to make those capabilities available in exclusive stack zones. The current plan is to change SUNWcnetr and SUNWipfr from hollow to non-hollow packages.

   An option is to deliver stack instances in an S10 update without support for IP Filter and IPsec in zones with exclusive stacks, if we lack the proper infrastructure to reliably change package attributes like "hollow" in an update.

6. Describe the project's hardware platform dependencies.

   None.

7. System administration

   - How will the project's deliverables be installed and (re)configured?
     Installed using the standard Solaris package utilities. Configured using zonecfg.

   - How will the project's deliverables be uninstalled?
     The deliverables are part of the base system and cannot be uninstalled.

   - Does it use inetd to start itself?
     No.

   - Does it need installation within any global system tables?
     No.

   - Does it use a naming service such as NIS, NIS+ or LDAP?
     Zonecfg uses gethostbyname/getaddrinfo in S10 to resolve host names to IP addresses. This does not change.

   - What are its on-going maintenance requirements (e.g. Keeping global tables up to date, trimming files)?
     None.

   - How does this project's administrative mechanisms fit into Sun's system administration strategies? E.g., how does it fit under the Solaris Management Console (SMC) and Web-Based Enterprise Management (WBEM), how does it make use of roles, authorizations and rights profiles? Additionally, how does it provide for administrative audit in support of the Solaris BSM configuration?
     SunMC and N1SPS can provision zones for Solaris 10 using zonecfg. Thus they can be extended to provision zones with exclusive stacks using the same interface.

   - What tunable parameters are exported?
     No new tunable parameters. A zone with an exclusive stack has access to its own set of ndd parameters for the TCP/IP stack, just like the global zone.

8. Reliability, Availability, Serviceability (RAS)

   - Does the project make any material improvement to RAS?
     No.

   - How can users/administrators diagnose failures or determine operational state? (For example, how could a user tell the difference between a failure and very slow performance?)
     From within each zone the normal network tools (such as netstat, kstat, snoop, traceroute, ping) can be used. From the global zone, in order to inspect the stack in another zone, one can invoke e.g.

         zlogin zoneA netstat

     This performs IP address to name translation from within zoneA, which is important if the different zones use non-unique IP addresses.

   - What are the project's effects on boot time requirements?
     None.

   - How does the project handle dynamic reconfiguration (DR) events?
     If DR is handled by the kernel (e.g., through the devices' detach(9E) and attach(9E) routines) things work as before. We need to investigate whether DR requires the invocation of an RCM daemon to unconfigure network interfaces, since the global zone doesn't "see" the network interfaces used by other IP stacks.

   - What mechanisms are provided for continuous availability of service?
     N/A

   - Does the project call panic()?
     No.

   - How are significant administrative or error conditions transmitted?
     Using existing mechanisms, e.g. errors detected by zonecfg and zoneadm are reported as such.
     Errors inside a zone while booting are handled using SMF.

   - How does the project deal with failure and recovery?
     For zones using the shared stack there are no changes.

     There are several techniques for handling link and network failures in Solaris: link aggregation, IP Multipathing, and OSPF Multipathing.

     Link aggregation can be configured in either the global zone or in the exclusive stack zone. In the former case, dladm is used in the global zone to create "aggr0", and then the zone is configured to use physical=aggr0. In the latter case the zone is configured to use physical=bge0 and physical=bge1, and then inside the zone the administrator uses dladm to create the aggregate and configures IP on that aggr0.

     IP Multipathing can be configured inside an exclusive stack zone the same way it is configured in the global zone. The same applies to OSPF-MP.

     As an aside, once we have vSwitch support one can configure OSPF-MP in a different way, with the global zone being a router for the non-global zones.

   - Does it ever require reboot? If so, explain why this situation cannot be avoided.
     No.

   - How does your project deal with network failures (including partition and re-integration)? How do you handle the failure of hardware that your project depends on?
     See failure/recovery question above.

   - Can it save/restore or checkpoint and recover?
     Same as zonecfg today.

   - Can its files be corrupted by failures? Does it clean up any locks/files after crashes?
     Same as zonecfg today.

9. Observability

   - Does the project export status, either via observable output (e.g., netstat) or via internal data structures (kstats)?
     Yes. An exclusive stack zone has its own netstat and kstat statistics and tables for the TCP/IP stack.

   - How would a user or administrator tell that this subsystem is or is not behaving as anticipated?
     Traditional network interface tools (netstat, ping, traceroute, snoop, etc.) can be used inside an exclusive stack zone.
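
     The statement that an exclusive stack zone has its own netstat and kstat statistics follows from giving each stack instance a private copy of data that would otherwise be global. The user-space C sketch below models only that idea. It is not the kernel netstack framework, and every name in it (model_stack_t, model_init(), model_stack_create(), model_stack_find(), and the counter fields) is hypothetical and chosen purely for illustration.

```c
#include <stddef.h>

#define MODEL_MAX_STACKS 8      /* arbitrary limit for this model */
#define MODEL_UNUSED     (-1)   /* marks a free slot */

/* Hypothetical counters that would be plain globals in a single-stack design. */
typedef struct model_tcp_stats {
    unsigned long in_segs;
    unsigned long out_segs;
} model_tcp_stats_t;

/* One model stack instance: an id plus its private copy of the counters. */
typedef struct model_stack {
    int               ms_stackid;
    model_tcp_stats_t ms_tcp;
} model_stack_t;

static model_stack_t model_stacks[MODEL_MAX_STACKS];

/* Mark every slot free; call once before using the other functions. */
void
model_init(void)
{
    for (size_t i = 0; i < MODEL_MAX_STACKS; i++)
        model_stacks[i].ms_stackid = MODEL_UNUSED;
}

/* Look up a stack instance by id; returns NULL if no such instance exists. */
model_stack_t *
model_stack_find(int stackid)
{
    if (stackid < 0)
        return NULL;
    for (size_t i = 0; i < MODEL_MAX_STACKS; i++) {
        if (model_stacks[i].ms_stackid == stackid)
            return &model_stacks[i];
    }
    return NULL;
}

/*
 * Create a stack instance with zeroed counters. Returns NULL if the id is
 * negative, already in use, or the table is full.
 */
model_stack_t *
model_stack_create(int stackid)
{
    if (stackid < 0 || model_stack_find(stackid) != NULL)
        return NULL;
    for (size_t i = 0; i < MODEL_MAX_STACKS; i++) {
        if (model_stacks[i].ms_stackid == MODEL_UNUSED) {
            model_stacks[i].ms_stackid = stackid;
            model_stacks[i].ms_tcp.in_segs = 0;
            model_stacks[i].ms_tcp.out_segs = 0;
            return &model_stacks[i];
        }
    }
    return NULL;
}
```

     In this model, after model_init(), creating instances 1 and 2 and incrementing ms_tcp.in_segs through the first leaves the second instance's counter untouched, which mirrors the per-zone isolation of statistics described above.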
   - What statistics does the subsystem export, and by what mechanism?
     Same as the TCP/IP stack in the global zone.

   - What state information is logged?
     Same as the TCP/IP stack in the global zone.

   - In principle, would it be possible for a program to tune the activity of your project?
     No programmatic facilities are provided for tuning. ndd is available for each exclusive stack zone, but we do not recommend using it any more than we do for the global zone.

10. What are the security implications of this project?

   - What security issues do you address in your project?
     Privileges that allow a non-global zone to configure its IP stack. Privileges and /dev management in the non-global zone that allow raw access to the network.

   - The Solaris BSM configuration carries a Common Criteria (CC) Controlled Access Protection Profile (CAPP) -- Orange Book C2 -- and a Role Based Access Control Protection Profile (RBAC) -- rating, does the addition of your project affect this rating?
     No.

   - Is system or subsystem security compromised in any way if your project's configuration files are corrupt or missing?
     No.

   - Please justify the introduction of any (all) new setuid executables.
     None.

   - Include a thorough description of the security assumptions, capabilities and any potential risks (possible attack points) being introduced by your project. A separate Security Questionnaire http://sac.sfbay/cgi-bin/bp.cgi?NAME=Security.bp is provided for more detailed guidance on the necessary information. Cases are encouraged to fill out and include the Security questionnaire (leveraging references to existing documentation) in the case materials. Projects must highlight information for the following important areas:
     - What features are newly visible on the network and how are they protected from exploitation (e.g. unauthorized access, eavesdropping)
     - If the project makes decisions about which users, hosts, services, ... are allowed to access resources it manages, how is the requestor's identity determined and what data is used to determine if the access is granted. Also how this data is protected from tampering.
     - What privileges beyond what a common user (e.g. 'noaccess') can perform does this project require and why those are necessary.
     - What parts of the project are active upon default install and how it can be turned off.

11. What is its UNIX operational environment:

   - Which Solaris release(s) does it run on?
     Part of Solaris.

   - Environment variables? Exit status? Signals issued? Signals caught? (See signal(3HEAD).)
     N/A -- no changes to existing behavior.

   - Device drivers directly used (e.g. /dev/audio)? .rc/defaults or other resource/configuration files or databases?
     None

   - Does it use any "hidden" (filename begins with ".") or temp files?
     No.

   - Does it use any locking files?
     No.

   - Command line or calling syntax:
     Additions to the zonecfg syntax.

   - Is there support for standard forms, e.g. "-display" for X programs? Are these propagated to sub-environments?
     N/A

   - What shared libraries does it use? (Hint: if you have code use "ldd" and "dump -Lv")?
     In addition to its existing dependencies, zonecfg/zoneadm/zoneadmd might now depend on libsocket and libnsl to be able to fully verify the configuration.

   - Identify and justify the requirement for any static libraries.
     N/A

   - Does it depend on kernel features not provided in your packages and not in the default kernel (e.g. Berkeley compatibility package, /usr/ccs, /usr/ucblib, optional kernel loadable modules)?
     No.

   - Is your project 64-bit clean/ready? If not, are there any architectural reasons why it would not work in a 64-bit environment? Does it interoperate with 64-bit versions?
     Yes.

   - Does the project depend on particular versions of supporting software (especially Java virtual machines)? If so, do you deliver a private copy? What happens if a conflicting or incompatible version is already or subsequently installed on the system?
     N/A.

   - Is the project internationalized and localized?
     Yes.

   - Is the project compatible with IPV6 interfaces and addresses?
     Yes.

12. What is its window/desktop operational environment?

   N/A -- no graphical components are provided by this project.

13. What interfaces does your project import and export?

   - Please provide a table of imported and exported interfaces, including stability levels. Pay close attention to the classification of these interfaces in the Interface Taxonomy -- e.g., "Standard," "Stable," and "Evolving;" see: http://sac.sfbay/cgi-bin/bp.cgi?NAME=interface_taxonomy.bp

     Interfaces Imported

     Interface                    Classification         Comments
     zone_key_create              ???

     Interfaces Exported

     Interface                    Classification         Comments
     zonecfg extensions           Evolving               The existing zonecfg syntax
       stacktype property                                is Evolving.
       router resource
       af property
       restrict property
     privileges(5)
       Added PRIV_SYS_IP_CONFIG   Stable                 The privilege names in [2]
                                                         are all stable.

     Internal Interfaces

     Interface                    Classification         Comments
     Extensions to zone xml       Project Private
     zone_create flags            Project Private        To tell kernel whether excl or shared.
     zone_add_ifname()            Project Private        To implement restrict=yes
     zone_remove_ifname()         Project Private
     zone_ifname_lookup()         Project Private
     netstack_register()          Project Private        Akin to zone_key_create()
     netstack_unregister()        Project Private
     netstack_get_current()       Project Private        For xx_open lookups etc.
     netstack_find_by_cred()      Project Private
     netstack_find_by_stackid()   Project Private
     netstack_hold()              Project Private
     netstack_rele()              Project Private
     netstackid_to_zoneid()       Project Private
     zoneid_to_netstackid()       Project Private
     kstat_create_netstack()      Project Private        For kstats made visible for
     kstat_destoy_netstack()      Project Private        one netstack.
     netstack_handle_t            Project Private        For modules that need to
     netstack_next_init()         Project Private        walk all netstacks.
     netstack_next_fini()         Project Private
     netstack_next()              Project Private
     secpolicy_ip_config()        Consolidation Private

   - Protocols (public or private)
     None.

   - Exported public library APIs and ABIs
     Drag and Drop
     ToolTalk
     Cut/Paste
     N/A

   - What other applications should it interoperate with? How will it do so?
     No changes from existing zonecfg/zoneadm

   - Is it "pipeable"? How does it use stdin, stdout, stderr?
     No changes from existing zonecfg/zoneadm

   - Explain the significant file formats, names, syntax, and semantics.
     As part of extending the zonecfg syntax, the corresponding extensions are made to the xml syntax for the zones description.

   - Is there a public namespace?
     No.

   - Are the externally visible interfaces documented clearly enough for a non-Sun client to use them successfully?
     Yes.

14. What are its other significant internal interfaces (inter-subsystem and inter-invocation)?

   - Files
   - Other
   - Are the interfaces re-entrant?
     Yes.

15. Is the interface extensible? How will the interface evolve?

   - How is versioning handled?
     Same as existing zonecfg and xml syntax.

   - What was the commitment level of the previous version?
     Project private

   - Can this version co-exist with existing standards and with earlier and later versions or with alternative implementations (perhaps by other vendors)?

   - What are the clients over which a change should be managed?
     Currently, aside from the zonecfg syntax, all key interfaces are private and thus can be revised in lockstep if changes are necessary.

   - How is transition to a new version to be accomplished? What are the consequences to ISV's and their customers?
     Just as this project can extend the zonecfg syntax in a compatible way, we envision that other such extensions can be done without any incompatibilities.

16. How do the interfaces adapt to a changing world?

   Using existing versioning mechanisms where they exist. The project private and consolidation private interfaces will be managed without explicit versioning.

17. Interoperability

   - If applicable, explain your project's interoperability with the other major implementations in the industry. In particular, does it interoperate with Microsoft's implementation, if one exists?
     N/A

   - What would be different about installing your project in a heterogeneous site instead of a homogeneous one (such as Sun)?
     Nothing.

   - Does your project assume that a Solaris-based system must be in control of the primary administrative node?
     The changes are an integral part of Solaris.

18. Performance

   - How will the project contribute (positively or negatively) to "system load" and "perceived performance"?
     A zone with an exclusive stack will result in some additional kernel memory allocation compared to a zone with a shared stack. This is due to there being separate hash tables for tcp, udp, ip, etc. per stack. Currently there is no reliable technology to measure the difference between the amount of kernel memory used for a shared stack zone and an exclusive stack zone.

     When multiple exclusive stacks are used, the complete separation will result in better cache locality.

   - What are the performance goals of the project? How were they evaluated? What is the test or reference platform?
     There were no specific performance goals, but network performance and system utilization must not be measurably worse as a result of this project.

   - Does the application pause for significant amounts of time? Can the user interact with the application while it is performing long-duration tasks?
     N/A

   - What is your project's MT model? How does it use threads internally? How does it expect its client to use threads? If it uses callbacks, can the called entity create a thread and recursively call back?
     The MT model of the kernel as well as zonecfg/zoneadm/zoneadmd is unchanged.

   - What is the impact on overall system performance? What is the average working set of this component? How much of this is shared/sharable by other apps?
     No impact.
   - Does this application "wake up" periodically? How often and under what conditions? What is the working set associated with this behavior?
     N/A

   - Will it require large files/databases (for example, new fonts)?
     No.

   - Do files, databases or heap space tend to grow with time/load? What mechanisms does the user have to use to control this? What happens to performance/system load?
     N/A

19. Please identify any issues that you would like the ARC to address.

   - Are there issues or related projects that the ARC should advise the appropriate steering committees?

20. Appendices to include

   [1] PSARC 2002/174 Virtualization and Namespace Isolation in Solaris
   [2] PSARC 2002/188 Least privilege for Solaris