1. What specifically is the proposal that we are reviewing?
   An extension to Zones to provide the option that a zone's IP networking be completely separated from the IP networking in other zones, including the global zone.

   Note that the name has changed from "stack instances" to "IP instances" to help make it clear that other parts of the "stack", like NFS and kssl, are not made zone-aware as part of this project.

   - What is the technical content of the project?
     There are seven primary technical components:

     1. Introducing a model where a non-global zone can either share the IP state with the global zone (which is the only possibility in S10), or have an exclusive IP instance to itself. A zone with an exclusive IP instance will need exclusive access to one or more network interfaces. (These could be separate LANs like bge1, or separate VLANs like bge2000; any datalink interface with a separate name in /dev/ can be used.) In this model, by design, there is no IP-level sharing; e.g., a zone with an exclusive IP has its own routing table, ARP table, IPsec security policy and security associations, IP Filter rules and state, etc.

     2. Extensions to zonecfg syntax to
        - Allow specifying that a zone has an exclusive IP: 'set ip-type=exclusive'
        - Apply different syntax checking for exclusive-IP zones; the address property must not be set in the net resource.
        Extensions to zoneadm list options and output format.

     3. Splitting the PRIV_SYS_NET_CONFIG privilege by introducing the new PRIV_SYS_IP_CONFIG privilege, which allows a subset of the operations allowed by PRIV_SYS_NET_CONFIG. A zone with an exclusive IP will be granted PRIV_SYS_IP_CONFIG and PRIV_NET_RAWACCESS.

     4. Changes to the networking SMF method scripts, which today skip sections of the script if "zonename != global", so that those sections are not skipped if the non-global zone has an exclusive IP.

     5. Modifications to all the kernel modules that make up "IP" (such as /kernel/drv/ip and the IPsec modules), as well as all the kernel modules that perform function calls into IP (such as TCP and IP Filter), to allow multiple instances of all their writable global data.

     6. A "netstack" kernel framework (netstack.h/netstack.c) which isolates the TCP/IP kernel modules from the mapping between zones and IP instances, and provides things like kstat_create_netstack(), analogous to kstat_create_zone(). For further details on this, see the interface document.

     7. Virtualizing the pfhooks framework and facilities to support the IP instances model. The pfhooks modules neti and hook become aware of separate IP instances. By giving the pfhooks interfaces one extra parameter of type zoneid_t, and subsequently using one generic handle, we can provide per-instance hook families and hook facilities that support instance-specific IP Filter (or third-party) firewalls.

     Note that the proposal targets strict separation when separate LANs or VLANs are used for different zones. The Crossbow project might in the future deliver Virtual NIC and vSwitch support, which can be combined with IP Instances to allow multiple zones connected to the same LAN/VLAN to have different IP routing tables. But that type of sharing is not what this project delivers, even though the IP instances architecture enables that type of evolution.

     Any datalink device which has a separate /dev entry for each instance or VLAN can be used. For separation with separate LANs, any DLPI style-1 driver will work; these are drivers which provide a separate /dev/ entry for each instance (e.g., /dev/bge0 and /dev/bge1). For VLAN separation, the project will add code to GLDv3 to make the implicit VLAN PPAs appear in /dev/. For instance, if bge33000 is used, then /dev/bge33000 will be dynamically created so that zoneadm can give that device to an exclusive-IP zone.

   - Is this a new product, or a change to a pre-existing one?
     If it is a change, would you consider it a "major", "minor", or "micro" change?
     micro/patch

   - If your project is an evolution of a previous project, what changed from one version to another?
     This project evolves Zones.

   - What is the motivation for it, in general as well as specific terms?
     Enable customers that need to consolidate applications that today run on separate servers connected to separate LANs or VLANs, onto multiple zones on a single server.

     Customers already are under the impression that this should be possible, since Sun has said that with Zones there is network isolation (when in fact all we do in S10 is to give a zone its own IP address, ensuring isolation in the IP addressing name space). Many customers are operating with workarounds to try to make their datacenter consolidation work using the current zones networking model and bugs.

2. Describe how your project changes the user experience, upon installation and during normal operation.
   There are no changes to installation.

   When a zone is configured, the user can choose to specify that it should have an exclusive IP instance. If the user chooses not to do this, the zone will be configured with a shared IP instance and the user will see no differences compared to S10.

   For an exclusive-IP zone, the user specifies the network boundary of the zone in zonecfg as before, but the boundary is different. For an exclusive-IP zone, the boundary is zero or more datalink names. For a shared-IP zone, the boundary is zero or more IP addresses, as before.

   For an exclusive-IP zone, the IP-level network configuration (DHCP or static, IP addresses, default routers) is set up using the existing sysidtool. As with S10, higher-level networking configuration (name services etc.) is configured using sysidtool whether the zone has a shared or exclusive IP. As in S10, the sysidtool configuration can be pre-specified, e.g., by putting a /etc/sysidcfg file in the zone's root path before the first boot of the zone.

3. What is its plan? Are there multiple delivery phases?
   We plan to integrate all seven technical components as a single unit. The related Crossbow components may or may not be integrated at the same time, since there are no technical or architectural dependencies. Integration into Nevada is planned for December, followed by integration into S10U4.

   - Has a design review been done?
     Yes. On network-discuss@opensolaris.org, and a separate review with zones-core.

   - What is its current status?
     BFU archives that include the components we plan to integrate are currently being tested by the project team and QE.

4. Are there related projects in Sun?
   As stated above, Crossbow will introduce VNICs and vSwitches, as well as network resource control, that can be used in conjunction with IP Instances.

   Clearview Nemo Unification (PSARC/2005/132) will introduce separate names in /dev/ for each instance of the Cassini driver (and other legacy DLPI drivers), which will enable assigning different instances of ce to different exclusive-IP zones.

   The NWAM project aims to automate Solaris network configuration, which might entail moving network configuration from files in /etc (like /etc/hostname.) which will provide better management abstractions, whether or not exclusive-IP zones are used.

   The implementation differs depending on whether BrandZ or pfhooks is available, but there is no dependency. IP Instances can integrate into S10U4 whether or not those projects integrate into S10U4. If IP Instances integrates without them, the S10U4 integration would exclude the changes to the project private interfaces introduced by [3] and [4].

   Assuming libdlpi integrates prior to this project, we will make use of libdlpi in order to avoid code duplication.

5. How is the project delivered into the system?
   Through existing packages.
   Since some of the functionality (IP Filter, IPsec, and base crypto support) is currently delivered in hollow packages, some packaging changes are needed to make those capabilities available in exclusive-IP zones. The current plan is to change SUNWcnetr from a hollow to a non-hollow package, and to split out the config and smf parts of SUNWipfr (which also contains the kernel components) into a new package.

   An option is to deliver IP Instances in an S10 update without support for IP Filter and IPsec in zones with exclusive IPs, if we lack the proper infrastructure to reliably change package attributes like "hollow" in an update. (Doing so is clearly undesirable, but it would satisfy the project goals of providing IP isolation when different zones are connected to different LANs or different VLANs.)

6. Describe the project's hardware platform dependencies.
   None.

7. System administration

   - How will the project's deliverables be installed and (re)configured?
     Installed using the standard Solaris package utilities. Configured using zonecfg.

   - How will the project's deliverables be uninstalled?
     The deliverables are part of the base system and cannot be uninstalled.

   - Does it use inetd to start itself?
     No.

   - Does it need installation within any global system tables?
     No.

   - Does it use a naming service such as NIS, NIS+ or LDAP?
     Zonecfg uses gethostbyname/getaddrinfo in S10 to look up host names to IP addresses. This does not change.

   - What are its on-going maintenance requirements (e.g. Keeping global tables up to date, trimming files)?
     None.

   - How do this project's administrative mechanisms fit into Sun's system administration strategies? E.g., how does it fit under the Solaris Management Console (SMC) and Web-Based Enterprise Management (WBEM), how does it make use of roles, authorizations and rights profiles? Additionally, how does it provide for administrative audit in support of the Solaris BSM configuration?
     SunMC and N1SPS can provision zones for Solaris 10 using zonecfg. Thus they can be extended to provision zones with exclusive IPs using the same interface. No changes to audit are necessary.

   - What tunable parameters are exported?
     No new tunable parameters. A zone with an exclusive IP has access to its own set of ndd parameters for the TCP/IP stack, just like the global zone.

8. Reliability, Availability, Serviceability (RAS)

   - Does the project make any material improvement to RAS?
     No.

   - How can users/administrators diagnose failures or determine operational state? (For example, how could a user tell the difference between a failure and very slow performance?)
     From within each zone the normal network tools (such as netstat, kstat, snoop, traceroute, ping) can be used.

     Just like in S10, in order to inspect TCP/IP networking in another zone from the global zone, the global zone admin can invoke e.g.
         zlogin zoneA netstat
     since the netstat utility is not zone-aware (not even for shared-IP zones). This performs IP address to name translation from within zoneA, which is important if the different zones use non-unique IP addresses.

     For shared-IP zones, ifconfig -a in the global zone will report all the IP addresses assigned to the shared IP instance.

     The project introduces 'zoneadm list -l', which shows all the datalink names that are used by the exclusive-IP zones.

   - What are the project's effects on boot time requirements?
     None.

   - How does the project handle dynamic reconfiguration (DR) events?
     If DR is handled by the kernel (e.g., through the devices' detach(9E) and attach(9E) routines) things work as before. Possible steps for NIC DR are outlined in section 15 in si-interfaces.pdf. Currently, none of this is documented in the S10 zones documentation.

   - What mechanisms are provided for continuous availability of service?
     N/A

   - Does the project call panic()?
     No.

   - How are significant administrative or error conditions transmitted?
     Using existing mechanisms, e.g. errors detected by zonecfg and zoneadm are reported as such. Networking-related errors inside a zone while booting are handled using SMF, just as for the global zone.

   - How does the project deal with failure and recovery?
     For zones using the shared IP there are no changes.

     There are several techniques for handling link and network failures in Solaris: link aggregation, IP Multipathing, and OSPF Multipathing. Link aggregation can be configured in the global zone (e.g., dladm is used in the global zone to create "aggr0"), and then the zone is configured to use the aggregate datalink name (e.g., with physical=aggr0). IP Multipathing can be configured inside an exclusive-IP zone the same way it is configured in the global zone. The same applies to OSPF-MP.

     As an aside, once we have vSwitch support one can configure OSPF-MP in a different way, with the global zone being a router for the non-global zones.

   - Does it ever require reboot? If so, explain why this situation cannot be avoided.
     No.

   - How does your project deal with network failures (including partition and re-integration)? How do you handle the failure of hardware that your project depends on?
     See failure/recovery question above.

   - Can it save/restore or checkpoint and recover?
     Same as zonecfg today.

   - Can its files be corrupted by failures? Does it clean up any locks/files after crashes?
     Same as zonecfg today.

9. Observability

   - Does the project export status, either via observable output (e.g., netstat) or via internal data structures (kstats)?
     Yes. An exclusive-IP zone has its own netstat and kstat statistics and tables for the TCP/IP stack.

   - How would a user or administrator tell that this subsystem is or is not behaving as anticipated?
     Traditional network interface tools (netstat, ping, traceroute, snoop, etc.) can be used inside an exclusive-IP zone.

   - What statistics does the subsystem export, and by what mechanism?
     Same as the TCP/IP stack in the global zone, except for the statistics related to the squeues, which are common to all the zones since the squeues are shared across the zones.

   - What state information is logged?
     Same as the TCP/IP stack in the global zone.

   - In principle, would it be possible for a program to tune the activity of your project?
     No programmatic facilities are provided for tuning. ndd is available for each exclusive-IP zone, but we do not recommend using it any more than we do for the global zone.

10. What are the security implications of this project?

   - What security issues do you address in your project?
     Privileges that allow a non-global zone to configure its IP instance. Privileges and /dev management in the non-global zone that allow raw access to the network.

   - The Solaris BSM configuration carries a Common Criteria (CC) Controlled Access Protection Profile (CAPP) -- Orange Book C2 -- and a Role Based Access Control Protection Profile (RBAC) -- rating. Does the addition of your project affect this rating?
     No.

   - Is system or subsystem security compromised in any way if your project's configuration files are corrupt or missing?
     No.

   - Please justify the introduction of any (all) new setuid executables.
     None.

   - Include a thorough description of the security assumptions, capabilities and any potential risks (possible attack points) being introduced by your project. A separate Security Questionnaire http://sac.sfbay/cgi-bin/bp.cgi?NAME=Security.bp is provided for more detailed guidance on the necessary information. Cases are encouraged to fill out and include the Security questionnaire (leveraging references to existing documentation) in the case materials. Projects must highlight information for the following important areas:
     - What features are newly visible on the network and how are they protected from exploitation (e.g. unauthorized access, eavesdropping)
     - If the project makes decisions about which users, hosts, services, ... are allowed to access resources it manages, how is the requestor's identity determined and what data is used to determine if the access is granted. Also how this data is protected from tampering.
     - What privileges beyond what a common user (e.g. 'noaccess') can perform does this project require and why those are necessary.
     - What parts of the project are active upon default install and how it can be turned off.

11. What is its UNIX operational environment:

   - Which Solaris release(s) does it run on?
     Part of Solaris.

   - Environment variables? Exit status? Signals issued? Signals caught? (See signal(3HEAD).)
     N/A -- no changes to existing behavior.

   - Device drivers directly used (e.g. /dev/audio)? .rc/defaults or other resource/configuration files or databases?
     None

   - Does it use any "hidden" (filename begins with ".") or temp files?
     No.

   - Does it use any locking files?
     No.

   - Command line or calling syntax:
     Additions to the zonecfg and zoneadm syntax.

   - Is there support for standard forms, e.g. "-display" for X programs? Are these propagated to sub-environments?
     N/A

   - What shared libraries does it use? (Hint: if you have code use "ldd" and "dump -Lv")?
     In addition to its existing dependencies, zonecfg/zoneadm/zoneadmd might now depend on libsocket and libnsl to be able to fully verify the configuration.

   - Identify and justify the requirement for any static libraries.
     N/A

   - Does it depend on kernel features not provided in your packages and not in the default kernel (e.g. Berkeley compatibility package, /usr/ccs, /usr/ucblib, optional kernel loadable modules)?
     No.

   - Is your project 64-bit clean/ready? If not, are there any architectural reasons why it would not work in a 64-bit environment? Does it interoperate with 64-bit versions?
     Yes.

   - Does the project depend on particular versions of supporting software (especially Java virtual machines)? If so, do you deliver a private copy?
     What happens if a conflicting or incompatible version is already or subsequently installed on the system?
     N/A.

   - Is the project internationalized and localized?
     Yes.

   - Is the project compatible with IPV6 interfaces and addresses?
     Yes.

12. What is its window/desktop operational environment?
   N/A -- no graphical components are provided by this project.

13. What interfaces does your project import and export?

   - Please provide a table of imported and exported interfaces, including stability levels. Pay close attention to the classification of these interfaces in the Interface Taxonomy -- e.g., "Standard," "Stable," and "Evolving;" see: http://sac.sfbay/cgi-bin/bp.cgi?NAME=interface_taxonomy.bp

     Interfaces Imported

     Interface                   Classification                 Comments
     zone_key_create             Assumed consolidation private

     Interfaces Exported

     Interface                   Classification          Comments
     zonecfg extensions                                  The existing zonecfg syntax
       ip-type property          Committed               was labeled as Evolving in [1].
     zoneadm extensions
       -l list_option            Committed
       -v/-p output format       Committed
     privileges(5)
       Added PRIV_SYS_IP_CONFIG  Committed               The privilege names in [2]
                                                         were all Stable.

     Internal Interfaces

     Interface                   Classification          Comments
     Extensions to zone xml      Project Private         Introduced by [1]
     zone_create flags           Project Private         To tell kernel whether excl
                                                         or shared-IP zone.
     zone_add_ifname()           Project Private         To implement zoneadm list -l,
     zone_remove_ifname()        Project Private         ifconfig -a plumb, etc.
     zone_check_ifname()         Project Private
     zone_get_ifnum()            Project Private
     zone_get_iflist()           Project Private
     netstack_register()         Project Private         Akin to zone_key_create()
     netstack_unregister()       Project Private         but per IP instance.
     netstack_find_by_cred()     Project Private         For xx_open lookups etc.
     netstack_find_by_stackid()  Project Private
     netstack_hold()             Project Private
     netstack_rele()             Project Private
     netstackid_to_zoneid()      Project Private
     zoneid_to_netstackid()      Project Private
     kstat_create_netstack()     Project Private         For kstats made visible for
     kstat_destoy_netstack()     Project Private         one netstack.
     netstack_handle_t           Project Private         For modules that need to
     netstack_next_init()        Project Private         walk all netstacks.
     netstack_next_fini()        Project Private
     netstack_next()             Project Private
     secpolicy_ip_config()       Consolidation Private
     net_register()              Consolidation Private   Introduced by [3] as
     net_lookup()                Consolidation Private   Consolidation Private
     net_walk()                  Consolidation Private   Adding zoneid argument
     hook_run()                  Project Private         Introduced by [3]
                                                         Added argument
     platform.xml                Project Private         Introduced by [4]
                                                         Added optional syntax

   - Protocols (public or private)
     None.

   - Exported public library APIs and ABIs
     Drag and Drop
     ToolTalk
     Cut/Paste
     N/A

   - What other applications should it interoperate with? How will it do so?
     No changes from existing zonecfg/zoneadm

   - Is it "pipeable"? How does it use stdin, stdout, stderr?
     No changes from existing zonecfg/zoneadm

   - Explain the significant file formats, names, syntax, and semantics.
     As part of extending the zonecfg syntax, the corresponding extensions are made to the xml syntax for the zones description. To handle the brand infrastructure, the platform.xml syntax is extended.

   - Is there a public namespace?
     No.

   - Are the externally visible interfaces documented clearly enough for a non-Sun client to use them successfully?
     Yes.

14. What are its other significant internal interfaces (inter-subsystem and inter-invocation)?

   - Files
   - Other
   - Are the interfaces re-entrant?
     Yes.

15. Is the interface extensible? How will the interface evolve?

   - How is versioning handled?
     Same as existing zonecfg and xml syntax.

   - What was the commitment level of the previous version?
     Project private for the xml syntax.

   - Can this version co-exist with existing standards and with earlier and later versions or with alternative implementations (perhaps by other vendors)?

   - What are the clients over which a change should be managed?
     Currently, aside from the zonecfg/zoneadm syntax and the new privilege, all key interfaces are private and thus can be revised in lockstep if changes are necessary.

   - How is transition to a new version to be accomplished? What are the consequences to ISVs and their customers?
     Just as this project can extend the zonecfg syntax in a compatible way, we envision that other such extensions can be done without any incompatibilities.

16. How do the interfaces adapt to a changing world?
   Using existing versioning mechanisms where they exist. The project private and consolidation private interfaces can be managed without explicit versioning.

17. Interoperability

   - If applicable, explain your project's interoperability with the other major implementations in the industry. In particular, does it interoperate with Microsoft's implementation, if one exists?
     N/A

   - What would be different about installing your project in a heterogeneous site instead of a homogeneous one (such as Sun)?
     Nothing.

   - Does your project assume that a Solaris-based system must be in control of the primary administrative node?
     The changes are an integral part of Solaris.

18. Performance

   - How will the project contribute (positively or negatively) to "system load" and "perceived performance"?
     A zone with an exclusive IP will result in some additional kernel memory allocation compared to a zone using the shared IP. This is due to there being separate hash tables for tcp, udp, ip, etc. per instance. Currently there is no reliable technology to measure the difference between the amount of kernel memory used for a shared-IP zone and an exclusive-IP zone.

     When multiple exclusive-IP zones are used, the complete separation will potentially result in better cache locality.

     The performance is evaluated using the usual perf-PIT benchmarks.

   - What are the performance goals of the project? How were they evaluated? What is the test or reference platform?
     There were no specific performance goals, but network performance and system utilization must not be measurably worse as a result of this project.

   - Does the application pause for significant amounts of time? Can the user interact with the application while it is performing long-duration tasks?
     N/A

   - What is your project's MT model? How does it use threads internally? How does it expect its client to use threads? If it uses callbacks, can the called entity create a thread and recursively call back?
     The MT model of the kernel as well as zonecfg/zoneadm/zoneadmd is unchanged.

   - What is the impact on overall system performance? What is the average working set of this component? How much of this is shared/sharable by other apps?
     No impact.

   - Does this application "wake up" periodically? How often and under what conditions? What is the working set associated with this behavior?
     N/A

   - Will it require large files/databases (for example, new fonts)?
     No.

   - Do files, databases or heap space tend to grow with time/load? What mechanisms does the user have to use to control this? What happens to performance/system load?
     N/A

19. Please identify any issues that you would like the ARC to address.

   - Are there issues or related projects that the ARC should advise the appropriate steering committees?

20. Appendices to include

   [1] PSARC 2002/174 Virtualization and Namespace Isolation in Solaris
   [2] PSARC 2002/188 Least privilege for Solaris
   [3] PSARC 2005/334 Packet Filtering Hooks API as modified by http://sac.eng/arc/PSARC/2005/334/opinion.ascii
   [4] PSARC 2005/471 BrandZ: Support for non-native zones
   [5] PSARC 2005/707 Surya: Forwarding Performance Enhancement
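
Illustration (not part of the case materials): the zonecfg extension described under question 1, component 2, might be exercised roughly as sketched below. The zone name zoneA, the zonepath /zones/zoneA, and the temporary file location are hypothetical; bge1 is the example datalink name used in question 1. The net resource sets only the physical property, since the proposal states that the address property must not be set for an exclusive-IP zone. This is a sketch under those assumptions, not a tested procedure.

```shell
#!/bin/sh
# Sketch only: zoneA, /zones/zoneA and the temp file path are hypothetical.
# bge1 is the example datalink name from question 1.
cfg="${TMPDIR:-/tmp}/zoneA.cfg"

# zonecfg command file for an exclusive-IP zone.
# The net resource names a datalink and sets no address property.
cat > "$cfg" <<'EOF'
create
set zonepath=/zones/zoneA
set ip-type=exclusive
add net
set physical=bge1
end
verify
commit
EOF

# Apply the command file only on a system that provides zonecfg.
if command -v zonecfg >/dev/null 2>&1; then
    zonecfg -z zoneA -f "$cfg"
else
    echo "zonecfg not found; command file left in $cfg"
fi
```

Once the zone is configured, the proposed 'zoneadm list -l' would be expected to show bge1 as a datalink in use by zoneA.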