This project adds basic Ethernet (layer two) bridging support to OpenSolaris. It consists of a Project Private kernel module and daemon, some Project Private SMF properties, and Committed dladm and SMF control interfaces. It is targeted for a Minor release of an OpenSolaris distribution, though we do not believe that any of the changes here require Minor binding.

This project assumes that Clearview UV (PSARC 2006/499) will integrate first. The terminology and command line design reflect that assumption. In particular, Clearview obsoletes the idea of "network devices" and instead relies on "links" that may themselves be of varying types.

The bridging protocol referred to in this document is the IEEE 802.1D-1998 "Spanning Tree Protocol," abbreviated in this document as "STP." The newer and far more complex "Multiple Spanning Tree Protocol" (802.1Q-2005; MSTP) is intended to be backward compatible with STP, and is not part of this project, but may be the subject of a future project.

This document is large, but we believe that the changes described here are straightforward and obvious given the existing system design. They have been reviewed by the Clearview and NWAM teams, and we therefore consider them suitable for fast-track treatment.

1. Administration

All of the administration of this feature is based on dladm and SMF. The SMF portion is the ability to enable and disable bridge instances using the instance URIs described in section 3 below.

1.1 New dladm subcommands

These commands are patterned after the existing aggregation commands in dladm.

    dladm create-bridge [-t] [-R ] [-p ] [-m ] [-h ] [-d ] [-f ] [-l ]...

This command creates a bridge instance and optionally assigns network links to the new bridge. By default, no bridge instances are present, and OpenSolaris will not bridge between network links. See the "add-bridge" subcommand for details on link assignment. Bridge creation and link assignment require PRIV_SYS_NET_CONFIG.

In order to bridge between links, you must create at least one bridge instance. Each instance is separate: there is intentionally no forwarding connection between bridges. (Note that Crossbow's VNICs may in the future allow virtual inter-bridge connections.)

The provided bridge name is chosen by the administrator and is arbitrary, but must be a legal SMF service instance name. For purposes of documentation, this is a URI component without escape sequences, meaning that none of the following characters may be present:

    ; / ? : @ & = + $ , % < > # "

Whitespace and ASCII control characters are likewise excluded. The name "default" is reserved, as are all names beginning with the string "SUNW". Names with trailing digits are not permitted, in order to allow for creation of "observability devices;" see section 2 below. Because of the use of the observability devices, the names of legal bridge instances are further constrained to be a legal dlpi(7P) name, which matches:

    [A-Za-z_][A-Za-z0-9_]*[A-Za-z_]

Options are:

    -t
        Create a temporary bridge. This will create the bridge object on the running system, but the newly created bridge will not survive the next reboot.

    -R
        Specify an alternate root directory. This allows the configuration of bridge instances in alternate roots, as with Live Upgrade and with jumpstart installs. Note that error checking for link type isn't possible when administering an alternate root.

    -p
        Specify the Bridge Priority. This sets the STP priority value for determining the root bridge node in the network. The default value is (per the specification) 32768, and legal values are 0 (highest priority) to 65535 (lowest priority).

    -m
        Specify the maximum age for configuration information. This sets the STP Bridge Max Age parameter. Information older than this (in seconds) is discarded by all bridges in the network if this node is the root bridge. It defaults to 20.0 seconds. Legal values are from 6.0 to 40.0 seconds. (See the "-d" parameter for additional constraints.)

    -h
        Specify the Bridge Hello Time. This sets the STP Bridge Hello Time parameter. If this node is the root node, it sends Configuration BPDUs at this interval throughout the network. It defaults to 2.0 seconds. Legal values are from 1.0 to 10.0 seconds. (See the "-d" parameter for additional constraints.)

    -d
        Specify the Bridge Forward Delay. This sets the STP Bridge Forward Delay parameter. This timer is used to sequence the link states when a port is enabled anywhere in the network if this node is the root bridge. It defaults to 15.0 seconds. Legal values are from 4.0 to 30.0 seconds. Bridges must obey the following two constraints:

            2 * (forward_delay - 1.0) >= max_age
            max_age >= 2 * (hello_time + 1.0)

        Any parameter setting that would violate those constraints will be treated as an error and cause the command to fail with a diagnostic message.

    -f
        Specify the forced maximum supported protocol. This sets the MSTP maximum supported protocol number. The default is 3. The current implementation doesn't support RSTP or MSTP, so this currently has no effect. However, if the user desires to prevent MSTP from being used in the future when implemented, the parameter may be set to 0 (STP only) or 2 (allow RSTP).

    -l
        Add a link to the newly-created bridge. This is equivalent to creating the bridge and then adding one or more links, as with the "add-bridge" subcommand below, except that if any of the links cannot be added, then the entire command fails, and the new bridge itself isn't created.

    dladm modify-bridge [-t] [-R ] [-p ] [-m ] [-h ] [-d ] [-f ]

This subcommand modifies the operational parameters of a given bridge instance. All of the options are the same as for the "create-bridge" subcommand above, except that the "-l" option is not permitted. To add links to an existing bridge, use the "add-bridge" subcommand below. Bridge parameter modification requires PRIV_SYS_NET_CONFIG.

    dladm delete-bridge [-t] [-R ]

This subcommand deletes a bridge instance.
Unlike the bridge creation subcommand, which can add links while creating, it does not have the option to remove links during the deletion process. The bridge being deleted must not have any attached links. If it does, then an error is returned and no action is taken. Bridge deletion requires PRIV_SYS_NET_CONFIG.

The "-t" and "-R" options are the same as for the "create-bridge" subcommand.

    dladm add-bridge [-t] [-R ] -l [-l ]...

This subcommand adds one or more links to a bridge instance. If multiple links are specified, and adding any one of them results in an error, then no changes are made to the system and the command fails. Link addition to a bridge requires PRIV_SYS_NET_CONFIG.

A link may be a member of at most one bridge. It's an error to specify that a link belongs to more than one bridge. To move a link from one bridge instance to another, remove it from the current bridge before adding it to the new one.

The links assigned to a bridge must not themselves be VLANs or tunnels. Only links that would be acceptable as part of an aggregation or links that are aggregations themselves may be assigned to a bridge. Other link types will result in error messages, and no action taken. (A future project may provide bridging over tunnels using GRE, and over PPP using BCP. Those cases are not part of this project, but nothing in this project precludes them in the future.)

In this initial version, the links must also be Ethernet type. Bridging is well-defined over a few other media, and there are some dodgy ways to make it work on still others, but those cases are subjects for a future release.

When links are added to a bridge, the bridging protocol in use (STP) will be notified, and the links will behave as though just created. For STP, this means that the link will be shut down and then brought back up using the standard protocol.

The options are the same as for the "create-bridge" subcommand.

    dladm remove-bridge [-t] [-R ] -l [-l ]...

This subcommand removes one or more links from a bridge instance. If multiple links are specified, and removing any one of them would result in an error, then none are removed and the command fails. Link removal from a bridge requires PRIV_SYS_NET_CONFIG.

When links are removed from a bridge, the bridging protocol (STP) is notified, and will likely recalculate a new network topology, unless those links were unused due to loop-pruning activity by the bridging protocol.

The options are the same as for the "create-bridge" subcommand.

    dladm show-bridge [-p] [-o field,...] [-s [-i ]] []

This subcommand shows the running status of bridges. When given a bridge name, it shows the status of that one bridge. If no bridge name is given, then it shows summary status of all bridges on the system.

The '-o' option allows the user to specify a comma-separated case-insensitive list of fields to display. The field name may be "all" to display all fields, or any combination of:

    BRIDGE        Assigned name of the bridge (same as the bridge name argument, if provided)
    BRIDGEID      Bridge Identifier value (MAC + priority)
    PRIORITY      Configured priority value (-p)
    BMAXAGE       Configured bridge maximum age (-m)
    BHELLOTIME    Configured bridge hello time (-h)
    BFWDDELAY     Configured forwarding delay (-d)
    FORCEPROTO    Configured forced maximum protocol (-f)
    TCTIME        Time since last topology change in seconds
    TCCOUNT       Count of the number of topology changes
    TCHANGE       Topology change detected ("yes" or "no")
    DESROOT       Bridge Identifier of the root node (MAC + priority)
    ROOTCOST      Cost of the path to the root node
    ROOTPORT      Port used to reach root node
    MAXAGE        Maximum age value from root node
    HELLOTIME     Hello time value from root node
    FWDDELAY      Forward delay value from root node
    HOLDTIME      Minimum BPDU interval

Note the lack of a "-R" option here. It is not possible to list bridge configuration information in an alternate root, in keeping with the rest of the dladm user interface.
The reason for this restriction is to allow the data to be represented in SMF, where "writing" to an alternate root is supported by way of copying appropriate commands to $ROOT/var/svc/profile/upgrade, but "reading" is not feasible because the repository on the alternate root may be incompatible with the running system.

    dladm show-bridge -P [-p] [-o field,...] [-s [-i ]]

This variant of the show-bridge subcommand displays port-related information for a single bridge instance. Note that configured parameters are shown through show-linkprop. The relevant field names for the "show-bridge -P" subcommand are:

    PORT          Link name
    STATE         "disabled", "listening", "learning", "forwarding", or "blocking"
    UPTIME        Number of seconds since last reset or initialize
    DESROOT       Root Bridge Identifier (MAC + priority) seen on this port
    DESCOST       Path cost to root node through designated port
    DESBRIDGE     Bridge Identifier (MAC + priority)
    DESPORT       Port ID and priority of port used to transmit configuration messages for this port
    TCACK         Topology Change Acknowledge flag ("yes" or "no")

1.2 New dladm Link Properties

These may be used with the existing dladm set-linkprop, reset-linkprop, and show-linkprop subcommands.

    "stp"
        This is a boolean property. It defaults to "true." When set to "false," the link will not use Spanning Tree, and will be placed into forwarding mode at all times. The "false" setting is appropriate for point-to-point links connected to end nodes. Only non-VLAN type links have this property.

    "forward"
        This is a boolean property on all links. It defaults to "true." When set to "false," the VLAN associated with the link instance will not forward traffic through the bridge. Setting the property to "false" is equivalent to removing the VLAN from the "allowed set" for a traditional bridge.

    "default-tag"
        This is a numeric property with range 0 to 4094. It defaults to 1. It defines the default VLAN ID that's assumed for untagged packets sent to and received from this link.
        Only non-VLAN type links have this property.

    "stp-priority"
        This is a numeric property with range 0 to 255. It defaults to 128. It corresponds to the STP Port Priority value, which is used to determine the preferred root port on a bridge by prepending to the port identifier. Lower numerical values are higher priority.

    "stp-cost"
        This is a numeric property with range 1 to 65535; zero is not allowed. It represents the cost for using the link, and defaults to (per the standard) 100 for 10Mbps, 19 for 100Mbps, 4 for 1Gbps, and 2 for 10Gbps.

    "bridge-port"
        This is a read-only numeric property. It shows the port number for the link as seen by the bridge, and is used in Spanning Tree messages and network management.

1.3 New Kstats

Each bridge instance will have a set of statistics, named "bridge:::", where the three fields after "bridge" are, in order:

    Arbitrary instance number assigned by the kernel and not necessarily retained across reboot.

    Administrator-specified bridge name.

    Name of statistic; at least the following:

        learn_source       Number of sources learned
        learn_expire       Number of learnt entries expired
        learn_size         Current count of learnt entries
        forward_direct     Directly forwarded packet count
        forward_unknown    Forwarded with unknown destination
        forward_mbcast     Forwarded multicast/broadcast

Each link instance will also have new kstats, where the names will be:

    bridge_sent    Packets forwarded to the link by bridging
    bridge_rcvd    Packets received from the link (and forwarded elsewhere) by bridging

All of these statistics are considered Volatile for now. The existence of the statistics will be documented for users, but with warnings that the names and definitions of the statistics may change incompatibly. A future case for the overall RBridges project will elevate these in stability.

2. Packet Observability

Each bridge instance will be assigned an "observability device," in a manner similar to the DLPI nodes created for "Clearview: IP Observability Devices" (PSARC 2006/475).
These nodes will appear under the /dev/bridge/ directory, named by the bridge name plus a trailing "0".

The observability node is intended for use with snoop and wireshark. It behaves as a standard Ethernet interface, but does not permit the transmission of packets. All transmitted packets are silently dropped.

The user of this node will get a single unmodified copy of every packet handled by the bridge, similar to a "monitoring" port on a traditional bridge, and subject to the usual DLPI "promiscuous mode" rules. The user may also filter on VLAN ID by using the VLAN PPA hack mechanism: "/dev/bridge/my-bridge1000" selects VLAN ID 1 on the bridge instance named "my-bridge".

The observability node also forms a Project Private control node for the kernel, allowing ioctls to a specific bridge instance, and will be used by the STP daemon and other (future) bridging protocols.

The dlpi_open(3DLPI) interface will be enhanced with a Committed DLPI_BRIDGE flag to allow applications to locate the observability nodes by name.

3. STP Daemon

Each bridge (created via "dladm create-bridge") is represented as an identically-named SMF instance of svc:/network/bridge. Each instance runs a copy of /usr/lib/bridged, which implements the Spanning Tree Protocol (STP).

For example, if the user runs:

    # dladm create-bridge my-bridge

the system will have an SMF service named:

    svc:/network/bridge:my-bridge

and (per section 2 above) an observability node named:

    /dev/bridge/my-bridge0

By default, all ports run standard STP. This is done for safety reasons: a bridge that does not run some form of bridging protocol (such as STP) can form long-lasting forwarding loops in the network. Because Ethernet has no hop-count or TTL on packets, any such loops are fatal to the network.

When the administrator knows that a particular port is not connected to another bridge (for example, a direct point-to-point connection to a host system), STP can be disabled administratively for that port.

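The naming rules from sections 2 and 3 can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical and are not part of dladm, bridged, or libdlpi, and the VLAN form assumes the conventional PPA-hack arithmetic (VLAN ID times 1000, plus the instance number, which is always 0 here), consistent with the "my-bridge1000" example above.

```python
# Hypothetical helpers; these names do not exist in dladm or libdlpi.
BRIDGE_SERVICE = "svc:/network/bridge"
BRIDGE_DEV_DIR = "/dev/bridge"


def smf_instance(bridge):
    """SMF instance URI for a bridge: the instance shares the bridge name."""
    return f"{BRIDGE_SERVICE}:{bridge}"


def observability_node(bridge, vlan_id=0):
    """Path of the observability node, optionally filtered to one VLAN.

    Assumes the PPA-hack convention PPA = VLAN ID * 1000 + instance,
    with the bridge's instance number fixed at 0.
    """
    if not 0 <= vlan_id <= 4094:
        raise ValueError("VLAN ID must be in the range 0 to 4094")
    ppa = vlan_id * 1000
    return f"{BRIDGE_DEV_DIR}/{bridge}{ppa}"
```

For the bridge "my-bridge", smf_instance() gives svc:/network/bridge:my-bridge, observability_node() gives /dev/bridge/my-bridge0, and observability_node("my-bridge", 1) gives /dev/bridge/my-bridge1000.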
Even if all ports on a bridge have STP disabled, the STP daemon still runs; this is in case new ports are added, and because it is responsible for enabling and disabling forwarding on the ports.

If the SMF service instance for a bridge is disabled, then bridge forwarding stops on those ports as the STP daemon is stopped. If the instance is restarted, STP starts from its initial state.

The bridge daemon runs as UID/GID "daemon" with PRIV_SYS_NET_CONFIG in order to access the raw network devices, but with most other basic privileges (e.g., PRIV_PROC_FORK and PRIV_PROC_EXEC) removed.

3. VLANs

In general, administrators will want to have the VLANs they configure on the system to be forwarded among all the ports on a bridge instance, so this will be the default for VLANs. When the administrator invokes Clearview's "dladm create-vlan", and the underlying link is part of a bridge, that command will also enable forwarding of the specified VLAN on that bridge link.

If an administrator wants to configure a VLAN on a link but not allow forwarding to or from other links on the bridge, then that administrator must take specific action to do so, by disabling forwarding with "set-linkprop".

Clearview UV provides two mechanisms for the creation of VLANs. The primary means of configuration is the new "dladm create-vlan" subcommand, which automatically enables the VLAN for bridging as described above, if the underlying link is configured as part of a bridge.

The second mechanism is a legacy feature called the "PPA hack." This allows a user to create a VLAN simply by opening a DLPI provider and specifying a VLAN ID number as part of the PPA. In this case, the user may be doing nothing other than snooping on that VLAN, so adding the VLAN to the allowed set automatically is likely not the right answer. Thus, we will default forwarding to "off" for PPA-hack VLANs.
Administrators with legacy PPA hack VLANs will need to reconfigure to use the new Clearview VLANs to take full advantage of bridging, and this will be included in the documentation.

In STP, VLANs are ignored. The bridging protocol computes just one loop-free topology and uses that. Administrators are required to configure any "duplicate" links such that when they're automatically disabled by STP, the configured VLANs are not disconnected.

MSTP is somewhat similar, but allows administrators to assign each VLAN to a small number of distinct spanning tree "instances," and allows instances within an identically-configured "region" to have distinct topologies. In terms of this project, additional bridge and link properties would be required to enable MSTP operation.

4. SMF Properties

These parameters are all Project Private. They will not be documented, and the documented administrative interface will be the dladm command.

4.1 STP

    SMF Property Name       Type        Default
    --------------          ----        -------
    config/priority         ushort_t    32768
    config/max-age          ushort_t    5120 (20 seconds)
    config/hello-time       ushort_t    512 (2 seconds)
    config/forward-delay    ushort_t    3840 (15 seconds)
    config/force-protocol   int         3

All of these properties (and their default values and granularities) are defined by the STP and related standards.

The "force-protocol" parameter is specified to allow for an upgrade path. Users who do not want to see the use of MSTP when it is implemented can set this parameter to 0 or 2 (as specified in IEEE 802.1Q-2004) to select STP or RSTP as the maximum allowed protocol. In this project, the parameter will have no effect, as only STP is implemented.

4.2 Datalink

    SMF Property Name       Type        Default
    --------------          ----        -------
    config/stp              boolean     true
    config/forward          boolean     true
    config/bridge           string      ""
    config/default-tag      ushort_t    1

On a Nemo device, legacy device, or aggregation, the link parameters are used as above.
The "default-tag" parameter may be set to 0 to disable the forwarding of untagged packets to and from the port.

On a VLAN, "stp" and "default-tag" are ignored. The "forward" flag enables forwarding for that VLAN, which is equivalent to putting the VLAN into the "allowed set" for the bridge port. Setting it to "false" causes the VLAN to be disallowed, which means that VLAN-based I/O to the underlying link still operates, but no bridge-based forwarding is done.

The "bridge" parameter is reserved for use with MSTP, where it will select an instance.

5. Alternatives

5.1 Using A Separate Command

An alternative command set design would be to create a new bridge control command (bridgeadm), rather than using dladm.

The main problem with this separation is that the configuration of the bridge would end up being split between two different utilities in a somewhat incoherent manner. Why would IEEE 802 aggregations be part of dladm but IEEE 802 bridges be configured elsewhere?

Parts of the configuration of a bridge (such as the set of allowed VLANs and the default VLAN tag for a given link) are naturally part of the link configuration, and not a common property of the bridge. The creation of VLANs (logically located "above" links and bridges) and regular Ethernet links (logically located "below" VLANs and bridges) via dladm while bridging itself is in bridgeadm seems like a very strange result.

We could create a separate bridgeadm, but then we'd likely have to deal with the VLAN issues some other way. Most likely, we would end up with either duplicate configuration in bridgeadm or the bulk of bridge configuration actually going on in dladm per-link properties, and only bridge create/destroy done via bridgeadm.

In other words, there are several IEEE-specified parameters for bridges, but they're rarely of much interest, so that proposed utility wouldn't do very much.

The main thing users need to manipulate for bridges are the VLANs, and we need to figure out how to represent that manipulation. We choose to equate dladm-created-VLAN with bridge-allowed-VLAN because it seems to produce the most natural results: there's only one way to "create" or "destroy" a VLAN in the system.

The alternative is to break those apart, and allow users to create VLANs for potential use with IP via dladm, and separately assign VLANs to bridge ports via bridgeadm, but that runs the very likely risk of misconfiguration: either forgetting to enable a bridge link for a VLAN while having IP plumbed atop, or thinking that destroying the VLAN removes it from the bridge. Since neither scenario seems to be particularly useful, allowing for them doesn't seem like a good goal.

Or, for a really short answer: dladm is the location of all things datalinkish, and bridging is (like VLANs and aggregations) a datalink function.

5.2 Link Configuration Storage

Alternative designs for the configuration information include having the set of links for a bridge listed as part of the bridge configuration, and using non-SMF files for storing configuration.

The former approach would work, and would have the advantage that during start-up of the STP daemon it would be easy to find the list of links configured for that instance. That's a benefit over the proposed design in that we will need to iterate over all links to get the list needed for a single instance. However, there are two reasons this approach wasn't chosen:

a. A link may be a member of at most one bridge. This semantic is easy to enforce with a link property, as there's just one instance of the property, but is hard to enforce across multiple bridges. We end up needing to scan all bridge instances, and configuration transactions become more complex because two objects need to be changed at one time.

b. We want to have all configuration parameters for a link to be stored with the link itself.
Having parameters stored elsewhere in the system means that utilities that manipulate links or just display system configuration may end up needing to scan through these other locations in order to make coherent system changes. (For this project, we would be forced to change the existing Clearview "dladm delete-link" functionality so that it scanned the bridge instances and removed any links found there. Storing the data with the link instance removes that requirement.)

Using non-SMF files would also work, and we could make use of the Clearview UV "link IDs" to avoid problems inherent with link renaming. However, longer term, the Clearview and NWAM teams are refactoring link configuration into SMF. Having native bridging designed for OpenSolaris but not actually integrated with its core administrative mechanisms seems like a poor recipe for the future.

6. Interface Summary

    Interface                Stability          Comments
    ---------                ---------          --------
    dladm *-bridge           Committed
    field names              Committed          dladm show-bridge -o
    link properties          Committed
    kstats                   Volatile           Should be raised later
    /dev/bridge/             Committed          Observability node
    control ioctls           Project Private
    /usr/lib/bridged         Project Private
    /network/bridge          Committed          SMF URI
    config/*                 Project Private    SMF properties
    bridge module            Project Private    Kernel bridging module
    DLPI_BRIDGE              Committed          dlpi_open(3DLPI)
    /var/run/bridge_door/    Project Private    Doors interface to daemons
    librstp.so.1             Project Private    RSTP implementation
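
As a closing illustration, the timer rules from section 1.1 and the stored defaults from section 4.1 can be sketched in Python. This is a sketch under stated assumptions, not the dladm implementation: the function names are hypothetical, the legal ranges are copied from the "-m", "-h", and "-d" option descriptions, and the 1/256-second storage unit is inferred from the defaults in section 4.1 (5120 for 20 seconds, 512 for 2 seconds, 3840 for 15 seconds).

```python
# Hypothetical helper names; not part of dladm or bridged.

# (low, high) legal ranges in seconds, from the option descriptions.
RANGES = {
    "max_age": (6.0, 40.0),
    "hello_time": (1.0, 10.0),
    "forward_delay": (4.0, 30.0),
}


def check_timers(max_age=20.0, hello_time=2.0, forward_delay=15.0):
    """Return a list of problems; an empty list means the settings are legal."""
    values = {
        "max_age": max_age,
        "hello_time": hello_time,
        "forward_delay": forward_delay,
    }
    problems = []
    for name, (low, high) in RANGES.items():
        if not low <= values[name] <= high:
            problems.append(f"{name} must be between {low} and {high} seconds")
    # The two cross-parameter constraints from the "-d" option text.
    if 2 * (forward_delay - 1.0) < max_age:
        problems.append("2 * (forward_delay - 1.0) >= max_age is violated")
    if max_age < 2 * (hello_time + 1.0):
        problems.append("max_age >= 2 * (hello_time + 1.0) is violated")
    return problems


def to_smf_units(seconds):
    """Convert seconds to 1/256-second units (assumed from the 4.1 defaults)."""
    return round(seconds * 256)
```

With the default values, check_timers() returns an empty list, and to_smf_units(20.0) gives 5120, matching the config/max-age default. Raising max_age to 40.0 while leaving forward_delay at 15.0 violates the first constraint, which is the kind of setting the subcommands are specified to reject.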