0.  Introduction

    This project adds basic Ethernet (layer two) bridging support to
    OpenSolaris.  It consists of a Project Private kernel module and
    daemon, some Project Private SMF properties, and Committed dladm
    and SMF control interfaces.  It is targeted for a Minor release of
    an OpenSolaris distribution; the change to the default "dladm
    show-link" output that causes Minor binding is detailed below.

    The architecture described in this document is based on the
    Clearview UV (PSARC 2006/499) terminology and dladm command-line
    design.  In particular, Clearview obsoletes the idea of "network
    devices" and instead relies on "links" that may themselves be of
    varying types.

    The bridging protocol referred to in this document includes both
    the IEEE 802.1D-1998 "Spanning Tree Protocol," abbreviated in this
    document as "STP," and the IEEE 802.1D-2004 "Rapid Spanning Tree
    Protocol," abbreviated as "RSTP."

    The newer and far more complex "Multiple Spanning Tree Protocol"
    (802.1Q-2005; MSTP) is intended to be backward compatible with
    STP.  However, it is not part of this project, and may be the
    subject of a future project.


1.  Administration

    All of the administration of this feature is based on dladm and
    SMF.  The SMF portion provides the ability to enable, disable, and
    monitor bridge instances using the instance URIs described in
    section 3 below.  The dladm portion creates and destroys bridges
    and assigns links to and removes links from them.

1.1 New dladm subcommands

    These commands are patterned after the existing aggregation
    commands in dladm, but without the "-t" support, as SMF doesn't
    adequately support temporary instances.  (If required later,
    temporary instance and parameter support could be added.)

    dladm create-bridge [-R <root-dir>] [-p <priority>] [-m <max-age>]
      [-h <hello-time>] [-d <forward-delay>] [-f <force-protocol>]
      [-l <link>]... <bridge-name>

      This command creates a bridge instance and optionally assigns
      one or more network links to the new bridge.  By default, no
      bridge instances are present on the system, and OpenSolaris will
      thus not bridge between network links by default.

      In order to bridge between links, you must create at least one
      bridge instance.  Each bridge instance is separate: there is
      intentionally no forwarding connection between bridges, and a
      link is a member of at most one bridge.

      Note that a pair of internal bridges that are somehow
      interconnected are actually equivalent to a single larger bridge
      instance, so such a configuration should never be needed in
      ordinary practice.  (These cases may be created using special
      test tools, however.)

      The <bridge-name> provided is chosen by the administrator and
      arbitrary, but must at least be a legal SMF service instance
      name.  For purposes of documentation, this is a URI component
      without escape sequences, meaning that the following characters
      may not be present:

	; / ? : @ & = + $ , % < > # "

      Whitespace and ASCII control characters are also prohibited.  The name
      "default" is reserved, as are all names beginning with the
      string "SUNW".  Names with trailing digits are not permitted, in
      order to allow for creation of "observability devices."  (For
      more about "observability," see section 2 below.)

      Because of the use of the observability devices, the names of
      legal bridge instances are further constrained to be a legal
      dlpi(7P) name, which matches:

	[A-Za-z_][A-Za-z0-9_]*[A-Za-z_]

      Names not matching that pattern will cause the command to fail
      and report "illegal name" to the user.
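
      As an illustration only, the naming rules above could be checked
      roughly as follows.  This Python sketch is not the dladm
      implementation: the function name check_bridge_name and the
      "reserved name" message are assumptions, and only the "illegal
      name" message comes from the text above.

```python
import re

# Characters the text lists as not permitted in an SMF instance name.
FORBIDDEN = set(';/?:@&=+$,%<>#"')

# The dlpi(7P)-style pattern quoted above.
NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*[A-Za-z_]")


def check_bridge_name(name: str) -> str:
    """Return "ok", or a short reason the name would be rejected."""
    # Reserved names: "default" and anything beginning with "SUNW".
    if name == "default" or name.startswith("SUNW"):
        return "reserved name"  # message text is an assumption
    # Forbidden URI characters, whitespace, and ASCII control characters.
    for c in name:
        if c in FORBIDDEN or c.isspace() or ord(c) < 0x20 or ord(c) == 0x7F:
            return "illegal name"
    # The whole name must match the pattern, so no trailing digits.
    if NAME_RE.fullmatch(name) is None:
        return "illegal name"
    return "ok"
```

      Under these rules "mybridge" is accepted, while "br0" (trailing
      digit), "a/b" (forbidden character), and "default" (reserved) are
      rejected.  Note that the pattern as written requires at least
      two characters.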

      Options are:

      -R <root-dir>
		Specify an alternate root directory.

		This allows the configuration of bridge instances in
		alternate roots, as with Live Upgrade and with
		jumpstart installs.  Note that error checking for link
		type isn't possible when administering an alternate
		root, because the definition of the link itself may
		exist only in deferred commands in that alternate
		root.

      -p <priority>
		Specify the Bridge Priority.

		This sets the STP priority value for determining the
		root bridge node in the network.  The default value is
		(per the specification) 32768, and legal values are 0
		(highest priority) to 61440 (lowest priority), in
		increments of 4096.  (This granularity is required per
		section 9.2.5 of IEEE 802.1D-2004; the lower 12 bits
		are now used for MSTP instances and treated as an ID
		extension.)

		If a value with any of the lower 12 bits set is used,
		then the system will silently ignore those bits and
		round downward to the next lower value divisible by
		4096.

      -m <max-age>
		Specify the maximum age for configuration information.

		This sets the STP Bridge Max Age parameter.
		Information older than this (in seconds) is discarded
		by all bridges in the network if this node is the root
		bridge.  It defaults to 20 seconds.  Legal values are
		from 6 to 40 seconds.  (See the "-d <forward-delay>"
		parameter for additional constraints.)

      -h <hello-time>
		Specify the Bridge Hello Time.

		This sets the STP Bridge Hello Time parameter.  If
		this node is the root node, it sends Configuration
		BPDUs at this interval throughout the network.  It
		defaults to 2 seconds.  Legal values are from 1 to 10
		seconds.  (See the "-d <forward-delay>" parameter for
		additional constraints.)

      -d <forward-delay>
		Specify the Bridge Forward Delay.

		This sets the STP Bridge Forward Delay parameter.
		This timer is used to sequence the link states when a
		port is enabled anywhere in the network if this node
		is the root bridge.  It defaults to 15 seconds.  Legal
		values are from 4 to 30 seconds.

		Bridges must obey the following two constraints:

			2 * (forward_delay - 1.0) >= max_age

			max_age >= 2 * (hello_time + 1.0)

		Any parameter setting that would violate those
		constraints will be treated as an error and cause the
		command to fail with a diagnostic message.

      -f <force-protocol>
		Specify the forced maximum supported protocol.

		This sets the MSTP maximum supported protocol number,
		and must be non-negative.  The default is 3.  The
		current implementation doesn't support RSTP or MSTP,
		so this currently has no effect.  However, if the user
		wishes to prevent MSTP from being used in the future
		when implemented, the parameter may be set to 0 (STP
		only) or 2 (allow STP or RSTP).

      -l <link>	Add a link to the newly-created bridge.

		This is equivalent to creating the bridge and then
		adding one or more links, as with the "add-bridge"
		subcommand below, except that if any of the links cannot
		be added, then the entire command fails, and the new
		bridge itself isn't created.  The option is repeated
		to add multiple links at once.  Bridges may be created
		without links if desired.

      See the "add-bridge" subcommand for details on link assignment.

      Bridge creation and link assignment require PRIV_SYS_DL_CONFIG.
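
      The numeric rules for "-p", "-m", "-h", and "-d" above can be
      summarized in a short Python sketch.  The function names are
      hypothetical, and the sketch assumes that a priority outside
      0..61440 is rejected before rounding; the text does not say how
      values between 61441 and 65535 are handled.

```python
from typing import List


def normalize_priority(priority: int) -> int:
    """Validate a -p value and round it down to a multiple of 4096."""
    if not 0 <= priority <= 61440:  # assumption: reject before rounding
        raise ValueError("priority must be between 0 and 61440")
    return priority & ~0xFFF  # silently drop the lower 12 bits


def check_timers(max_age: float = 20.0, hello_time: float = 2.0,
                 forward_delay: float = 15.0) -> List[str]:
    """Return the list of violated rules; an empty list means acceptable."""
    errors = []
    if not 6 <= max_age <= 40:
        errors.append("max-age must be 6 to 40 seconds")
    if not 1 <= hello_time <= 10:
        errors.append("hello-time must be 1 to 10 seconds")
    if not 4 <= forward_delay <= 30:
        errors.append("forward-delay must be 4 to 30 seconds")
    # The two cross-parameter constraints listed under -d.
    if 2 * (forward_delay - 1.0) < max_age:
        errors.append("2 * (forward_delay - 1.0) >= max_age violated")
    if max_age < 2 * (hello_time + 1.0):
        errors.append("max_age >= 2 * (hello_time + 1.0) violated")
    return errors
```

      With the defaults, 2 * (15 - 1) = 28 >= 20 and 20 >= 2 * (2 + 1)
      = 6, so both constraints hold.  Raising only max-age to 40 would
      violate the first constraint unless forward-delay is also raised
      to at least 21.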

    dladm modify-bridge [-R <root-dir>] [-p <priority>] [-m <max-age>]
      [-h <hello-time>] [-d <forward-delay>] [-f <force-protocol>]
      <bridge-name>

      This subcommand modifies the operational parameters of a given
      bridge instance.  All of the options are the same as for the
      "create-bridge" subcommand above, except that the "-l" option is
      not permitted.  To add links to an existing bridge, use the
      "add-bridge" subcommand below.

      Bridge parameter modification requires PRIV_SYS_DL_CONFIG.

    dladm delete-bridge [-R <root-dir>] <bridge-name>

      This subcommand deletes a bridge instance.  Unlike the bridge
      creation subcommand, which can add links while creating, the
      delete subcommand does not have the option to remove links
      during the deletion process.  The bridge being deleted must not
      have any attached links.  If it does, then an error is returned
      and no action is taken.  The user must use "remove-bridge" first
      to deactivate the links.

      Bridge deletion requires PRIV_SYS_DL_CONFIG.

      The "-R" option is the same as for the "create-bridge"
      subcommand.

    dladm add-bridge [-R <root-dir>] -l <link> [-l <link>]... <bridge-name>

      This subcommand adds one or more links to a bridge instance.  If
      multiple links are specified, and adding any one of them results
      in an error, then no changes are made to the system and the
      command fails.

      Link addition to a bridge requires PRIV_SYS_DL_CONFIG.

      A link may be a member of at most one bridge.  It's an error to
      attempt to add a link that already belongs to another bridge.
      To move a link from one bridge instance to another, remove it
      from the current bridge before adding it to a new one.

      The links assigned to a bridge must not themselves be VLANs,
      VNICs, or tunnels.  Only links that would be acceptable as part
      of an aggregation or links that are aggregations themselves may
      be assigned to a bridge.  Other link types will result in error
      messages, and no action taken.  (A future project may provide
      bridging over tunnels using GRE, and over PPP using BCP.  Those
      features are not part of this project, but nothing this project
      is doing will preclude those features from being supported in
      the future.)

      Links assigned to a bridge must all have the same MTU.  This is
      checked when a link is assigned: to avoid inadvertent errors, a
      link that is not the first on the bridge is rejected if its MTU
      differs from that of the existing members.  Note that Solaris
      also allows the MTU of an existing link to be changed.  If this
      leaves the links on a bridge with mismatched MTUs, we will log
      an error and the bridge instance will go into maintenance state.
      The user may then remove or change the assigned links so that
      the MTUs match and then restart the instance.

      In this initial version, the links must also be Ethernet type,
      which includes 802.3 and 802.11 media.  Bridging is well-defined
      over a few other media, and there are some dodgy ways to make it
      work on still others, but those cases are subjects for a future
      release.

      (It is remotely possible that there may be some drivers that do
      not permit transmission of frames with a user-chosen source MAC
      address.  None have been found yet, but if any are found in
      testing, these will be listed in the documentation as
      unsupported drivers.)

      When links are added to a bridge, the bridging protocol in use
      (STP) will be notified, and the links will behave as though just
      created.  For STP, this means that the link will be shut down
      and then brought back up using the standard protocol.

      The options are the same as for the "create-bridge" subcommand.
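
      The all-or-nothing behavior and the membership, link type, and
      MTU checks described above might be structured as in the
      following Python sketch.  Everything here is hypothetical: the
      Link record, the class labels "phys" and "aggr", and the
      add_links function are illustrations rather than libdladm
      interfaces.  The sketch also treats any existing bridge
      membership as an error and omits the Ethernet media check.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumption: labels standing in for links acceptable in an aggregation
# ("phys") and for aggregations themselves ("aggr").
BRIDGEABLE_CLASSES = {"phys", "aggr"}


@dataclass
class Link:
    name: str
    link_class: str            # "phys", "aggr", "vlan", "vnic", "tunnel", ...
    mtu: int
    bridge: Optional[str] = None


def add_links(bridge: str, members: List[Link],
              new_links: List[Link]) -> List[Link]:
    """Validate every link first; commit only if all of them pass."""
    mtu = members[0].mtu if members else None
    for link in new_links:
        if link.bridge is not None:
            raise ValueError(f"{link.name}: already assigned to {link.bridge}")
        if link.link_class not in BRIDGEABLE_CLASSES:
            raise ValueError(f"{link.name}: link type cannot be bridged")
        if mtu is None:
            mtu = link.mtu     # the first link on the bridge sets the MTU
        elif link.mtu != mtu:
            raise ValueError(f"{link.name}: MTU {link.mtu} != {mtu}")
    for link in new_links:     # nothing has been changed before this point
        link.bridge = bridge
    return members + new_links
```

      Because the commit loop runs only after every check has passed,
      a failure on any one link leaves all of the links unassigned.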

    dladm remove-bridge [-R <root-dir>] -l <link> [-l <link>]... <bridge-name>

      This subcommand removes one or more links from a bridge
      instance.  If multiple links are specified, and removing any one
      of them would result in an error, then none are removed and the
      command fails.

      Link removal from a bridge requires PRIV_SYS_DL_CONFIG.

      When links are removed from a bridge, the bridging protocol
      (STP) is notified, and will likely recalculate a new network
      topology, unless those links were unused due to loop-pruning
      activity by the bridging protocol.

      The options are the same as for the "create-bridge" subcommand.

    dladm show-bridge [-p] [-o <field>,...] [<bridge-name>]

      This subcommand shows the running status and configuration of
      bridges.  When given a bridge name, it shows the status of that
      one bridge.  If no bridge name is given, then it shows summary
      status of all bridges on the system.

      The '-o' option allows the user to specify a comma-separated
      case-insensitive list of fields to display.  The field name may
      be "all" to display all fields, or any combination of:

	BRIDGE		Assigned name of the bridge (same as
			<bridge-name>, if provided)
	ADDRESS		Bridge Unique Identifier value (MAC)
	PRIORITY	Configured priority value (-p)
	BMAXAGE		Configured bridge maximum age (-m)
	BHELLOTIME	Configured bridge hello time (-h)
	BFWDDELAY	Configured forwarding delay (-d)
	FORCEPROTO	Configured forced maximum protocol (-f)
	TCTIME		Time since last topology change in seconds
	TCCOUNT		Count of the number of topology changes
	TCHANGE		Topology change detected ("yes" or "no")
	DESROOT		Bridge Identifier of the root node (MAC + priority)
	ROOTCOST	Cost of the path to the root node
	ROOTPORT	Port used to reach root node
	MAXAGE		Maximum age value from root node
	HELLOTIME	Hello time value from root node
	FWDDELAY	Forward delay value from root node
	HOLDTIME	Minimum BPDU interval

      The default set of fields when -o is not specified is "BRIDGE,"
      "ADDRESS," "PRIORITY," and "DESROOT."

      Note the lack of a "-R" option for show-bridge.  It is not
      possible to list bridge configuration information in an
      alternate root, in keeping with the rest of the dladm user
      interface.  The reason for this restriction is to allow the data
      to be represented in SMF, where "writing" to an alternate root
      is supported by way of copying appropriate commands to
      $ROOT/var/svc/profile/upgrade, but "reading" is not feasible
      because the repository on the alternate root may be incompatible
      with the running system.

    dladm show-bridge -l [-p] [-o <field>,...] <bridge-name>

      This variant of the show-bridge subcommand displays link-related
      status information for a single bridge instance.  Note that
      configured parameters are shown through show-linkprop.  The
      relevant field names for the "show-bridge -l" subcommand are:

	LINK		Link name
	INDEX		Port (link) index number on the bridge
	STATE		"disabled", "listening", "learning",
			"forwarding", or "blocking"
	UPTIME		Number of seconds since last reset or initialize
	OPERCOST	Actual cost in use (1-65535)
	OPERP2P		P2P mode flag ("yes" or "no")
	OPEREDGE	Edge mode flag ("yes" or "no")
	DESROOT		Root Bridge Identifier (MAC + priority) seen
			on this port
	DESCOST		Path cost to root node through designated port
	DESBRIDGE	Bridge Identifier (MAC + priority)
	DESPORT		Port ID and priority of port used to transmit
			configuration messages for this port
	TCACK		Topology Change Acknowledge flag ("yes" or "no")

      The default set of fields when -o is not specified is "LINK,"
      "STATE," "UPTIME," and "DESROOT."

    dladm show-bridge -s [-p] [-o <field>,...] [-i <interval>]
      [<bridge-name>]

      This variant shows statistics for the bridge given, or, if no
      bridge name is supplied, then for all bridges in the system.
      The relevant field names are:

	BRIDGE		Bridge name
	DROPS		Number of packets dropped due to resource problems
	FORWARDS	Number of packets forwarded to another link
	RECV		Number of packets received on all links
	SENT		Number of packets sent on all links
	UNKNOWN		Number of packets with unknown destination;
			sent to all links

      The default set of fields when -o is not specified is "BRIDGE,"
      "DROPS," and "FORWARDS."

    dladm show-bridge -ls [-p] [-o <field>,...] [-i <interval>]
      <bridge-name>

      This variant shows statistics for all of the links on the bridge
      named.  The relevant field names are:

	LINK		Link name
	CFGBPDU		Number of configuration BPDUs received
	TCNBPDU		Number of topology change BPDUs received
	RSTPBPDU	Number of Rapid Spanning Tree BPDUs received
	TXBPDU		Number of BPDUs transmitted
	DROPS		Number of packets dropped due to resource problems
	RECV		Number of packets received by bridge
	XMIT		Number of packets sent by bridge

      The default set of fields when -o is not specified is "LINK,"
      "DROPS," "RECV," and "XMIT."

1.2 New dladm Link Properties

    These may be used with the existing dladm set-linkprop,
    reset-linkprop, and show-linkprop subcommands.  Note the use of
    underscores in the names; this is to match the existing variable
    naming practice among dladm properties.

    "stp"

	This is a boolean property.  It defaults to 1 (true), which
	enables STP and RSTP.  When set to 0 (false), the link will
	not use any type of Spanning Tree, and will be placed into
	forwarding mode (with BPDU guarding) at all times.  The
	"false" setting is appropriate for point-to-point links
	connected to end nodes.  Only non-VLAN, non-VNIC type links
	have this property.

    "forward"

	This is a boolean property on all but VNIC links.  It defaults
	to 1 (true).  When set to 0 (false), the VLAN associated with
	the link instance will not forward traffic through the bridge.
	Setting the property to "false" is equivalent to removing the
	VLAN from the "allowed set" for a traditional bridge, which
	means that VLAN-based I/O to the underlying link from local
	clients still operates, but no bridge-based forwarding is
	done.

    "default_tag"

	This is a numeric property with range 0 to 4094.  It defaults
	to 1.  It defines the default VLAN ID that's assumed for
	untagged packets sent to and received from this link.  Only
	non-VLAN, non-VNIC type links have this property.  Setting
	this value to 0 disables the forwarding of untagged packets to
	and from the port.

    "stp_priority"

	This is a numeric property with range 0 to 255.  It defaults
	to 128.  It corresponds to the STP and RSTP Port Priority
	value, which is used to determine the preferred root port on a
	bridge by prepending to the port identifier.  Lower numerical
	values are higher priority.

    "stp_cost"

	This is a numeric property with range 1 to 65535.  Zero is not
	allowed as a cost by the standard, so it is used to signal
	"auto" (the default), in which the cost is computed by link
	type.  The property represents the STP and RSTP cost for using
	the link.  The automatic values are (per the standard) 100 for
	10Mbps, 19 for 100Mbps, 4 for 1Gbps, and 2 for 10Gbps.
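
	As a rough illustration, the numeric ranges given so far and
	the "auto" cost selection could be expressed as in the Python
	sketch below.  The table and function names are hypothetical.
	Only the four speeds listed above are covered; behavior for
	other speeds is not specified here, so the sketch simply fails
	for them.

```python
# (minimum, maximum, default); 0 for stp_cost is the "auto" signal.
NUMERIC_PROPS = {
    "default_tag": (0, 4094, 1),
    "stp_priority": (0, 255, 128),
    "stp_cost": (0, 65535, 0),
}

# Automatic cost by link speed in Mbps, as listed above.
AUTO_COST = {10: 100, 100: 19, 1000: 4, 10000: 2}


def prop_in_range(name: str, value: int) -> bool:
    """Check a value against the documented range for a property."""
    low, high, _default = NUMERIC_PROPS[name]
    return low <= value <= high


def effective_cost(stp_cost: int, speed_mbps: int) -> int:
    """Return the configured cost, or the automatic cost when it is 0."""
    if stp_cost != 0:
        return stp_cost
    return AUTO_COST[speed_mbps]  # KeyError for speeds not listed above
```

	For example, a 1Gbps link left at the default cost would use
	4, while an explicit setting of 50 would override the automatic
	value regardless of speed.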

    "stp_edge"

	This is a boolean property.  It defaults to 1 (true).  If set
	to 0 (false), the daemon will assume that the port is
	connected to other bridges even if no bridge PDUs of any type
	are seen.

    "stp_p2p"

	This is an enumerated value.  Legal values are "true",
	"false", and "auto".  When set to "auto" (the default),
	point-to-point connections are automatically discovered.
	Otherwise, the port mode is forced to point-to-point mode (for
	"true") or normal multipoint mode (for "false").

1.3 New Kstats

    Each bridge instance will have a set of statistics, named
    "bridge:<index>:<bridge-name>0:<statistic>", where:

	<index>
		Arbitrary instance number assigned by the kernel and
		not necessarily retained across reboot.

	<bridge-name>
		Administrator-specified bridge name.

	<statistic>
		Name of statistic; at least the following:

		learn_source	Number of sources learned
		learn_expire	Number of learnt entries expired
		learn_size	Current count of learnt entries
		forward_direct	Directly forwarded packet count
		forward_unknown	Forwarded with unknown destination
		forward_mbcast	Forwarded multicast/broadcast

    Each link instance will also have new kstats, named
    "bridge:<index>:<bridge-name>0-<link-name>:<statistic>", where the
    <statistic> names will be:

	xmit	Packets forwarded to the link by bridging
	rcvd	Packets received from the link (and forwarded
		elsewhere) by bridging

    All of these statistics are considered Volatile for now.  The
    existence of the statistics will be documented for users, but with
    warnings that the names and definitions of the statistics may
    change incompatibly.  A future case for the overall RBridges
    project will elevate these in stability.
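
    For readers scripting against these statistics, the naming
    scheme above can be written as two small Python helpers.  The
    function names are hypothetical, and because the statistics are
    Volatile, anything built on these names may need to change.

```python
def bridge_kstat(index: int, bridge: str, statistic: str) -> str:
    """Name of a per-bridge statistic; note the trailing "0"."""
    return f"bridge:{index}:{bridge}0:{statistic}"


def link_kstat(index: int, bridge: str, link: str, statistic: str) -> str:
    """Name of a per-link statistic ("xmit" or "rcvd")."""
    return f"bridge:{index}:{bridge}0-{link}:{statistic}"
```

    With a bridge named "mybridge" at a hypothetical index 0 and a
    hypothetical link "net0", these give
    "bridge:0:mybridge0:learn_size" and
    "bridge:0:mybridge0-net0:xmit".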

1.4 dladm show-link Changes

    A new "BRIDGE" field is added to the "dladm show-link" output.  If
    a link is a member of a bridge, then this field identifies the
    name of the bridge of which it's a member.  This field is shown by
    default, right before the larger "OVER" field.  For links that are
    not part of a bridge, the field is displayed as a blank string (if
    parseable output is selected) or as "--" if non-parseable.

    The addition of the "BRIDGE" field in the default output format
    may require Minor release binding.  The utility is intended to be
    used in parseable mode when run in cases where the output format
    matters, and in these cases the changes proposed here are
    compatible, but we should still err on the side of caution.

    The bridge observability node also appears in the "dladm
    show-link" output as a separate link.  For this node, the existing
    "OVER" field will list the links that are members of the bridge.


2.  Kernel Features

2.1 Packet Observability

    Each bridge instance will be assigned an "observability device,"
    in a manner similar to the DLPI nodes created for "Clearview: IP
    Observability Devices" (PSARC 2006/475).  These nodes will appear
    under the /dev/bridge/ directory, named by the bridge name plus a
    trailing "0".

    The observability node is intended for use with snoop and
    wireshark.  It behaves as a standard Ethernet interface, but does
    not permit the transmission of packets.  All transmitted packets
    are silently dropped.  It's not possible to plumb IP on top;
    attempts to do DL_BIND_REQ without using the passive option will
    fail.

    The user of this node will get a single unmodified copy of every
    packet handled by the bridge, similar to a "monitoring" port on a
    traditional bridge, and subject to the usual DLPI "promiscuous
    mode" rules.  Filtering on VLAN ID is accomplished by the use of
    pfmod(7M) or features in snoop and wireshark; the VLAN PPA hack
    mechanism (PSARC 2000/147) is not supported.  (Note that Crossbow
    [PSARC 2006/357] has withdrawn support for the VLAN PPA hack.)

    The packets delivered will represent the data received by the
    bridge.  In the cases where the bridging process will add, remove,
    or modify a VLAN tag, the data shown will be before this process
    takes place, which may be confusing if there are distinct
    default_tag values used on different links.  This isn't often the
    case, but it's an important caution.

    To see the packets transmitted and received on a particular link
    (after the bridging process is complete), snoop on the individual
    links rather than the bridge observability node.

    Due to the vanity naming support in Clearview, no special changes
    are needed to dlpi_open(3DLPI) to make it work with these
    observability nodes.  They "just work."

2.2 DLPI Behavior

    When a bridge is enabled on a datalink, the link behaves slightly
    differently in order to accommodate bridging behavior.

    a. Link up/down notifications (DL_NOTE_LINK_{UP,DOWN}) are
       delivered in the aggregate.  This means that when all
       external links are showing
       link-down status, the upper-level clients using the MAC layers
       will see link-down events as well.  When any external link on
       the bridge shows link-up status, all upper-level clients see
       link-up.

       There are several reasons for this behavior.  When link-down is
       seen, it means that nodes on the link are no longer reachable.
       That is no longer true when the bridging code can still send
       and receive packets through another link.  Administrative
       applications that need the actual status of links can use the
       existing MAC-layer kstats to reveal the status.  These
       applications are unlike ordinary clients (such as IP) in that
       they report hardware status information and do not get involved
       in forwarding.

       In the case where all external links are down, we let the
       status show through as though the bridge itself were shut down.
       In this special case, we allow the system to recognize that
       nothing could possibly be reachable.  The trade-off is that
       bridges can't be used to allow local-only communication in the
       case where all interfaces are "real" (not virtual) and all are
       disconnected.

       (This could be made an option in the future if desired; the
       result would be that bridge links, like VNICs, are always
       "running.")

    b. All link-specific features are made generic.  Links that
       support special hardware acceleration features will be unable
       to use those features because actual output link determination
       is not made entirely by the client: the bridge forwarding
       function has to choose an output link based on the destination
       MAC address, and this can be any link on the bridge.

       It may be possible in the future to handle various acceleration
       modes with bridging enabled.  Doing so would mean either
       emulating the acceleration logic on links that lack it, or
       exposing the per-L2-destination nature of the behavior to MAC
       clients.  Such extensions are not part of this project and not
       currently planned.

       One reason we are not planning to support these features is
       that enabling bridging fundamentally requires that the
       interfaces all be placed into promiscuous mode.  In that mode,
       the system must handle all packets on the wire, and most
       hardware devices disable optimizations as this is the "slow
       mode."
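
    The aggregate link-state rule in item (a) can be sketched in
    Python as follows.  This is an illustration, not the kernel
    logic: the class name is hypothetical, and the sketch assumes
    that clients are notified only when the aggregate state changes
    and that the aggregate starts out as "down."

```python
from typing import Dict, Optional


class BridgeLinkState:
    """Tracks external link status and reports the aggregate state."""

    def __init__(self) -> None:
        self.links: Dict[str, bool] = {}
        self.reported_up = False  # assumption: starts as link-down

    def set_link(self, name: str, up: bool) -> Optional[str]:
        """Record one link's status; return the notification, if any."""
        self.links[name] = up
        # Up if any external link is up; down only when all are down.
        now_up = any(self.links.values())
        if now_up == self.reported_up:
            return None  # aggregate unchanged, clients see nothing
        self.reported_up = now_up
        return "DL_NOTE_LINK_UP" if now_up else "DL_NOTE_LINK_DOWN"
```

    With two links on a bridge, the first link-up produces
    DL_NOTE_LINK_UP, and clients see DL_NOTE_LINK_DOWN only after
    both links have gone down.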


3.  STP Daemon

    Each bridge (created via "dladm create-bridge") is represented as
    an identically-named SMF instance of svc:/network/bridge.  Each
    instance runs a copy of /usr/lib/bridged, which implements the
    Spanning Tree Protocol (STP).  For example, if the user runs:

	# dladm create-bridge mybridge

    then the system will have an SMF service named:

	svc:/network/bridge:mybridge

    and (per section 2 above) an observability node named:

	/dev/bridge/mybridge0

    By default, all ports run standard STP.  This is done for safety
    reasons: a bridge that does not run some form of bridging protocol
    (such as STP) can form long-lasting forwarding loops in the
    network.  Because Ethernet has no hop-count or TTL on packets, any
    such loops are fatal to the network.

    When the administrator knows that a particular port is not
    connected to another bridge (for example, a direct point-to-point
    connection to a host system), STP can be disabled administratively
    for that port.  Even if all ports on a bridge have STP disabled,
    the STP daemon still runs; this is in case new ports are added,
    for implementation of BPDU guarding, and because the daemon is
    responsible for enabling and disabling forwarding on the ports.

    When a port has STP disabled, the daemon will still listen for
    BPDUs (BPDU guarding).  It will flag an error (via syslog) if any
    are seen, and disable forwarding on the port, as this typically
    indicates a serious network misconfiguration.  The link will be
    reenabled when link status goes down and then up again, or when
    the administrator manually reenables by removing the link and
    readding it.  (This implementation does not include Cisco's
    "portfast" feature.)

    If the SMF service instance for a bridge is disabled, then bridge
    forwarding stops on those ports as the STP daemon is stopped.  If
    the instance is restarted, STP starts from its initial state.
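
    The BPDU guarding behavior described above, for a port with STP
    disabled, can be modeled with a small Python sketch.  The class
    and attribute names are hypothetical, the list named "log" stands
    in for syslog, and logging once per guard trip is an assumption.
    Re-enabling by removing and re-adding the link is not modeled.

```python
class GuardedPort:
    """A bridge port whose "stp" property is set to 0 (false)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.link_up = True
        self.guard_tripped = False
        self.log = []  # stands in for syslog

    @property
    def forwarding(self) -> bool:
        return self.link_up and not self.guard_tripped

    def bpdu_received(self) -> None:
        """A BPDU on this port indicates misconfiguration."""
        if not self.guard_tripped:
            self.log.append(
                f"{self.name}: BPDU received on port with STP disabled")
            self.guard_tripped = True

    def set_link(self, up: bool) -> None:
        """Record link status; a down-then-up cycle re-enables the port."""
        if up and not self.link_up:
            self.guard_tripped = False
        self.link_up = up
```

    A port in this model forwards until the first BPDU arrives,
    stays disabled while further BPDUs arrive, and forwards again
    only after the link has gone down and come back up.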

    The bridge daemon runs as UID/GID "daemon" with PRIV_SYS_DL_CONFIG
    in order to access the raw network devices, but with most other
    basic privileges (e.g., PRIV_PROC_FORK and PRIV_PROC_EXEC)
    removed.  This is set up by the SMF profile for the service.  The
    user does not invoke the daemon directly.

    The existing "Network Management" RBAC profile is sufficient for
    the privileges required to administer bridges using dladm.  No new
    RBAC or Least Privilege changes are required.


4.  VLANs

4.1 VLAN Administration

    In general, administrators will want to have the VLANs they
    configure on the system to be forwarded among all the ports on a
    bridge instance, so this will be the default for VLANs.  When the
    administrator invokes Clearview's "dladm create-vlan", and the
    underlying link is part of a bridge, that command will also enable
    forwarding of the specified VLAN on that bridge link.

    If an administrator wants to configure a VLAN on a link but not
    allow forwarding to or from other links on the bridge, then he
    must take specific action to do so, by disabling forwarding with
    "set-linkprop" -- see the "forward" parameter in section 1.2
    above.

    Clearview UV provides two mechanisms for the creation of VLANs.
    The primary means of configuration is the new "dladm create-vlan"
    subcommand, which automatically enables the VLAN for bridging as
    described above, if the underlying link is configured as part of a
    bridge.

    The second mechanism is a legacy feature called the "PPA hack."
    This allows a user to create a VLAN simply by opening a DLPI
    provider and specifying a VLAN ID number as part of the PPA.  This
    feature has been removed by Crossbow.  However, in the event that
    bridging integrates without Crossbow, the system will disable
    VLAN forwarding for these by default.  In this case, the user may
    be doing nothing other than snooping on that VLAN, so adding the
    VLAN to the allowed set automatically is likely not the right
    answer.  Administrators with legacy PPA hack VLANs will need to
    reconfigure to use the new Clearview VLANs to take full advantage
    of bridging, and, if Crossbow does not remove them first, this
    issue will be included in the documentation.

    Architecturally, this also means that all VLAN operations for
    bridging (enabling and disabling forwarding paths) can be driven
    by the user space libdladm, and do not need special support from
    the VLAN portions of the kernel dls module.

    In standards-compliant Spanning Tree, VLANs are ignored.  The
    bridging protocol computes just one loop-free topology using
    tag-free BPDU messages and uses this tree to enable and disable
    links.  Administrators are required (by the standard) to configure
    any "duplicate" links they may provision in their networks such
    that when those links are automatically disabled by STP, the
    configured VLANs are not disconnected.  This means very careful
    administrative attention: either run all VLANs everywhere on your
    bridged backbone, or examine all loop-forming links carefully.

    MSTP (not included in this project) is somewhat similar, but
    allows administrators to assign each VLAN to one of a small number
    of distinct spanning tree "instances," and allows instances within
    an identically-configured "region" to have distinct topologies.  In
    terms of this project, additional bridge and link properties would
    be required to enable MSTP operation.

4.2 VLAN Behavior

    The bridge performs forwarding by examining the allowed set of
    VLANs (as described above) and the default_tag parameter for each
    link.  The steps involved are input VLAN determination, link
    membership check, and then tag update.

    Input VLAN determination begins with a received packet on a link.
    When a packet is received, it is checked for a VLAN tag.  If that
    tag is not present or the tag is priority-only (tag zero), then
    the default_tag configured on that link (if not set to zero) is
    taken as the internal VLAN tag.  If the tag is not present or zero
    and the default_tag is zero, then the packet is ignored; no
    untagged forwarding is performed.  If the tag is present and it's
    equal to the default_tag, then the packet is also ignored; this is
    an error case.  Otherwise, the input tag is taken to be the input
    VLAN.
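
    The input VLAN determination rules above can be sketched as
    follows.  This is only an illustration; the function name and the
    use of None to mean "ignore the packet" (or "no tag present") are
    assumptions made for this example, not part of the project's
    interfaces.

```python
from typing import Optional


def input_vlan(tag: Optional[int], default_tag: int) -> Optional[int]:
    """Return the internal VLAN for a received packet, or None to ignore it.

    tag is the VLAN ID from the packet's tag: None if the packet is
    untagged, 0 if the tag is priority-only.  default_tag is the value
    configured on the receiving link, with 0 meaning "not set".
    """
    if tag is None or tag == 0:
        # Untagged or priority-only: use the link default if there is one;
        # otherwise no untagged forwarding is performed.
        return default_tag if default_tag != 0 else None
    if tag == default_tag:
        # Explicitly tagged with the link's default VLAN: error case.
        return None
    return tag
```

    For example, input_vlan(None, 5) yields 5, while input_vlan(5, 5)
    and input_vlan(None, 0) both yield None.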

    Next, the link membership check is performed.  If the input VLAN
    is not configured as an allowed VLAN on this link, then the packet
    is ignored.  Forwarding is then computed, and the same check is
    made for the output link.

    Finally, the tag update is done.  If the VLAN (non-zero at this
    point) is equal to the default_tag on the output link, then the
    result depends on the priority value: if the priority is zero, the
    tag on the packet (if any) is removed; if the priority is
    non-zero, the output tag is set to zero, leaving a priority-only
    tag.  If the VLAN is not equal to the default_tag on the output
    link, then a tag is added if not currently present, and the tag is
    set to that VLAN for the output packet.

    Note that in the case where forwarding sends to multiple
    interfaces (for broadcast, multicast, and unknown destinations),
    the output link check and tag update must be done independently
    for each output link.
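
    The membership check and tag update, applied independently to each
    output link, might look like the sketch below.  The Link type and
    the function names are hypothetical.  The sketch also assumes that
    a packet is never sent back out the link it arrived on, and that a
    link's default VLAN is listed in its allowed set; neither
    assumption is stated in this document.

```python
from typing import List, NamedTuple, Optional, Set, Tuple


class Link(NamedTuple):
    name: str
    allowed_vlans: Set[int]
    default_tag: int  # 0 means no default VLAN is configured


def output_tag(vlan: int, priority: int, out: Link) -> Optional[int]:
    """Return the VLAN ID for the output tag, or None to send untagged."""
    if vlan == out.default_tag:
        # Default VLAN: strip the tag, unless a priority must be carried,
        # in which case a priority-only tag (VLAN 0) is used.
        return None if priority == 0 else 0
    return vlan


def flood(vlan: int, priority: int, in_link: Link,
          links: List[Link]) -> List[Tuple[str, Optional[int]]]:
    """Return (link name, output tag) for each link that gets a copy."""
    if vlan not in in_link.allowed_vlans:
        return []  # input link membership check failed
    copies = []
    for out in links:
        if out.name == in_link.name:
            continue  # assumption: never echo to the input link
        if vlan not in out.allowed_vlans:
            continue  # output link membership check failed
        copies.append((out.name, output_tag(vlan, priority, out)))
    return copies
```

    Note that the same internal VLAN can leave untagged on one link
    and tagged on another, which is why the tag update cannot be done
    once per packet.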


5.  SMF Properties

    These parameters are all Project Private.  They will not be
    documented, and the documented administrative interface will be
    the dladm command.

5.1 STP SMF

    Property Name		Type		Default
    --------------		----		-------
    config/priority		ushort_t	32768
    config/max-age		ushort_t	5120	(20 seconds)
    config/hello-time		ushort_t	512	(2 seconds)
    config/forward-delay	ushort_t	3840	(15 seconds)
    config/force-protocol	int		3

    All of these properties (and their default values and
    granularities) are defined by the STP and related standards.
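
    The parenthesized values in the table imply that the timer
    properties are stored in units of 1/256 second, which matches the
    timer encoding used in BPDUs.  A minimal conversion sketch under
    that assumption (the helper names are hypothetical):

```python
TICKS_PER_SECOND = 256  # assumed storage unit: 1/256 second


def seconds_to_ticks(seconds: float) -> int:
    """Convert a timer in seconds to the assumed 1/256 second units."""
    ticks = round(seconds * TICKS_PER_SECOND)
    if not 0 <= ticks <= 0xFFFF:
        # The properties are listed as ushort_t, so the value must fit
        # in 16 bits.
        raise ValueError("timer value out of range: %r" % (seconds,))
    return ticks


def ticks_to_seconds(ticks: int) -> float:
    """Convert a stored timer value back to seconds."""
    return ticks / TICKS_PER_SECOND
```

    With this conversion, 20 seconds becomes 5120, 2 seconds becomes
    512, and 15 seconds becomes 3840, matching the defaults above.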

    The "force-protocol" parameter is specified to allow for an
    upgrade path.  Users who do not want to see the use of MSTP when
    it is implemented can set this parameter to 0 or 2 (as specified
    in IEEE 802.1Q-2004) to select STP or RSTP as the maximum allowed
    protocol.  In this project, the parameter will have no effect, as
    only STP is implemented.

5.2 Datalink Configuration

    Current storage for datalink configuration information is in
    /etc/dladm/datalink.conf, and is manipulated by dladm.  To this
    existing file format, we will add the following keyword:

	bridge=string,

    When we are eventually able to switch over to SMF for link
    configuration (not this project), the parameters will be:

    Property Name		Type		Default
    --------------		----		-------
    config/bridge		string		""

    On a Nemo driver (physical device), legacy device, or aggregation,
    the link parameters are used as above.

    On a VLAN, the "bridge" parameter is reserved for use with MSTP,
    where it will select an instance.

    This parameter is not used on VNICs, as each VNIC is constructed
    atop a VLAN or regular datalink.


6.  Relationship To Other Projects And Futures

6.1 Virtual Switches

    Several other projects, including Crossbow and LDoms, have
    independently introduced "virtual switches" into Solaris.  Though
    superficially similar, these are not the same thing as 802.1D
    bridges.  The differences include:

    a. A virtual switch cannot forward between physical interfaces on
       a given machine.  It lacks the learning and loop-avoidance
       (Spanning Tree) mechanisms necessary to do that.

    b. A virtual switch doesn't need a forwarding database.  It simply
       looks up the unicast destination among the known clients
       (virtual NICs), and delivers to one of them if a match is found
       or to the single external link, if none is found.

    c. A virtual switch can optimize substantially for the case of
       known local MAC addresses (using multiple receive functions and
       hardware support in the MAC layer), and for driver-specific
       features such as hardware checksum.  A bridge cannot do this,
       as it must listen in promiscuous mode at all times and must be
       able to transmit a packet on any interface regardless of
       hardware support.  (IP would have to understand a per-MAC-
       address capability list rather than per-link in order to use
       hardware features.)
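
    The difference between (a) and (b) can be illustrated with a toy
    model.  This is a sketch only: the names are hypothetical, and the
    bridge model deliberately omits Spanning Tree port states, VLANs,
    entry aging, and multicast handling.

```python
from typing import Dict, List


def vswitch_deliver(dst: str, clients: Dict[str, str], external: str) -> str:
    """Virtual switch: deliver to a known client, else to the external link."""
    return clients.get(dst, external)


class LearningBridge:
    """Bridge: learn source addresses and forward using a database."""

    def __init__(self, links: List[str]) -> None:
        self.links = list(links)
        self.fdb: Dict[str, str] = {}  # MAC address -> link last seen on

    def forward(self, src: str, dst: str, in_link: str) -> List[str]:
        self.fdb[src] = in_link  # learning step
        out = self.fdb.get(dst)
        if out is None:
            # Unknown destination: flood to every other link.
            return [link for link in self.links if link != in_link]
        # Known destination: send only there, unless it is on the
        # link the packet arrived on.
        return [] if out == in_link else [out]
```

    The virtual switch has a fixed client table and exactly one
    external link, while the bridge must discover where each address
    lives and may forward between any pair of links.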

    A useful analogy is that VNICs are the MAC layer equivalent of the
    L3 concept of IP aliases.  They allow the user to create multiple
    MAC address instances on a single datalink, and, if on the same
    subnet, each instance can communicate with the external world
    through one datalink and with the others through internal
    loopback, but that internal communication is not the same thing as
    (or even related to) IP forwarding.

    If one ignores the performance issues noted in (c) above, the
    basic forwarding and learning features of an 802.1D bridge are a
    functional superset of those provided by a "virtual switch,"
    except that a real bridge has no way to add virtual links.

    Again ignoring performance, it may be possible to replace these
    virtual switches with real bridge instances.  The work required
    would include some way to configure "fake" links into a bridge.
    The "etherstub" feature provided by Crossbow may be one simple way
    to do that.

    Future projects may address this area, but the expectation is that
    performance of a bridge configured for this case will be below
    that of a dedicated virtual switch.

6.2 Zones, Xen, and LDoms

    Bridging is a feature similar to link aggregations in terms of its
    position in the network and usage in a data center.  It will not
    be accessible from non-global zones (of any sort), but should be
    accessible from within virtualized environments such as Xen and
    LDoms.

    When Xen or LDoms is used with virtual NICs, a bridge running
    inside the DomU will see a link that doesn't go down when the
    external link goes down.  The configuration is similar to
    interposing a repeater between the actual Ethernet port and the
    internal virtual port, except that because the normal I/O path is
    obscured, the bridging daemon will not see the half-duplex state
    that an actual repeater would produce, and thus will not determine
    stp_p2p state properly.

    Users attempting to run bridges in Xen DomU or under LDoms will
    need to force stp_p2p to "false" instead of "auto."

6.3 RBridges/TRILL

    A future project will introduce RBridge support with TRILL
    encapsulation.  This project will extend the *-bridge features
    proposed here for dladm, and will add new parameters to control
    TRILL and RBridge behavior.

6.4 Forwarding Tables

    It may be useful to be able to display and manipulate the
    MAC-level forwarding tables used within bridging.  This project
    does not define a mechanism to do this, but such a feature is
    likely to be included with RBridges.

    BSD uses protocol "bdg" in netstat to display forwarding tables,
    and it seems reasonable that we should do the same, though that is
    not part of this project.

6.5 "local-mac-address?"

    When this parameter is set to "false," SPARC platforms will
    errantly use the same MAC address on multiple ports.  Most modern
    platforms use "true" by default, but a few older ones use "false"
    (and some completely obsolete ones can't use "true" at all due to
    non-IEEE compliant hardware limitations).

    This shouldn't be a significant issue for bridging, as we can
    identify multiple local receivers for an inbound frame, but this
    will be included as a "don't do this" in the user documentation.

6.6 Kernel Integration

    This project is affected by the Crossbow changes to Nemo/GLDv3,
    and is planned to integrate after that project.

    The key portion of kernel integration work is with the MAC layer,
    where a bridge must function much like an aggregation, except that
    it does not open the links underneath exclusively and the node on
    top is for observability only.  To implement the latter feature,
    we will make sure that "active" clients get an error when
    attempting to bind, just as is done today to prevent IP from being
    used on individual links within an aggregation.

    In terms of processing order, input packets must be handled by
    802.3ad aggregation first, then by bridging, then VLAN
    segregation, and finally VNIC matching.  Output packets go in the
    reverse order: VNIC matching, VLAN tagging, bridging, and finally
    802.3ad load balancing.

    In terms of functionality, bridging must impose on the hardware
    features reported back to clients (such as IP) so that features
    that are not equivalent among all the links are not used, and it
    must set all of the active links into promiscuous mode.
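
    One simple way to think about the hardware feature restriction is
    as a set intersection across the member links.  The sketch below
    is an illustration only; the capability names are hypothetical,
    and real capabilities may carry parameters that also have to match
    before they can be considered equivalent.

```python
from typing import Dict, FrozenSet, Iterable


def common_capabilities(link_caps: Dict[str, Iterable[str]]) -> FrozenSet[str]:
    """Return the capabilities advertised by every link on the bridge."""
    cap_sets = [set(caps) for caps in link_caps.values()]
    if not cap_sets:
        return frozenset()  # no links, so nothing can be offloaded
    return frozenset(set.intersection(*cap_sets))
```

    For example, if one link reports {"hcksum", "lso"} and another
    reports only {"hcksum"}, then only "hcksum" would be reported back
    to clients.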

    Additional details are available in the project design documents.

6.7 Bridging Gaps

    Some things will not be handled as well as they could be with this
    project due to resource constraints.  Notable among these are a
    couple of bridge-related features.

    a. Bridges should be able to preserve CRCs end-to-end and not
       regenerate them during forwarding.  The IEEE specifications
       allow regeneration, but rightly note that it's safer to do
       incremental update (where possible).  Doing this would require
       MAC layer extensions, and might not be possible with all
       network adapters.

    b. When a link is disconnected from a bridge (through the
       administrative commands), it would be useful to force it into a
       link-down state externally, so that the link partner correctly
       detects the event and updates its state promptly.  The Solaris
       MAC layer has no such feature, and adding one would involve
       extensive changes to a large number of drivers.

6.8 L2 Filters

    It would be wise to prevent any L2 Filtering feature from ever
    dropping MAC layer control messages (such as Bridge PDUs) in order
    to avoid known pathological cases in Spanning Tree that result in
    network failure.

    No filter that drops frames addressed to the 01:80:c2:00:00:0x
    range (16 specific multicast addresses) should ever be permitted
    by the system.
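
    A check for the protected address range could be as simple as the
    following sketch.  The function name is hypothetical, and no L2
    filtering interface is defined by this project.

```python
RESERVED_PREFIX = bytes([0x01, 0x80, 0xC2, 0x00, 0x00])


def is_reserved_bridge_address(mac: str) -> bool:
    """Return True if mac is in 01:80:c2:00:00:00 through 01:80:c2:00:00:0f."""
    octets = bytes(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError("expected six octets: %r" % (mac,))
    # The first five octets must match and the last must be 0x00-0x0f.
    return octets[:5] == RESERVED_PREFIX and octets[5] <= 0x0F
```

    A filter configuration tool would reject any drop rule whose
    destination matches this predicate.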

6.9 Solaris Audit

    All of the commands issued to the bridging daemon reflect the
    setting of parameters in dlmgmtd's datalink.conf storage or in SMF
    via configd.  As a result, auditing changes is the responsibility
    of those components.  The bridging daemon itself is responsible
    only for protecting the integrity of the door-based interface so
    that processes calling it have the same privileges as those
    modifying the configuration parameters.

    An existing issue here is that dlmgmtd does not audit parameter
    changes into local storage.  This should be the subject of a
    separate project.

    ARC Note: the above assertions may not be complete; the project
    team intends to consult with the Solaris Auditing team to make
    sure that the right events are audited.


7.  Implementation Alternatives

7.1 Using A Separate Command

    An alternative administrative command set design might be to
    create a new bridge control command (bridgeadm), rather than using
    dladm.

    The main problem with this command separation is that the
    configuration of the bridge would end up being split between two
    different utilities in a somewhat incoherent manner.  Why would
    IEEE 802 aggregations be part of dladm but IEEE 802 bridges be
    configured elsewhere?

    Parts of the configuration of a bridge (such as the set of allowed
    VLANs and the default VLAN tag for a given link) are naturally
    part of the link configuration, and not a common property of the
    bridge.  The creation of VLANs (logically located "above" links
    and bridges) and regular Ethernet links (logically located "below"
    VLANs and bridges) via dladm while bridging itself is in bridgeadm
    seems like a very strange result.  It would be more natural only
    if VLAN and VNIC administration were in a separate command as
    well.

    We could still create a separate bridgeadm despite the above
    conceptual problems, but then we'd likely have to deal with the
    VLAN issues (enabling and disabling forwarding for each VLAN) some
    other way.  Most likely, we would end up with either duplicate
    configuration in bridgeadm or the bulk of bridge configuration
    actually going on in dladm per-link properties, and only bridge
    create/destroy done via bridgeadm.

    While there are several IEEE-specified parameters for bridges,
    they're rarely of much interest, so the proposed separate utility
    wouldn't do very much in ordinary use.  The main things users need
    to manipulate for bridges are the VLANs, which are currently dladm
    objects, and we need to figure out how to represent that
    manipulation.  We have chosen to equate dladm-created-VLAN with
    bridge-allowed-VLAN because it seems to produce the most natural
    results: there's only one way to "create" or "destroy" a VLAN in
    the system.

    The alternative is to break those apart, and allow users to create
    VLANs for potential use with IP via dladm (or some other command),
    and separately assign VLANs to bridge ports via bridgeadm, but
    that runs the very likely risk of misconfiguration: either
    forgetting to enable a bridge link for a VLAN while having IP
    plumbed atop, or thinking that destroying the VLAN removes it from
    the bridge.  In any event, it creates multiple steps for the user
    to follow rather than one.  Since neither of those
    misconfiguration scenarios seems to be particularly useful,
    allowing for them doesn't seem like a worthwhile goal.

    Or, for a really short answer: dladm is the location of all things
    datalinkish, and bridging is (like VLANs and aggregations) a
    datalink function.

7.2 Link Configuration Storage

    Alternative designs for the configuration information include
    having the set of links for a bridge listed as part of the bridge
    configuration, and using non-SMF files for storing configuration.

    The former approach (putting a list of links in the bridge) would
    work, and would have the advantage that during start-up of the STP
    daemon it would be easy to find the list of links configured for
    that instance.  That's a benefit over the proposed design, in
    which we will need to iterate over all links to get the list
    needed for a single instance.  However, there are two reasons this
    approach wasn't chosen:

	a. A link may be a member of at most one bridge.  This
	   semantic is easy to enforce with a link property, as
	   there's just one instance of the property, but is hard to
	   enforce across multiple bridges.  We end up needing to scan
	   all bridge instances during configuration changes, and
	   configuration transactions become more complex because two
	   objects need to be changed at one time, so locking order
	   matters.

	b. We want to have all configuration parameters for a link
	   stored with the link itself.  Having parameters for a link
	   stored elsewhere in the system means that utilities that
	   manipulate links or just display system configuration may
	   end up needing to scan through these other locations in
	   order to make coherent system changes.  (For this project,
	   we would be forced to change the existing Clearview "dladm
	   delete-link" functionality so that it scanned the bridge
	   instances and removed any links found there.  Storing the
	   data with the link instance removes that requirement.)
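
    The trade-off described in (a) and (b) can be seen in a small
    model of the chosen design, in which the bridge name is a per-link
    property.  The class and method names below are hypothetical and
    are not the libdladm interfaces.

```python
from typing import Dict, List


class LinkConfig:
    """Toy model: each link carries a single "bridge" property."""

    def __init__(self) -> None:
        self.bridge: Dict[str, str] = {}  # link name -> bridge name

    def add_link(self, link: str, bridge: str) -> None:
        current = self.bridge.get(link, "")
        if current and current != bridge:
            # One property per link, so at-most-one-bridge is a local check.
            raise ValueError("%s is already on bridge %s" % (link, current))
        self.bridge[link] = bridge

    def delete_link(self, link: str) -> None:
        # Removing the link removes its bridge membership with it; no
        # bridge instance needs to be scanned or updated.
        self.bridge.pop(link, None)

    def links_of(self, bridge: str) -> List[str]:
        # The cost of this design: the daemon must iterate over all links
        # to find the members of one bridge.
        return sorted(l for l, b in self.bridge.items() if b == bridge)
```

    The membership check and link deletion each touch only one
    object; only the start-up scan in links_of() has to visit every
    link.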

    The second approach of using non-SMF files would also work, and we
    could make use of the Clearview UV "link IDs" to avoid problems
    inherent with link renaming.  However, longer term, the Clearview
    and NWAM teams are refactoring link configuration into SMF.
    Having native bridging designed for OpenSolaris but not actually
    integrated with its core administrative mechanisms seems like a
    poor recipe for the future.

    Placing these parameters with the rest of the link parameters
    means that when that day comes, the transition should be simple.

7.3 Obscuring Datalinks

    It would be possible to make bridging's use of links exclusive
    (much as is done with member links in an aggregation), and force
    IP to use virtual links on the bridge for access rather than using
    the underlying links.

    Doing this would make bridging more like other solutions, but it
    would disable key features.  Users today can open raw DLPI devices
    and talk to a given datalink instance; that would go away when
    bridging is used, and users would be forced to rely on bridging to
    transport packets to the desired interfaces.  In particular,
    there'd be no obvious way to use a non-forwarded (interface local)
    VLAN on any interface.

    Doing this would also mean that VLAN membership would have to be
    configured on those exclusive-use links through some other
    parallel mechanism, resulting in the same sorts of problems as
    documented in 7.1 above.


8.  Interface Summary

    Interface		Stability		Comments
    ---------		---------		--------
    dladm *-bridge	Committed		new subcommands
    field names		Committed		dladm show-bridge -o
    link properties	Committed
    show-link BRIDGE	Committed		new field
    kstats		Volatile		Should be raised later
    /dev/bridge/	Committed		Observability node
    control ioctls	Project Private
    /usr/lib/bridged	Project Private
    svc:/network/bridge	Committed		SMF URI
    config/*		Project Private		SMF properties
    bridge module	Project Private		Kernel bridging module
    /var/run/bridge_door/
			Project Private		Doors interface to daemons
    librstp.so.1	Project Private		RSTP implementation
    mac, dls, dld	Consolidation Private	Kernel APIs