8. SMF Network Services

Version 0.7, 2008-Apr-29

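Network configuration is currently scattered across several SMF services:

  1. svc:/network/loopback
  2. svc:/network/physical
  3. svc:/network/initial
  4. svc:/network/service

The last two in particular are not well defined, and are essentially catch-alls for miscellaneous bits of network configuration.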

In NWAM Phase 1, a large portion of the network/service service will be eliminated, and the behavior of the network/physical legacy (:default) and NWAM (:nwam) instances will be clarified.

8.1 SMF Service Model

The NWAM Architecture discussed the elimination of all four of the SMF services listed above. After some implementation experience, and with delivery targets in mind, we have narrowed the scope for Phase 1 to network/service alone; and as section 8.1.4 explains, even that service will be reduced rather than removed outright. (The external architecture document linked above is meant to be informative, not normative for this project.)

8.1.1 svc:/network/loopback

Elimination of this service will be considered when all network configuration is unified under SMF; at that time, it is likely the loopback interfaces will be represented as instances of the IP service, in the same way as other IP interfaces.

8.1.2 svc:/network/physical

Implementation experience from NWAM Phase 0 indicates that keeping this service, with separate instances for NWAM and legacy modes, is a reasonable design. Doing so avoids disrupting the many services that depend on network/physical. Switching between legacy and NWAM modes does not affect those dependencies either: they are generally on the service itself rather than on a specific instance, so having either instance online satisfies them.

8.1.3 svc:/network/initial

Elimination of this service is not directly related to the NWAM Phase 1 work; in fact, other projects are currently working on removing at least parts of the network/initial functionality. A future phase of NWAM may address completely cleaning up this service, but NWAM Phase 1 will not change it.

8.1.4 svc:/network/service

Completely eliminating this service would be difficult, as many other services have dependencies on it. This work will be postponed until a more comprehensive re-work of the network-related SMF services can be performed. However, this service will be modified in NWAM Phase 1.

It currently performs three tasks:

  1. Resetting of the netmask and broadcast address, in case new information is available from NIS.
  2. Updating /etc/resolv.conf and /etc/nsswitch.conf with DNS info received from a DHCP server.
  3. Loading IPQoS configuration information.

Tasks one and three will be split off into separate services, network/netmask and network/ipqos. Modularity of this sort allows more appropriate dependencies to be established, and provides more flexibility for different network policy engines to manage service configuration. It also eases the transition to more dynamic network configuration, as it makes it possible to restart/refresh specific services as needed, without having to worry about side effects caused by a miscellaneous assortment of tasks being jumbled together.

The new services will each have a subset of network/service's current set of dependencies; network/netmask will depend on the nis services and network/initial, while network/ipqos will depend on network/initial and the filesystem services.

The service will continue to perform task two, but only for the non-NWAM case (i.e. network/physical:default is enabled, rather than network/physical:nwam). When NWAM is enabled, the nwam service will handle task two, as described in section 8.1.4.1. This distinction is necessary to avoid conflicts when NWAM is controlling network configuration. Conflicts can occur because NWAM allows the network configuration to change dynamically, and thus potentially to receive new DHCP leases from different servers. The DNS information stored in resolv.conf and nsswitch.conf must then be updated at times other than the boot sequence, which is typically the only time the network/service start method runs.

The distinction between the non-NWAM and NWAM cases will be managed with a simple check at the beginning of the network/service start method, which will return success immediately if network/physical:nwam is online.
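The check can be sketched as follows. This is only an illustration, not the actual start method (which is a shell script): the function names and the injected get_state callable are assumptions made for the example. On a live system the state could be obtained with something like `svcs -H -o state svc:/network/physical:nwam`.

```python
NWAM_INSTANCE = "svc:/network/physical:nwam"


def nwam_owns_dns_files(get_state):
    """Return True when NWAM is online and therefore handles task two itself.

    get_state is a hypothetical callable mapping an FMRI to its SMF state
    string (for example "online" or "disabled").
    """
    return get_state(NWAM_INSTANCE) == "online"


def start_method(get_state, update_dns_files):
    """Sketch of the network/service start method with the early exit."""
    if nwam_owns_dns_files(get_state):
        return 0  # report success immediately; nwamd will update the files
    update_dns_files()  # legacy mode: task two is still performed here
    return 0
```

Passing the state lookup in as a parameter keeps the sketch independent of any particular way of querying SMF.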

8.1.4.1 Updating /etc files with DNS Information

As explained above, given the dynamic configuration made possible by NWAM, updates to /etc/resolv.conf and /etc/nsswitch.conf need to happen more dynamically as well. The code to perform those updates currently exists in a script that 1) is generally executed only during boot and 2) performs other work that isn't necessary or even desirable at the same time the file updates are needed; obviously, some changes need to be made.

Several approaches have been considered:

  1. The NWAM Phase 0 solution: nwamd simply re-execs the start method for network/service, which checks to see if it has been invoked by nwamd and exits before doing the additional, unrelated work in that case.

    This works, but is not a clean solution to the problem.

  2. Move the update functionality into dhcpagent(1M), or into a library that can be called by dhcpagent, when it obtains a lease.

    Though simple, this is not architecturally appropriate; dhcpagent should not be in the business of deciding which of the configuration data it receives should be applied, or where it should be used.

  3. Extend the existing scripting mechanisms provided by dhcpagent(1M): services could provide scripts that would be invoked by dhcpagent on state changes; thus NWAM could provide a script that performed the appropriate updates when a lease was obtained.

  4. Provide a mechanism for dhcpagent(1M) to signal other processes when specific events occur; nwamd could then receive those signals and do the appropriate file updates when the lease was obtained. This would also provide a more reliable notification of lease acquisition than the RTM_NEWADDR messages currently being used in NWAM Phase 0.

Both options 3 and 4 provide a good solution which could be generalized to solve other problems. For example, dhcpagent currently creates a default route when it receives a router address; this functionality could be moved to a service which provides routing policy management. We will go with option 4, as it gives the most flexibility to applications; it also has the benefit of providing more reliable detection of lease acquisition than the method currently used by nwamd.

The signaling mechanism provided by dhcpagent will be designed, reviewed by the ARC, and implemented separately from the NWAM Phase 1 project, but will be a requirement for NWAM Phase 1 integration, and will be resourced by the NWAM team. This design will assume the existence of a "Lease Acquired" event which nwamd will receive from dhcpagent.
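As a rough illustration of what nwamd might do on receiving that event, the sketch below renders resolv.conf contents from the DNS information in a lease. The LeaseAcquired class and its fields are hypothetical, since the real event format has not yet been designed; only the "domain" and "nameserver" keywords of resolv.conf are taken as given. The corresponding nsswitch.conf update is omitted.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LeaseAcquired:
    """Hypothetical event payload; the real format is yet to be defined."""
    interface: str
    dns_domain: Optional[str] = None
    dns_servers: List[str] = field(default_factory=list)


def render_resolv_conf(event):
    """Build resolv.conf text from the DNS data carried by the event.

    Returns an empty string when the lease carried no DNS information,
    in which case the caller would leave the existing file alone.
    """
    lines = []
    if event.dns_domain:
        lines.append("domain " + event.dns_domain)
    for server in event.dns_servers:
        lines.append("nameserver " + server)
    return ("\n".join(lines) + "\n") if lines else ""
```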

8.2 Refresh/restart behavior of network/physical:nwam

Since network/physical:nwam manages all network configuration, stopping this service has the effect of shutting down the system's network capabilities. This is accomplished by tearing down all interfaces and links, and changing to the "No-network" Location. This behavior would be drastic, though, in the case where the service went into the maintenance state, and was then returned to the online state after the error condition was corrected. It would also be heavy-handed in the case of a service refresh.

Thus there are two approaches to service restart that the NWAM service should take, depending on the circumstances of the restart. It could do a "hard reset," in which all network configuration is torn down and created from scratch, based on the active NCP; or it could do a "soft reset," where it only makes the changes needed to restore the active NCP.

The following table defines the type of reset desired under different service conditions.

Service State   Action    Behavior
offline         start     hard reset
online          stop      hard reset (tear down only)
online          refresh   soft reset
maintenance     clear     soft reset

This behavior will be accomplished by performing specific actions on normal start and stop of the NWAM service.

On startup, nwamd will first determine if it is doing a hard or soft reset, depending on the presence of /var/run/nwam_soft_reset. For a soft reset it will simply activate the current NCP, making whatever changes are needed to the link/interface configuration to make it match the NCP. For a hard reset, it will perform the following steps:

  1. Create the soft reset marker, /var/run/nwam_soft_reset
  2. Save the Legacy Location, based on current settings
  3. Tear down any configured links or interfaces
  4. Activate the No-Net Location
  5. Activate the current NCP

On normal shutdown, nwamd will perform the following steps:

  1. Remove the soft reset marker
  2. Tear down the current NCP
  3. Activate the No-Net Location
  4. Put the Legacy Location settings in place without activating them
  5. Delete the Legacy Location

Thus, by removing the soft reset marker on normal shutdown, NWAM forces a hard reset the next time it is started; but if the daemon exits prematurely and is restarted, the marker is still present and a soft reset will be performed. The placement of the marker in /var/run also ensures that the first time NWAM starts after a system reboot, it will perform a hard reset, as the marker file will be deleted on system boot.
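The marker logic can be sketched as below. This is a simplified model under stated assumptions, not nwamd code: the startup and shutdown function names are invented for the example, and the real work of each step is reduced to a log entry so that only the ordering and the marker handling are shown.

```python
import os

DEFAULT_MARKER = "/var/run/nwam_soft_reset"


def startup(log, marker=DEFAULT_MARKER):
    """Choose a hard or soft reset based on the presence of the marker."""
    if os.path.exists(marker):
        log.append("activate current NCP (adjust existing configuration)")
        return "soft"
    open(marker, "w").close()  # step 1: create the soft reset marker
    log.append("save Legacy Location from current settings")
    log.append("tear down configured links and interfaces")
    log.append("activate No-Net Location")
    log.append("activate current NCP")
    return "hard"


def shutdown(log, marker=DEFAULT_MARKER):
    """Normal shutdown: removing the marker forces a hard reset next time."""
    if os.path.exists(marker):
        os.remove(marker)
    log.append("tear down current NCP")
    log.append("activate No-Net Location")
    log.append("put Legacy Location settings in place (not activated)")
    log.append("delete Legacy Location")
```

Calling startup twice without an intervening shutdown models a daemon that exited prematurely and was restarted: the second call finds the marker and returns "soft".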

The function and use of the Legacy Location is discussed in more detail in section 8.3.

A service property will be provided to allow the administrator to request that nwamd do a soft shutdown, resulting in no disruption of network configuration and no removal of the marker file.

8.3 Interaction of Legacy and NWAM Modes

With NWAM Phase 1, there will be two separate repositories for network configuration data: the legacy repository and the NWAM repository. The legacy repository is made up of /etc/hostname.<intf> and /etc/dladm.conf files, various other configuration files associated with network services, and the enabled/disabled state and properties of network-related SMF services; while the NWAM repository is made up of the files under /etc/nwam.

The components of the legacy repository are consumed by different SMF services; some by svc:/network/physical:default, others by one of the other svc:/network services discussed in section 8.1, and still others by their own associated SMF service. The legacy repository components consumed by network/physical:default are not problematic; when NWAM is enabled, that service instance is disabled, so the configuration data it consumes will be ignored. Legacy configuration that is consumed by other services is more problematic, however; clear rules need to be established to define how the settings are handled under the different modes.

8.3.1 Link and Interface Configuration Data

For the most part, the distinction between legacy mode link/interface configuration and NWAM NCP configuration is easily separable; almost all of the legacy mode work is done in the network/physical:default start method, so when that instance is disabled, that configuration is not performed. However, tunnel configuration is explicitly omitted from network/physical:default; it is currently performed in the start method of the network/initial service, and will be broken out into a new network/iptun service with the integration of the IP Tunneling Device Driver component of the Clearview project.

Since the new network/iptun service will be in place prior to the integration of NWAM Phase 1, NWAM will simply change the dependency of this service; instead of depending on network/physical, it will depend explicitly on network/physical:default, ensuring that it will only be enabled in legacy mode. Thus NWAM will be able to create/manage IP tunnels according to its NCP.

8.3.2 Location-Related Configuration Data

The distinction between the legacy and NWAM repositories becomes less clear for Location configuration. While NWAM has a distinct repository that stores Location settings, a Location is activated by placing that data into the standard files and service properties used by the legacy repository. It will thus be necessary for NWAM to keep track of the legacy Location settings, so that they can be restored when switching to legacy mode.

8.3.2.1 Location Activation

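Activation of a location consists of two distinct phases:

  1. Installation: the location's settings are written into the standard files and service properties from which the location-related services read their configuration.
  2. Instantiation: the location-related services are restarted so that they pick up the installed settings.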

When it activates a location, nwamd will explicitly perform the installation step; SMF service dependencies will be used to perform instantiation.

As installation of a location involves writing to the filesystem, it cannot be performed as early as network/physical. Thus a new service, svc:/network/location, will be introduced. This service will depend on svc:/system/filesystem/usr and on network/physical, and a dependency on svc:/network/location will be added to each location-related service.

During boot, these dependencies will be sufficient to ensure that things happen in the correct order: the network elements will be configured by network/physical, the location data will be installed by network/location (installing the legacy location if network/physical:default is enabled, the no-net location if network/physical:nwam is enabled), and then the location-related services will come online with the appropriate configuration in place.

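This sequence must also happen if NWAM is disabled and legacy mode enabled, or vice versa; and when NWAM is running and chooses to change location. This will be accomplished by giving the dependencies an appropriate restart_on attribute:

  1. The dependency of network/location on network/physical will have restart_on set to restart.
  2. The dependency of each location-related service on network/location will have restart_on set to refresh.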

Thus if either the nwam or default instance of network/physical is disabled, network/location will be restarted, as will the location-related services. When NWAM is running and changes location, it will do so by performing the installation step, and then refreshing the network/location service, which will cause the location-related services to be restarted, thus instantiating the location.
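The propagation described here can be modeled with a small sketch. It is a simplified reading of the SMF restart_on semantics, in which each restart_on value also reacts to every event that the weaker values react to. The service name example/location-consumer is a placeholder for any location-related service, and treating the disabling of a network/physical instance as a "restart" event is an assumption of the model.

```python
# dependent -> list of (dependency, restart_on); the consumer is a placeholder
DEPENDENCIES = {
    "network/location": [("network/physical", "restart")],
    "example/location-consumer": [("network/location", "refresh")],
}

# Events on a dependency that cause the dependent to be restarted
TRIGGERS = {
    "none": set(),
    "error": {"error"},
    "restart": {"error", "restart"},
    "refresh": {"error", "restart", "refresh"},
}


def restarted_by(service, event):
    """Return the services restarted, in order, when event hits service."""
    restarted = []
    pending = [(service, event)]
    while pending:
        current, current_event = pending.pop(0)
        for dependent, deps in DEPENDENCIES.items():
            for dependency, restart_on in deps:
                if dependency != current:
                    continue
                if current_event not in TRIGGERS[restart_on]:
                    continue
                if dependent not in restarted:
                    restarted.append(dependent)
                    # a restarted service is a "restart" event for its dependents
                    pending.append((dependent, "restart"))
    return restarted
```

In this model, refreshing network/location restarts only the location-related services, while a restart event on network/physical restarts network/location first and then the location-related services.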

8.3.2.2 Transition from NWAM mode to Legacy mode

When NWAM is disabled, it will perform a hard reset shutdown; that is, all links and interfaces will be torn down, and the No-Net Location will be activated. Legacy mode can then be turned on by enabling network/physical:default; when network/location restarts (due to its restart_on restart dependency on network/physical), it will install the legacy location, which will then be instantiated as the location-related services are restarted due to their restart_on refresh dependencies on network/location.

Note that the record of the legacy location must be deleted after it has been installed; subsequent starts of the network/location service (as on reboot) should not change the location configuration if the system remains in legacy mode.
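A minimal sketch of the choice network/location makes when it starts, including the deletion of the legacy record, is given below. The function name and the representation of a location as an opaque value are assumptions for illustration only.

```python
NO_NET = "no-net"


def location_to_install(nwam_enabled, saved_legacy):
    """Return (location to install or None, remaining saved legacy record).

    saved_legacy is the record of the Legacy Location kept by NWAM, or None
    if no record exists.
    """
    if nwam_enabled:
        # NWAM mode: start from the No-Net Location; the record is kept
        return NO_NET, saved_legacy
    if saved_legacy is not None:
        # Legacy mode after a switch from NWAM: install once, drop the record
        return saved_legacy, None
    # Legacy mode with no record (e.g. on reboot): leave configuration alone
    return None, None
```

Because the record is dropped once it has been installed, a later start in legacy mode falls through to the last case and changes nothing.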

8.3.2.3 Transition from Legacy Mode to NWAM Mode

In order to minimize the changes to legacy mode behavior, no changes will be made to network/physical:default's existing stop method. Therefore, the network configuration will not be modified at all when that service is stopped. When NWAM is enabled, it will perform hard reset behavior (assuming that either NWAM has not been enabled since boot, or that it was stopped prior to legacy mode being enabled); this will result in the creation of the Legacy Location based on the current settings, the installation of the No-Net Location, and then a refresh of the network/location service to instantiate the No-Net Location. As the active NCP is instantiated, the Location will be adjusted as needed, according to the Location selection logic; changes to the Location will result in additional refreshes of the network/location service.

8.4 Mutual Exclusivity of Legacy and NWAM Instances

Section 8.3 clearly assumes that Legacy and NWAM modes are mutually exclusive; that is, there will not be a case where both are enabled. The obvious way to enforce that exclusivity would be to create exclude_all dependencies, where each service instance depends on the other being disabled. Unfortunately, exclude_all dependencies only work in one direction; that is, one service can have an exclude_all dependency on another, but two services cannot use them to be mutually exclusive. If services A and B each have an exclude_all dependency on the other, and service A is online, enabling service B results in both services going offline.

The next best alternative would be for each service to put itself into maintenance state if enabled while the other is already enabled. Going into maintenance state should be a very clear red flag to the administrator that something is wrong; and messaging explaining the problem and resolution will be easily obtained via the standard SMF diagnostic tools (error message in the service log file, information in the svcs -xv output). Appropriate changes will be made to each service instance's start method to implement this.
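The start method check might look like the following sketch. The function name and the is_enabled and log callables are hypothetical. Using SMF_EXIT_ERR_CONFIG (96, as defined in smf_include.sh) to place the instance in maintenance is an assumption; this document does not specify which exit code will be used.

```python
SMF_EXIT_OK = 0
SMF_EXIT_ERR_CONFIG = 96  # assumed exit code for entering maintenance

OTHER_INSTANCE = {
    "svc:/network/physical:default": "svc:/network/physical:nwam",
    "svc:/network/physical:nwam": "svc:/network/physical:default",
}


def exclusivity_check(my_fmri, is_enabled, log):
    """Return the exit code a start method would use for the exclusivity test.

    is_enabled is a hypothetical callable mapping an FMRI to a bool;
    log writes a message that would end up in the service log file.
    """
    other = OTHER_INSTANCE[my_fmri]
    if is_enabled(other):
        log(f"{my_fmri}: {other} is also enabled; disable one of the two "
            "instances and clear this service")
        return SMF_EXIT_ERR_CONFIG
    return SMF_EXIT_OK
```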

Revision History

Revision  Date         Changes
0.1       2007-Sep-05  initial draft with topic outline
0.2       2008-Mar-04  we're not using SMF as the repository; first cut at details
0.3       2008-Mar-06  added rationale for not using exclude_all dependencies
0.4       2008-Mar-14  added additional notes on restart_on attribute question (§8.3.2.1)
0.5       2008-Apr-08  complete §8.3.1 question
0.6       2008-Apr-17  complete §8.1.4 question
0.7       2008-Apr-29  update §8.3.2 with more details on location activation