Abstract
========

This case seeks to expand on PSARC/2005/334, PSARC/2006/321 and
PSARC/2007/666, evolving the APIs and presenting some of them as a
committed interface. The key requirement that came out of this
evolution is providing the means for multiple consumers of events to
indicate that they are interested in receiving them. In addition,
there is demand from the community for an interface that they are able
to program to.

Release Binding
---------------

This case seeks to obtain approval for patch binding.

Background
==========

The first project to deliver packet filtering hooks into the mainline
of IP processing, PSARC/2005/334, did so with the understanding that it
would be limited to a single consumer processing packets on any hook
whose definition allows the packet contents to be modified.

Since the completion of PSARC/2005/334 there has been widespread
interest from various communities around both Solaris and OpenSolaris
in seeing the API evolved further.

Not long after the completion of this project, PSARC/2006/366 (IP
instances) was delivered into IP, providing the capability to define a
local zone as having its own IP stack (routing table, TCP connections,
etc.). As a part of that project, the hooks from PSARC/2005/334 were
made local to each IP instance, so that a zone with a private instance
of IP could choose whether or not to run a firewall, independent of the
global zone and with its own security policy. The result is that hooks
now need some understanding of the instance of IP in which they are
being used in order to be used with full meaning.

Introduction
============

Boundaries
----------

This case is confined to dealing with the API that is exported via the
netinfo (neti) module in the kernel.
While this case will make it possible for consumers of the API to be
aware of the different instances of IP that are active in the kernel,
this project does not propose any sort of data management related to
those instances: individual consumers of this API are responsible for
managing their own instance data.

Goals
-----

This case seeks to accomplish the following major tasks:

* provide an interface that allows consumers to be aware of multiple
  IP instances;
* provide an interface that allows multiple consumers of events to be
  present;
* provide a programmatic method to specify the ordering of hooks,
  either relative to each other or as being first/last;
* provide data management functions for objects used in relation to
  the APIs being introduced with this case;
* document the observability being introduced through kstats.

The new interfaces being introduced with this case are presented below
in a section separate from the one covering the old interfaces being
updated. More information (draft man pages) can be found in the case
directory.

Out of scope
------------

This case is concerned solely with the programming aspects behind using
this API, not its management (through outside control). Thus the
following is considered out of scope:

* overriding or otherwise managing the hook ordering hints that are
  (optionally) used by programmers.

Interface changes from PSARC/2005/334
=====================================

This section walks through the changes to the previously introduced
interfaces at a high level. See below for more technical detail on the
changes.

Naming changes
--------------

In reviewing the interfaces used in PSARC/2005/334, it became evident
that the naming scheme used had not been well thought through for
future work. The new naming style being pursued by this case is,
roughly speaking, net_<object>_<action>(). There are also a few changes
in the arguments to the functions.

+-----------------------+-------------------------+
| PSARC/2005/334        | PSARC/2008/219          |
+-----------------------+-------------------------+
| net_register          | net_protocol_register   |
| net_unregister        | net_protocol_unregister |
| net_lookup            | net_protocol_lookup     |
| net_release           | net_protocol_release    |
| net_register_hook     | net_hook_register       |
| net_unregister_hook   | net_hook_unregister     |
| net_register_event    | net_event_register      |
| net_unregister_event  | net_event_unregister    |
| net_register_family   | net_family_register     |
| net_unregister_family | net_family_unregister   |
| net_info_t            | net_protocol_t          |
+-----------------------+-------------------------+
Table: Interface name changes

+------------------------------+--------------------------------+
| PSARC/2005/334               | PSARC/2008/219                 |
+------------------------------+--------------------------------+
| net_lookup(const char *)     | net_protocol_lookup(netid_t,   |
|                              |     const char *)              |
| net_walk(net_data_t)         | net_protocol_walk(netid_t,     |
|                              |     net_handle_t)              |
| net_register(net_info_t *)   | net_protocol_register(netid_t, |
|                              |     const net_protocol_t *)    |
| net_routeto(net_data_t,      | net_routeto(net_handle_t,      |
|     struct sockaddr *)       |     struct sockaddr *,         |
|                              |     struct sockaddr *)         |
+------------------------------+--------------------------------+
Table: Interface changes from PSARC/2005/334

Data Structure changes
----------------------

This case promotes the structures involved with this API to being
managed by the API itself, through the use of alloc/free functions.

hook_t
~~~~~~

The use of this structure is now managed through hook_alloc() and
hook_free(). Additions to this structure since PSARC/2005/334 include:

* an ordering hint for the insertion of the hook on an event [h_hint];
* qualification data for the hint (such as a name) [h_hintvalue]; and
* an arbitrary argument to be passed back into the function called
  when the hook is activated by an event [h_arg].
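
To illustrate how the three new hook_t fields fit together, the
following is a minimal user-space sketch, not kernel code. The
structure layout follows appendix A.2; sim_hook_alloc(),
sim_hook_free(), SIM_HOOK_VERSION, the two stand-in typedefs,
count_packets() and make_counting_hook() are hypothetical names
introduced only for this example, and the real hook_alloc() and
hook_free() may well behave differently.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-ins for kernel types; assumptions made for this sketch only. */
typedef void *hook_event_token_t;
typedef void *hook_data_t;
#define SIM_HOOK_VERSION 1      /* hypothetical stand-in for HOOK_VERSION */

typedef enum hook_hint {
    HH_NONE = 0, HH_FIRST, HH_LAST, HH_BEFORE, HH_AFTER
} hook_hint_t;

typedef int (*hook_func_t)(hook_event_token_t, hook_data_t, void *);

/* Field layout as given in appendix A.2. */
typedef struct hook {
    int         h_version;
    hook_func_t h_func;
    char        *h_name;
    hook_hint_t h_hint;
    uintptr_t   h_hintvalue;
    void        *h_arg;
} hook_t;

/* Hypothetical model of hook_alloc(): zeroed structure, version set. */
hook_t *
sim_hook_alloc(int version)
{
    hook_t *h = calloc(1, sizeof (*h));

    if (h != NULL)
        h->h_version = version;
    return (h);
}

/* Hypothetical model of hook_free(). */
void
sim_hook_free(hook_t *h)
{
    free(h);
}

/* Example callback: the hook's h_arg arrives as the third argument. */
int
count_packets(hook_event_token_t tok, hook_data_t data, void *arg)
{
    (void) tok;
    (void) data;
    (*(int *)arg)++;
    return (0);
}

/* Build a hook that asks to be placed before the hook named "other". */
hook_t *
make_counting_hook(const char *other, int *counter)
{
    hook_t *h = sim_hook_alloc(SIM_HOOK_VERSION);

    if (h == NULL)
        return (NULL);
    h->h_func = count_packets;
    h->h_name = (char *)"example_counter";
    h->h_hint = HH_BEFORE;
    h->h_hintvalue = (uintptr_t)other;  /* name qualifying the hint */
    h->h_arg = counter;                 /* handed back to h_func */
    return (h);
}
```

In this sketch the consumer never writes h_version itself; it only
fills in the callback, name, hint and argument on the structure handed
back by the allocator.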

net_inject_t
~~~~~~~~~~~~

With the change to using alloc/free functions, this structure has been
updated to include a version field that is managed by this interface.

net_protocol_t
~~~~~~~~~~~~~~

This structure has been renamed from net_info_t in PSARC/2005/334 to a
new name that better represents its purpose: to carry information from
a network protocol through to the netinfo module. Accompanying the name
change is an update of all the field names in this structure.

At present there is neither desire nor need to make it possible for
code outside of this consolidation to register protocols, thus the
structure itself remains a private interface.

hook_pkt_event_t
~~~~~~~~~~~~~~~~

An extra field has been added to the hook_pkt_event_t structure. The
field, hpe_protocol (see A.4), is a net_handle_t value that allows the
function being called to discover the instance (and thereby zone) to
which the event belongs.

hook_nic_event_t
~~~~~~~~~~~~~~~~

As with hook_pkt_event_t, an extra field, hne_protocol (see A.5), has
been added to provide more context to the receiver of the event.

New Interfaces
==============

This case seeks to introduce some new interfaces, in addition to
updating previously introduced interfaces.

IP instance event notification
------------------------------

For consumers of this interface to become aware of the addition or
removal of IP stack instances on the live system, they need a means to
register a callback that is activated by the related events.

The callback is registered via an allocated net_instance_t structure.
This structure gives the consumer the ability to be informed of
create, destroy and shutdown events. When registering callbacks, both
the create and destroy functions must be supplied; a function to handle
the shutdown callback is optional.

See the interface table below for the respective commitment levels
being sought.
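
The registration rule and callback flow described above can be
modelled in user space as follows. This is a sketch under stated
assumptions, not the netinfo implementation: the structure layout is
taken from appendix A.1, while sim_instance_register(),
sim_instance_lifecycle() and the example_* callbacks are hypothetical
names. It also assumes that shutdown is delivered before destroy and
that the pointer returned by the create function is handed back to the
other two callbacks, as the detailed specification later in this
document describes.

```c
#include <assert.h>
#include <stddef.h>

typedef int netid_t;    /* stand-in; this case defines netid_t as id_t */

/* Field layout as given in appendix A.1. */
typedef struct net_instance_s {
    int     nin_version;
    char    *nin_name;
    void    *(*nin_create)(const netid_t);
    void    (*nin_destroy)(const netid_t, void *);
    void    (*nin_shutdown)(const netid_t, void *);
} net_instance_t;

/* Hypothetical check: create and destroy are mandatory, shutdown is not. */
int
sim_instance_register(const net_instance_t *nin)
{
    if (nin == NULL || nin->nin_create == NULL ||
        nin->nin_destroy == NULL)
        return (-1);
    return (0);
}

/*
 * Hypothetical model of one IP instance arriving and leaving.
 * Assumption: shutdown (when supplied) is called before destroy.
 */
void
sim_instance_lifecycle(const net_instance_t *nin, netid_t id)
{
    void *state = nin->nin_create(id);

    if (nin->nin_shutdown != NULL)
        nin->nin_shutdown(id, state);
    nin->nin_destroy(id, state);
}

/* Example consumer: its per-instance state is a single integer. */
static int example_state;

void *
example_create(const netid_t id)
{
    example_state = (int)id;    /* remember which instance this is */
    return (&example_state);
}

void
example_destroy(const netid_t id, void *arg)
{
    (void) id;
    assert(arg == &example_state);  /* same pointer create returned */
    *(int *)arg = -1;               /* mark the state as torn down */
}
```

The consumer owns whatever the create function returns; the framework
in this model only carries the pointer back, which matches the
statement that consumers manage their own instance data.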

+----------------------------+-------------+
| Interface                  | Stability   |
+----------------------------+-------------+
| net_instance_alloc         | Committed   |
| net_instance_free          | Committed   |
| net_instance_register      | Committed   |
| net_instance_unregister    | Committed   |
| net_instance_t             | Committed   |
+----------------------------+-------------+
Table: net_instance stability

Netinfo change notifications
----------------------------

While the above callbacks provide notification of instances of IP as
they arrive or schedule departure, there are two other sets of changes
that it can be advantageous to be aware of: the arrival of events
attached to a protocol, and the callbacks registered on those events.

The restrictions on the callbacks are minor:

- they must not sleep waiting for I/O or user space;
- they must not call net_*_notify_register or net_*_notify_unregister.

+--------------------------------+----------------------------------+
| Target to monitor              | Events received describe         |
+--------------------------------+----------------------------------+
| IP instance management         |                                  |
| (net_instance_register         | Addition/removal of IP instances |
| net_instance_unregister)       |                                  |
+--------------------------------+----------------------------------+
| Protocol event management      |                                  |
| (net_protocol_notify_register  | Addition/removal of events       |
| net_protocol_notify_unregister)| available for a protocol         |
+--------------------------------+----------------------------------+
| Hook callback management       |                                  |
| (net_event_notify_register     | Addition/removal of hooks to be  |
| net_event_notify_unregister)   | called for protocol events       |
+--------------------------------+----------------------------------+
Table: API infrastructure change notifications

+--------------------------------+-------------+
| Interface                      | Stability   |
+--------------------------------+-------------+
| net_event_notify_register      | Committed   |
| net_event_notify_unregister    | Committed   |
| net_protocol_notify_register   | Committed   |
| net_protocol_notify_unregister | Committed   |
+--------------------------------+-------------+
Table: netinfo change notifications

kstats
------

It is reasonable to expect that consumers of this interface may wish to
publish information via kstats and thus may need to provide different
sets of data through kstats for each instance of the IP stack. Two new
functions are introduced to create and destroy per-instance kstat data.
The pointer returned from net_kstat_create can be used with other kstat
functions such as kstat_install.

NOTE: The value returned from net_kstat_create must NOT be passed into
kstat_delete, nor may the value returned from kstat_create be passed
into net_kstat_delete.

    kstat_t *net_kstat_create(netid_t, char *, int, char *, char *,
        uchar_t, ulong_t, uchar_t);
    void net_kstat_delete(netid_t, kstat_t *);

+----------------------------+-------------+
| Interface                  | Stability   |
+----------------------------+-------------+
| net_kstat_create           | Committed   |
| net_kstat_delete           | Committed   |
+----------------------------+-------------+
Table: net kstat commitment

Mapping instances to zones
--------------------------

To map the instance of IP in which a hook is being executed to a zone
and back again, two functions are supplied that convert zoneid_t's to
netid_t's and vice versa. A zone that has an exclusive network stack
instance maps to a unique netid_t value for its zoneid_t.

The packet and network interface events that are provided by the
netinfo framework come with a reference to the relevant protocol family
by way of a net_handle_t field. This can be mapped into an identifier
that represents the instance of IP by using net_getnetid().

    extern netid_t net_zoneidtonetid(zoneid_t);
    extern zoneid_t net_getzoneidbynetid(netid_t);
    extern netid_t net_getnetid(net_handle_t);

+-------------------------+------------+
| Interface               | Commitment |
+-------------------------+------------+
| net_getnetid            | Committed  |
| net_zoneidtonetid       | Committed  |
| net_getzoneidbynetid    | Committed  |
+-------------------------+------------+
Table: Mapping netid_t/zoneid_t commitment

Detailed Interface Specification For New Interfaces
===================================================

Netinfo callbacks
-----------------

The netinfo callback interface is provided to allow a consumer to
become aware of when instances are created or destroyed. The definition
of the structure can be found in section A.1. The fields are expected
to be used as follows:

* nin_version - used by the net_instance_*() functions and must not be
  modified by consumers;
* nin_create - create function, must be set by consumer;
* nin_destroy - destroy function, must be set by consumer;
* nin_shutdown - shutdown function, optionally set by consumer.

The create function in the set of callbacks is called as a part of the
process that creates a new instance of IP, before any traffic will
appear for that instance. The only argument to the create function is
an identifier that distinguishes this instance from all others. The
return value from the create function is passed back in as the 2nd
argument to the destroy and shutdown functions.

The destroy callback is called during the process of removing the
owning instance of IP from the system.

Hooks registered using the interface herein are expected to be
unregistered through either the shutdown or destroy callback.

The hook interface
------------------

The hook interface is provided as the means by which callbacks are
added to an event that is provided by an event family.
The structure to hold the hook information should be allocated by a
call to hook_alloc(); when the owner is ready to free it, hook_free()
should be called.

The use of the data structure members is as follows:

* h_version - initialised by hook_alloc() - must not be modified by
  consumer [PSARC/2005/334];
* h_func - function that the event should call [PSARC/2005/334];
* h_name - a text string representing the name given to this hook or
  owner of the hook [PSARC/2005/334];
* h_hint - hints about how to insert the hook on the event (see below
  for more details) [PSARC/2008/219];
* h_hintvalue - see the details below on hints for more information on
  how this field is to be used [PSARC/2008/219];
* h_arg - the value of h_arg is passed back into h_func as the 3rd
  argument to the callback function [PSARC/2008/219].

Hook hints
~~~~~~~~~~

A major problem with PSARC/2005/334 was that it limited each event to a
single hook. This case proposes to remedy this limitation by allowing
each hook to optionally specify a single *hint* about how it is placed
on the list of hooks to call when an event is activated. There are 5
possible hints to choose from:

* none (there are no special ordering constraints)
* first (place the hook first)
* last (place the hook last)
* before "X" (place this hook before a hook named "X")
* after "X" (place this hook after a hook named "X")

A hook is limited to specifying only *1* hint for itself.

For the hints specifying that a hook should be either last (HH_LAST) or
first (HH_FIRST), the h_hintvalue field in the hook structure should be
0. For each of these hints, only one hook may be registered to an event
with that hint.

For the hints that specify before (HH_BEFORE) or after (HH_AFTER), the
value of h_hintvalue should be a pointer to a string holding the name
of the other hook upon which the dependency will be asserted.
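
The hint rules in this section can be modelled with a small user-space
simulation. This is only a sketch of one way to satisfy the stated
rules, not the ordering algorithm used by the hook framework: the
sim_* names and the fixed-size array are hypothetical, and only
hook_hint_t is taken from this case (appendix A.2.2). Where the named
hook is absent, this model simply appends the new hook ahead of any
HH_LAST hook, which is an assumption; the case leaves that placement
open.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum hook_hint {
    HH_NONE = 0, HH_FIRST, HH_LAST, HH_BEFORE, HH_AFTER
} hook_hint_t;

#define SIM_MAX_HOOKS 8

typedef struct sim_hook {
    const char  *sh_name;
    hook_hint_t sh_hint;
} sim_hook_t;

typedef struct sim_event {
    sim_hook_t  se_hooks[SIM_MAX_HOOKS];
    int         se_count;
} sim_event_t;

/* Return the index of the named hook, or -1 if it is not on the event. */
int
sim_find(const sim_event_t *ev, const char *name)
{
    int i;

    for (i = 0; i < ev->se_count; i++) {
        if (strcmp(ev->se_hooks[i].sh_name, name) == 0)
            return (i);
    }
    return (-1);
}

/* Register a hook; returns 0 on success, -1 if the hint cannot be met. */
int
sim_hook_register(sim_event_t *ev, const char *name, hook_hint_t hint,
    const char *other)
{
    int n = ev->se_count;
    int lo, hi, pos, i;

    if (n == SIM_MAX_HOOKS || sim_find(ev, name) != -1)
        return (-1);

    /* Slots lo..hi lie after any HH_FIRST hook and before any HH_LAST. */
    lo = (n > 0 && ev->se_hooks[0].sh_hint == HH_FIRST) ? 1 : 0;
    hi = (n > 0 && ev->se_hooks[n - 1].sh_hint == HH_LAST) ? n - 1 : n;

    switch (hint) {
    case HH_FIRST:
        if (lo != 0)
            return (-1);        /* only one hook may be first */
        pos = 0;
        break;
    case HH_LAST:
        if (hi != n)
            return (-1);        /* only one hook may be last */
        pos = n;
        break;
    case HH_BEFORE:
    case HH_AFTER:
        pos = (other != NULL) ? sim_find(ev, other) : -1;
        if (pos == -1) {
            pos = hi;           /* assumption: named hook absent */
        } else if (hint == HH_BEFORE) {
            if (pos < lo)
                return (-1);    /* cannot go before a HH_FIRST hook */
        } else {
            if (pos >= hi)
                return (-1);    /* cannot go after a HH_LAST hook */
            pos++;
        }
        break;
    default:
        pos = hi;               /* HH_NONE: no ordering constraint */
        break;
    }

    /* Shift later hooks down one slot and insert the new hook. */
    for (i = n; i > pos; i--)
        ev->se_hooks[i] = ev->se_hooks[i - 1];
    ev->se_hooks[pos].sh_name = name;
    ev->se_hooks[pos].sh_hint = hint;
    ev->se_count = n + 1;
    return (0);
}
```

With this model, registering "A" with HH_NONE and then "B" with
HH_BEFORE "A" leaves the list as B, A. Because insertion never
reorders existing entries, a hint that was satisfied when its hook was
added stays satisfied as further hooks arrive.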

The use of HH_AFTER with the name of a hook that has used HH_LAST will
not succeed and likewise, using HH_BEFORE with the name of a hook that
has specified the hint HH_FIRST will not succeed.

The name supplied with HH_BEFORE/HH_AFTER may be the name of a hook
that is not currently present on the event, in which case the hook is
inserted on the event in a manner that will satisfy other hints present
but is otherwise not deterministic.

Example 1.

If hook A is registered for event E first, and asks to be placed first
on the list, then this will be done. If a later hook, B, is registered
for event E, it may either ask to be placed before A or to be placed in
the first position, but can only succeed in being placed after A.

Adding hook A to event E:

    [E]--->[A(first)]--->|

Adding hook B with the hint to be before A:

    [E]--->[A(first)]--->[B(after_A)]--->|

The definition of the hint can be found in appendix A.2.2.

Example 2.

If hook A is registered for event E first (but without any hints), it
is placed on the event hook list:

    [E]--->[A]--->|

If hook B is then added and asks to be before A, the list of hooks
becomes:

    [E]--->[B(before A)]--->[A]--->|

If this is followed by another hook C that wants to be before A, the
end result can be either of the two following scenarios:

    [E]--->[C(before A)]--->[B(before A)]--->[A]--->|
    [E]--->[B(before A)]--->[C(before A)]--->[A]--->|

IPFilter hook naming
--------------------

For applications that wish to insert hooks before or after IPFilter in
the packet stack, the names used by IPFilter are provided as an
uncommitted interface:

+------------------+--------------------------+----------------+
| Packet Hook      | IPFilter Hook Name       | Classification |
+------------------+--------------------------+----------------+
| NH_PHYSICAL_IN   | "ipfilter_hook_in"       | Uncommitted    |
| NH_PHYSICAL_OUT  | "ipfilter_hook_out"      | Uncommitted    |
| NH_LOOPBACK_IN   | "ipfilter_hook_loop_in"  | Uncommitted    |
| NH_LOOPBACK_OUT  | "ipfilter_hook_loop_out" | Uncommitted    |
+------------------+--------------------------+----------------+

kstats
======

To aid in diagnosing problems and system activity concerning the hooks,
information is provided through the kstats interface concerning both
events and the hooks registered to each event.

When a hook event is registered with this framework, an entry is
created in kstats that is associated with the relevant instance of IP.
Similarly, whenever a hook is registered with a callback on an event, a
kstat entry is automatically added for that too. When either hooks or
hook events are removed, the respective entry in kstats is also
removed.

kstat naming
------------

Each hook event is represented in kstats as follows:

    module - hook family name ("inet", "inet6", etc)
    name   - name of event ("PHYSICAL_IN", "PHYSICAL_OUT", etc)
    class  - "hook_event"

Three counters are published with each kstat in this group:

    hooksAdded   - number of hooks registered with the event
    hooksRemoved - number of registered hooks removed
    events       - count of the number of events executed

Each hook registers a kstat node named as follows:

    module - family_name/event_name (e.g. "inet/PHYSICAL_IN")
    name   - hook name (e.g. "ipfilter_hook_in")
    class  - "hook"

Six fields are published for each registered hook via kstats:

    version    - value passed in to hook_alloc()
    flags      - flags field from hook_t
    hint       - ordering hint value
    hint_value - pointer associated with 'hint' (for HH_AFTER/BEFORE,
                 the name is displayed)
    position   - counter, starting at 1, reflecting the position of the
                 hook for the event
    hook_hits  - count of the number of times the callback is called

To use the recorded kstats, it is possible to generate queries like
this:

...to see all of the kstats for all hooks registered to all events:

    $ kstat -c hook

...to see all of the events registered to IPv6:

    $ kstat -m inet6 -c hook_event

...to see which events IPFilter has registered an inbound hook for:

    $ kstat -n ipfilter_hook_in -c hook

Stability
---------

The information and the provision of information via kstats is
uncommitted.

Interfaces
==========

+-------------------------------------------------+
| Interfaces Exported                             |
+--------------------------------+----------------+
| Interface                      | Classification |
+--------------------------------+----------------+
| hook_t                         | Committed      |
| hook_alloc                     | Committed      |
| hook_free                      | Committed      |
| hook_func_t                    | Committed      |
| hook_nic_event_t               | Committed      |
| hook_pkt_event_t               | Committed      |
| HOOK_VERSION                   | Committed      |
| GLOBAL_NETID                   | Committed      |
+--------------------------------+----------------+
| netid_t                        | Committed      |
| net_instance_alloc             | Committed      |
| net_instance_free              | Committed      |
| net_instance_register          | Committed      |
| net_instance_unregister        | Committed      |
| net_instance_t                 | Committed      |
+--------------------------------+----------------+
| net_event_register             | Private        |
| net_event_unregister           | Private        |
| net_event_notify_register      | Committed      |
| net_event_notify_unregister    | Committed      |
| net_family_register            | Private        |
| net_family_unregister          | Private        |
+--------------------------------+----------------+
| net_getifname                  | Committed      |
| net_getmtu                     | Committed      |
| net_getnetid                   | Committed      |
| net_getpmtuenabled             | Committed      |
| net_getlifaddr                 | Committed      |
| net_getzoneidbynetid           | Committed      |
+--------------------------------+----------------+
| net_hook_register              | Committed      |
| net_hook_unregister            | Committed      |
| net_inject                     | Committed      |
| net_inject_alloc               | Committed      |
| net_inject_free                | Committed      |
| net_inject_t                   | Committed      |
+--------------------------------+----------------+
| net_ispartialchecksum          | Committed      |
| net_isvalidchecksum            | Committed      |
| net_kstat_create               | Committed      |
| net_kstat_delete               | Committed      |
| net_lifgetnext                 | Committed      |
| net_phygetnext                 | Committed      |
| net_phylookup                  | Committed      |
+--------------------------------+----------------+
| net_protocol_lookup            | Committed      |
| net_protocol_notify_register   | Committed      |
| net_protocol_notify_unregister | Committed      |
| net_protocol_register          | Private        |
| net_protocol_release           | Committed      |
| net_protocol_unregister        | Private        |
| net_protocol_walk              | Private        |
| net_routeto                    | Committed      |
| net_zoneidtonetid              | Committed      |
+--------------------------------+----------------+
| NETINFO_VERSION                | Committed      |
| NHF_ARP                        | Committed      |
| NHF_INET                       | Committed      |
| NHF_INET6                      | Committed      |
| nic_event_t                    | Committed      |
|                                | Committed      |
|                                | Committed      |
|                                | Committed      |
+--------------------------------+----------------+
| "ipfilter_hook_in"             | Uncommitted    |
| "ipfilter_hook_out"            | Uncommitted    |
| "ipfilter_hook_loop_in"        | Uncommitted    |
| "ipfilter_hook_loop_out"       | Uncommitted    |
+--------------------------------+----------------+
Table: Exported interfaces stability

Appendix A - Data structures
============================

A.1 - net_instance_t
--------------------

    typedef struct net_instance_s {
        int     nin_version;
        char    *nin_name;
        void    *(*nin_create)(const netid_t);
        void    (*nin_destroy)(const netid_t, void *);
        void    (*nin_shutdown)(const netid_t, void *);
    } net_instance_t;

A.2 - hook_t
------------

    typedef struct hook {
        int         h_version;
        hook_func_t h_func;
        char        *h_name;
        hook_hint_t h_hint;
        uintptr_t   h_hintvalue;
        void        *h_arg;
    } hook_t;

A.2.1 - hook_func_t
-------------------

    typedef int (* hook_func_t)(hook_event_token_t, hook_data_t, void *);

A.2.2 - hook_hint_t
-------------------

    typedef enum hook_hint {
        HH_NONE = 0,
        HH_FIRST,
        HH_LAST,
        HH_BEFORE,
        HH_AFTER
    } hook_hint_t;

A.3 - net_inject_t
------------------

    typedef struct net_inject {
        int                     ni_version;
        mblk_t                  *ni_packet;
        struct sockaddr_storage ni_addr;
        phy_if_t                ni_physical;
    } net_inject_t;

A.4 - hook_pkt_event_t
----------------------

    typedef struct hook_pkt_event {
        net_handle_t    hpe_protocol;
        phy_if_t        hpe_ifp;
        phy_if_t        hpe_ofp;
        void            *hpe_hdr;
        mblk_t          **hpe_mp;
        mblk_t          *hpe_mb;
        int             hpe_flags;
        void            *hpe_reserved[2];
    } hook_pkt_event_t;

A.5 - hook_nic_event_t
----------------------

    typedef struct hook_nic_event {
        net_handle_t        hne_protocol;
        phy_if_t            hne_nic;
        lif_if_t            hne_lif;
        nic_event_t         hne_event;
        nic_event_data_t    hne_data;
        size_t              hne_datalen;
    } hook_nic_event_t;

A.5.1 - nic_event_t
-------------------

    typedef enum nic_event {
        NE_PLUMB = 1,
        NE_UNPLUMB,
        NE_UP,
        NE_DOWN,
        NE_ADDRESS_CHANGE
    } nic_event_t;

A.6 - functions exported
------------------------

    hook_t *hook_alloc(const int version);
    void hook_free(hook_t *);

    typedef int (* hook_notify_fn_t)(hook_notify_cmd_t, void *,
        const char *, const char *, const char *);

    int net_event_notify_register(net_handle_t family, char *event,
        hook_notify_fn_t callback, void *arg);
    int net_event_notify_unregister(net_handle_t family, char *event,
        hook_notify_fn_t callback);

    net_instance_t *net_instance_alloc(const int version);
    void net_instance_free(net_instance_t *);

    int net_instance_notify_register(net_handle_t, hook_notify_fn_t,
        void *);
    int net_instance_notify_unregister(net_handle_t, hook_notify_fn_t);

    int net_instance_register(net_instance_t *);
    void net_instance_unregister(net_instance_t *);

    kstat_t *net_kstat_create(netid_t, char *, int, char *, char *,
        uchar_t, ulong_t, uchar_t);
    void net_kstat_delete(netid_t, kstat_t *);

    net_inject_t *net_inject_alloc(const int);
    void net_inject_free(net_inject_t *);

    net_handle_t net_protocol_lookup(netid_t, const char *);
    int net_protocol_release(net_handle_t);

    int net_hook_register(net_handle_t, char *, hook_t *);
    int net_hook_unregister(net_handle_t, char *, hook_t *);

    int net_getifname(net_handle_t, phy_if_t, char *, const size_t);
    int net_getmtu(net_handle_t, phy_if_t, lif_if_t);

    typedef id_t netid_t;
    netid_t net_getnetid(net_handle_t);

    int net_getpmtuenabled(net_handle_t);
    int net_getlifaddr(net_handle_t, phy_if_t, lif_if_t, int,
        net_ifaddr_t [], void *);
    lif_if_t net_lifgetnext(net_handle_t, phy_if_t, lif_if_t);
    int net_inject(net_handle_t, inject_t, net_inject_t *);
    phy_if_t net_phygetnext(net_handle_t, phy_if_t);
    phy_if_t net_phylookup(net_handle_t, const char *);

    int net_protocol_notify_register(net_handle_t family,
        hook_notify_fn_t callback, void *arg);
    int net_protocol_notify_unregister(net_handle_t family,
        hook_notify_fn_t);

    phy_if_t net_routeto(net_handle_t, struct sockaddr *,
        struct sockaddr *);
    int net_ispartialchecksum(net_handle_t, mblk_t *);
    int net_isvalidchecksum(net_handle_t, mblk_t *);

Appendix B
==========

The following table lists the netinfo functions that implement
functionality that is also provided by socket ioctls.

    Socket ioctl        netinfo function
    -----------------   ----------------
    SIOCGLIFADDR        net_getlifaddr()
    SIOCGLIFDSTADDR     net_getlifaddr()
    SIOCGLIFBRDADDR     net_getlifaddr()
    SIOCGLIFNETMASK     net_getlifaddr()
    SIOCGLIFMTU         net_getmtu()

Appendix C
==========

For illustrative purposes, source code has been included with this case
and can be found in the case directory. The supplied sample file is a
working example.