IP Multipathing Library Interface
Peter Memishian (meem@eng.sun.com)

Introduction
============

This document defines a set of interfaces for querying and administering
the IPMP subsystem. To house these routines, a "libipmp" library is also
defined, along with several header files, specified below.

The query interfaces are designed to be used either on their own, or in
concert with the events described in "IPMP Asynchronous Event
Definitions" (described separately), depending on the needs of the
application. In the latter case, it's expected that an application will
initially use the query interfaces to retrieve the state of the IPMP
subsystem, and then subsequently rely on events to update its state. In
addition, if lost events are detected, the application can use the query
interfaces to resynchronize.

These interfaces are being defined for consumption by the IPMP test
suite, by consumers inside the ON consolidation, and, by contract, by
other consolidations (e.g., Sun Cluster).

Library Interfaces
==================

The library interfaces are detailed in the following sections.

Error Handling
--------------

Rather than overload the oft-abused errno space, libipmp defines its own
set of error codes. For consistency, with the exception of ipmp_errmsg(),
all interfaces return error codes (rather than, say, pointers to
allocated structures), unless the interface cannot fail, in which case
it has no return value.
The set of defined errors is:

    enum {
        IPMP_SUCCESS,       /* successful operation */
        IPMP_FAILURE,       /* generic failure (check errno) */
        IPMP_EMINRED,       /* minimum failover redundancy not met */
        IPMP_EFBDISABLED,   /* failback disabled */
        IPMP_EUNKADDR,      /* unknown IPMP data address */
        IPMP_EINVAL,        /* invalid argument */
        IPMP_ENOMEM,        /* out of memory */
        IPMP_ENOMPATHD,     /* unable to contact in.mpathd */
        IPMP_EUNKGROUP,     /* unknown IPMP group */
        IPMP_EUNKIF,        /* interface is not using IPMP */
        IPMP_EPROTO,        /* unable to communicate with in.mpathd */
        IPMP_EHWADDRDUP,    /* interface has duplicate hardware address */
        IPMP_NERR
    };

'IPMP_NERR' is a convenience symbol indicating the allowable range of
these error numbers; it is not an error code. 'IPMP_SUCCESS' indicates
that no error occurred. We will add additional error codes as necessary
by inserting them before IPMP_NERR; existing codes will not be
renumbered.

To make it easy for callers to print human-readable error messages, the
following routine is provided to translate the above codes into
internationalized strings:

    #include
    const char *ipmp_errmsg(int);

If the error number is unknown, the string "" is returned in the C
locale.

Handles
-------

Like many recent libraries, the basic unit of currency within libipmp is
a handle, which is completely opaque to the caller. As with other
libraries, multiple threads may interact with the library simultaneously,
provided each uses its own handle. A handle is created by ipmp_open()
and destroyed by ipmp_close():

    #include
    int ipmp_open(ipmp_handle_t *);
    void ipmp_close(ipmp_handle_t);

Note that although it looks like a structure is being passed by value,
an ipmp_handle_t is actually a pointer. This is done to follow
established practice and to encourage the caller to think of the handle
as being completely opaque.
Administrative Operations
-------------------------

The first set of APIs provides control over in.mpathd's behavior:

    #include
    int ipmp_offline(ipmp_handle_t, const char *, uint_t);
    int ipmp_undo_offline(ipmp_handle_t, const char *);
    int ipmp_ping_daemon(ipmp_handle_t);

For ipmp_offline(), the caller specifies the IP interface to offline via
the second argument, and a "minimum redundancy" that must be met via the
third argument. If the offline operation would cause the number of IP
interfaces in the group to drop below this value, the operation will
fail with IPMP_EMINRED. Similarly, the offline may fail if the named
interface is not in an IPMP group, is in use, or otherwise cannot be
brought offline.

For ipmp_undo_offline(), the caller likewise specifies the IP interface
to bring back online via the second argument. The IP interface must
already be offline. If in.mpathd internally brought the IP interface
offline due to a duplicate hardware address, ipmp_undo_offline() will
fail with IPMP_EHWADDRDUP.

Finally, the ipmp_ping_daemon() routine provides a convenient way to
check if in.mpathd(1M) is running.

Query Context
-------------

Each IPMP handle has an associated "query context", used by the query
operations detailed below. By default, the query context is "live",
which means that the query interfaces will return the current state of
the IPMP subsystem.

However, operating on the live state can prove difficult when multiple
operations need to be performed, since the state of the system may
change out from under the application in subtle and confusing ways. To
address this concern, the library also supports a "snapshot" context,
which allows an application to take a snapshot of the current IPMP
subsystem and then perform a series of query operations on the
snapshot. In this case, because all of the operations are performed
using the snapshot, they are guaranteed to be self-consistent, though
perhaps stale.
The handle's current query context is set by calling ipmp_setqcontext():

    #include
    int ipmp_setqcontext(ipmp_handle_t, ipmp_qcontext_t);

The second argument must be IPMP_QCONTEXT_LIVE or IPMP_QCONTEXT_SNAP. If
the argument is IPMP_QCONTEXT_SNAP, then a snapshot is taken and
associated with the handle. Note that only one context is valid at a
time: setting the query context implicitly destroys the previous
context, so at most one snapshot may be associated with a given handle
at any point in time. In the unlikely case that multiple snapshots are
needed, the caller can simply create multiple handles.

Query Operations
----------------

The query operations consist of four basic tasks:

    * retrieving the list of IPMP groups
    * retrieving the information for a given IPMP group
    * retrieving the IPMP information for a given IP interface
    * retrieving the IPMP information for a given IP address

Each of these tasks is broken up similarly: one interface retrieves the
information and stores it into a dynamically-allocated buffer, and
another interface frees the information. Note that the latter does not
take a handle, since the buffer's lifetime is independent of the
handle's lifetime, and may indeed outlive the handle used to allocate
it. For instance, the IPMP group list interfaces are:

    #include
    int ipmp_getgrouplist(ipmp_handle_t, ipmp_grouplist_t **);
    void ipmp_freegrouplist(ipmp_grouplist_t *);

The IPMP group list structure itself is basically a variable-length
array of group names, whose length is given by gl_ngroup:

    typedef struct ipmp_grouplist {
        uint64_t        gl_sig;
        unsigned int    gl_ngroup;
        char            gl_groups[1][LIFGRNAMSIZ];
    } ipmp_grouplist_t;

Note that the gl_sig field corresponds to the signature returned in the
ESC_GROUP_MEMBER_CHANGE IPMP event, so that the returned information can
be ordered relative to any received IPMP events.
Similarly, the interfaces to retrieve IPMP group information are:

    #include
    int ipmp_getgroupinfo(ipmp_handle_t, const char *, ipmp_groupinfo_t **);
    void ipmp_freegroupinfo(ipmp_groupinfo_t *);

The second argument to ipmp_getgroupinfo() is the name of the group to
retrieve information about. If in.mpathd has been configured with
TRACK_INTERFACES_ONLY_WITH_GROUPS set to `no', then information regarding
interfaces not in an IPMP group can be retrieved by specifying an empty
string ("") for the second argument.

The group information structure is:

    typedef struct ipmp_groupinfo {
        char                gr_name[LIFGRNAMSIZ];
        uint64_t            gr_sig;
        ipmp_group_state_t  gr_state;
        ipmp_iflist_t       *gr_iflistp;
        ipmp_addrlist_t     *gr_adlistp;
        char                gr_ifname[LIFNAMSIZ];
        char                gr_m4ifname[LIFNAMSIZ];
        char                gr_m6ifname[LIFNAMSIZ];
        char                gr_bcifname[LIFNAMSIZ];
        unsigned int        gr_fdt;
    } ipmp_groupinfo_t;

The gr_sig field corresponds to the group signature returned by the
ESC_IPMP_GROUP_STATE, ESC_IPMP_GROUP_CHANGE, and ESC_IPMP_IF_CHANGE IPMP
events. Similarly, the type of the gr_state field is the same type that
is used by the ESC_IPMP_GROUP_STATE event.

The gr_iflistp field points to a structure containing an array of
interfaces belonging to the group:

    typedef struct ipmp_iflist {
        unsigned int    il_nif;
        char            il_ifs[1][LIFNAMSIZ];
    } ipmp_iflist_t;

Note that this structure was not "inlined" into the ipmp_groupinfo_t, so
that new members can be added to the ipmp_groupinfo_t structure in the
future without affecting binary compatibility.

Similarly, the gr_adlistp field points to a structure containing an
array of IPMP data addresses belonging to the group:

    typedef struct ipmp_addrlist {
        unsigned int             al_naddr;
        struct sockaddr_storage  al_addrs[1];
    } ipmp_addrlist_t;

The gr_ifname field specifies the name of the group's associated IPMP IP
interface.
The gr_m4ifname, gr_m6ifname, and gr_bcifname fields specify the names
of the interfaces nominated for IPv4 multicast, IPv6 multicast, and IPv4
broadcast (respectively), or an empty string (a lone NUL character) if
no nomination exists. Finally, gr_fdt specifies the group's current
failure detection time (in milliseconds), or 0 if probe-based failure
detection is disabled.

Next, the interfaces to retrieve IPMP-related interface information are:

    #include
    int ipmp_getifinfo(ipmp_handle_t, const char *, ipmp_ifinfo_t **);
    void ipmp_freeifinfo(ipmp_ifinfo_t *);

The second argument to ipmp_getifinfo() is the name of the interface to
retrieve information about. The interface must be in an IPMP group,
unless in.mpathd is configured with TRACK_INTERFACES_ONLY_WITH_GROUPS set
to `no'.

The interface information structure is:

    typedef struct ipmp_ifinfo {
        char                  if_name[LIFNAMSIZ];
        char                  if_group[LIFGRNAMSIZ];
        ipmp_if_state_t       if_state;
        ipmp_if_type_t        if_type;
        ipmp_if_linkstate_t   if_linkstate;
        ipmp_if_probestate_t  if_probestate;
        ipmp_if_flags_t       if_flags;
        ipmp_targinfo_t       if_targinfo4;
        ipmp_targinfo_t       if_targinfo6;
    } ipmp_ifinfo_t;

The if_group member is the group the interface belongs to, or an empty
string ('\0') if the interface does not belong to a group. The if_state
and if_type members reflect the state and type of the interface, and are
of the same types as the similarly-named fields in the
ESC_IPMP_IF_CHANGE event.
The if_linkstate and if_probestate fields reflect the state of link- and
probe-based failure detection, respectively, using the following
enumerations:

    typedef enum ipmp_if_probestate {
        IPMP_PROBE_OK,          /* probes detect no problems */
        IPMP_PROBE_FAILED,      /* probes detect failure */
        IPMP_PROBE_UNKNOWN,     /* probe detection unavailable */
        IPMP_PROBE_DISABLED     /* probe detection disabled */
    } ipmp_if_probestate_t;

    typedef enum ipmp_if_linkstate {
        IPMP_LINK_UP,           /* link detects up */
        IPMP_LINK_DOWN,         /* link detects down */
        IPMP_LINK_UNKNOWN       /* link detection unavailable */
    } ipmp_if_linkstate_t;

The if_flags member provides assorted additional interface information:

    typedef enum ipmp_if_flags {
        IPMP_IFFLAG_INACTIVE    = 0x1,
        IPMP_IFFLAG_HWADDRDUP   = 0x2,
        IPMP_IFFLAG_ACTIVE      = 0x4,
        IPMP_IFFLAG_DOWN        = 0x8
    } ipmp_if_flags_t;

Finally, the if_targinfo4 and if_targinfo6 fields provide probe-based
target information for IPv4 and IPv6, respectively:

    typedef struct ipmp_targinfo {
        char                     it_name[LIFNAMSIZ];
        struct sockaddr_storage  it_testaddr;
        ipmp_if_targmode_t       it_targmode;
        ipmp_addrlist_t          *it_targlistp;
    } ipmp_targinfo_t;

Specifically, it_name matches if_name (provided for convenience),
it_testaddr specifies the IP test address being used, it_targlistp
points to a target address list, and it_targmode indicates the current
target mode via the following enumeration:

    typedef enum ipmp_if_targmode {
        IPMP_TARG_DISABLED,     /* use of targets is disabled */
        IPMP_TARG_ROUTES,       /* route-learned targets */
        IPMP_TARG_MULTICAST     /* multicast-learned targets */
    } ipmp_if_targmode_t;

Lastly, the interfaces to retrieve IPMP-related address information are:

    #include
    int ipmp_getaddrinfo(ipmp_handle_t, const char *,
        struct sockaddr_storage *, ipmp_addrinfo_t **);
    void ipmp_freeaddrinfo(ipmp_addrinfo_t *);

The third argument to ipmp_getaddrinfo() names the IPMP data address to
retrieve information about.
Because IPMP data addresses that are down may not be unique across
groups, the IPMP group must also be provided via the second argument.

The address information structure is:

    typedef struct ipmp_addrinfo {
        struct sockaddr_storage  ad_addr;
        ipmp_addr_state_t        ad_state;
        char                     ad_group[LIFGRNAMSIZ];
        char                     ad_binding[LIFNAMSIZ];
    } ipmp_addrinfo_t;

The ad_addr and ad_group members echo the requested address and group,
respectively. The ad_state member reflects the state of the address
using the following enumeration:

    typedef enum ipmp_addr_state {
        IPMP_ADDR_UP,       /* address is up */
        IPMP_ADDR_DOWN      /* address is down */
    } ipmp_addr_state_t;

Finally, for IPMP_ADDR_UP addresses, ad_binding names the IP interface
that will receive incoming packets for this address. For IPMP_ADDR_DOWN
addresses, ad_binding will be an empty string (a lone NUL character).

Low-Level Operations
--------------------

In addition, libipmp provides a set of low-level operations that
centralize the logic needed to pass information between in.mpathd and
the process making the queries. These operations are Project Private
and are thus not available for use, even by contract.
As such, we only provide a brief description of these routines:

    ipmp_addrinfo_create:     allocate and initialize an ipmp_addrinfo_t
    ipmp_ifinfo_create:       allocate and initialize an ipmp_ifinfo_t
    ipmp_groupinfo_create:    allocate and initialize an ipmp_groupinfo_t
    ipmp_grouplist_create:    allocate and initialize an ipmp_grouplist_t
    ipmp_targinfo_create:     allocate and initialize an ipmp_targinfo_t
    ipmp_write:               send data between libipmp and in.mpathd
    ipmp_writetlv:            send type/length/value-style data between
                              libipmp and in.mpathd
    ipmp_read:                receive data previously sent between libipmp
                              and in.mpathd
    ipmp_readtlv:             receive type/length/value-style data
                              previously sent between libipmp and in.mpathd
    ipmp_snap_create:         allocate and initialize a snapshot
    ipmp_snap_free:           destroy a snapshot
    ipmp_snap_addaddrinfo:    add an ipmp_addrinfo_t to a snapshot
    ipmp_snap_addifinfo:      add an ipmp_ifinfo_t to a snapshot
    ipmp_snap_addgroupinfo:   add an ipmp_groupinfo_t to a snapshot

To prevent accidental use, the prototypes for these routines are in the
private ipmp_mpathd.h and ipmp_query_impl.h header files.

Exported Interface Table
========================

    Interface                 Classification    Comments
    libipmp.so                Cons. Private     IPMP library
                              Cons. Private     IPMP general interfaces
                              Cons. Private     IPMP query interfaces
    enum IpmpErrors           Cons. Private     IPMP_* error codes
    IPMP_QCONTEXT_LIVE        Cons. Private     ipmp_setqcontext flag
    IPMP_QCONTEXT_SNAP        Cons. Private     ipmp_setqcontext flag
    ipmp_addr_state_t         Cons. Private
    ipmp_addrinfo_t           Cons. Private
    ipmp_handle_t             Cons. Private     Opaque; contents Project Private
    ipmp_qcontext_t           Cons. Private     Opaque; contents Project Private
    ipmp_group_state_t        Cons. Private
    ipmp_if_flags_t           Cons. Private
    ipmp_if_state_t           Cons. Private
    ipmp_if_linkstate_t       Cons. Private
    ipmp_if_probestate_t      Cons. Private
    ipmp_if_type_t            Cons. Private
    ipmp_grouplist_t          Cons. Private
    ipmp_groupinfo_t          Cons. Private
    ipmp_iflist_t             Cons. Private
    ipmp_ifinfo_t             Cons. Private
    ipmp_targinfo_t           Cons. Private
    ipmp_if_targmode_t        Cons. Private
    ipmp_errmsg               Cons. Private
    ipmp_open                 Cons. Private
    ipmp_close                Cons. Private
    ipmp_offline              Cons. Private
    ipmp_undo_offline         Cons. Private
    ipmp_ping_daemon          Cons. Private
    ipmp_setqcontext          Cons. Private
    ipmp_getgrouplist         Cons. Private
    ipmp_freegrouplist        Cons. Private
    ipmp_getgroupinfo         Cons. Private
    ipmp_freegroupinfo        Cons. Private
    ipmp_getifinfo            Cons. Private
    ipmp_freeifinfo           Cons. Private
    ipmp_mpathd.h             Project Private   IPMP IPC messaging formats
    ipmp_query_impl.h         Project Private   Not shipped
    ipmp_addrinfo_create      Project Private   Not documented
    ipmp_ifinfo_create        Project Private   Not documented
    ipmp_groupinfo_create     Project Private   Not documented
    ipmp_grouplist_create     Project Private   Not documented
    ipmp_targinfo_create      Project Private   Not documented
    ipmp_write                Project Private   Not documented
    ipmp_writetlv             Project Private   Not documented
    ipmp_read                 Project Private   Not documented
    ipmp_readtlv              Project Private   Not documented
    ipmp_snap_create          Project Private   Not documented
    ipmp_snap_free            Project Private   Not documented
    ipmp_snap_addaddrinfo     Project Private   Not documented
    ipmp_snap_addifinfo       Project Private   Not documented
    ipmp_snap_addgroupinfo    Project Private   Not documented