INTEGRATED LOAD BALANCER DESIGN DOCUMENT [Rev 1.3]

1. Overview
----------------
This document describes the functional components and the overall design
of the ILB project (PSARC 2008/575). The project will deliver the basic
features needed to use Solaris on an x86/SPARC platform as an L3/L4 load
balancer. The project will deliver the following features:

o Stateless DSR (Direct Server Return) and NAT operation modes offering
  the following load balancing algorithms: round robin, source IP hash,
  source IP + port hash, and source IP + VIP hash. IPv4 and IPv6
  support will be provided for both operation modes.

o A CLI and a configuration API to configure the various features and
  view statistics and configuration details.

o Simple server monitoring features.

The project includes kernel and userland components. The following new
packages will provide the ILB userland deliverables:

SUNWilbr
  SMF manifest of the ILB service at:
  var/svc/manifest/network/loadbalancer

SUNWilb
  Components delivered in /usr, which are:
  o ilbadm
  o libilb
  o libilb.h
  o ilbd
  o ilbstat, a script exec'ed by the ilbd daemon to service the
    "ilbadm show-statistics" subcommand only
  o ilb_ping and ilb_probe, which are used by the server health check
    features

The kernel component of ILB will be included in the Solaris core
package.

2. Terms used in this document
-------------------------------
Stateless Direct Server Return - Direct Server Return (DSR) mode refers
to using the load balancer to load balance incoming requests to the
back-end servers while letting the return traffic from the servers
bypass the load balancer by being sent directly to the client. With
stateless DSR, the load balancer keeps no state for the packets it
processes (load balances), except for simple statistics.

NAT based load balancing - NAT-based load balancing involves rewriting
of header information, and handles both the request and the response
traffic. There are two kinds of NAT: half-NAT and full-NAT. Both
rewrite the destination address.
However, full-NAT also rewrites the source IP address, making it appear
to the server that all connections originate at the load balancer.

Server group - A server group comprises zero or more back-end servers.
When a server group is associated with a rule, all the servers in that
server group are enabled for that rule by default. A single server can
be included in multiple server groups.

Virtual Service - A virtual service is what the world sees as VIP:port
(e.g. www.foo.com:80). Although the service is handled by a server
group consisting of several servers, it appears to the clients of the
virtual service as a single IP address:port. A single server group can
service multiple virtual services.

VIP - The Virtual IP address (VIP) is the IP address for the virtual
service.

Load balancing algorithm - A load balancing algorithm is the algorithm
used by the load balancer to select a back-end server from a server
group for an incoming request.

Load balancing rule - In ILB, a virtual service is represented by a
load balancing rule. For the purposes of this document, a load
balancing rule is defined by the following parameters:

o Virtual IP address (VIP)
o Transport protocol: TCP or UDP
o Port number (or a port range)
o Load balancing algorithm
o Type of load balancing operation (DSR, full-NAT or half-NAT)
o A server group consisting of a set of back-end servers
o Optional server health check object that results in the sending of
  specific health check probes to each server in the server group
o Optional port to use for the health check object. A user can specify
  a health check probe on a particular port or on all ports that are
  configured for a virtual service.
o A rule name to represent the virtual service

The load balancer uses the {VIP, transport protocol, port number}
values to determine whether an incoming packet matches a rule. If there
is a match, the load balancer uses the specified load balancing
algorithm to select a server from the server group.
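As an illustration of this dispatch, the sketch below shows matching on
the {VIP, protocol, port} tuple and two of the selection algorithms
(round robin and source IP hash). The types and function names are
invented for this document; they are not the actual kernel data
structures.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified rule type -- the real kernel code differs. */
typedef struct {
	uint32_t vip;		/* IPv4 VIP, host byte order */
	uint8_t  proto;		/* transport protocol number */
	uint16_t port_lo;	/* start of port range */
	uint16_t port_hi;	/* end of range (== port_lo for one port) */
} ilb_rule_t;

/* A packet matches a rule on the {VIP, protocol, port} tuple. */
static int
rule_match(const ilb_rule_t *r, uint32_t dst, uint8_t proto, uint16_t port)
{
	return (r->vip == dst && r->proto == proto &&
	    port >= r->port_lo && port <= r->port_hi);
}

/* Round robin: pick the servers in turn. */
static int
select_rr(int nservers, int *next)
{
	int idx = *next;

	*next = (*next + 1) % nservers;
	return (idx);
}

/* Source IP hash: the same client always maps to the same server. */
static int
select_src_hash(int nservers, uint32_t src)
{
	return ((int)(src % (uint32_t)nservers));
}
```

A real implementation would also consult per-server enable/disable and
health state before using the selected index.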
3. Load balancer operation modes
--------------------------------
The ILB project will provide in-kernel implementations of the DSR,
half-NAT and full-NAT operation modes. Phase 1 will support single
legged and dual legged topologies (see Appendix F).

As part of the proof of concept work, we implemented DSR and half-NAT
in the kernel and compared the performance of our half-NAT load
balancer implementation with that of IP Filter (to ensure that ours
does not perform worse than IP Filter's). The comparative performance
results are listed in Appendix A. After careful review of both
implementations, we decided to use the standalone NAT load balancer
version because it met the following criteria better than IP Filter's
implementation:

- Lightweight code, containing only the NAT-based load balancing
  feature, that can easily be extended with load balancing algorithms
  as requested by customers
- Fits well with the rest of the ILB code, so that the load balancing
  algorithms can be shared by DSR and NAT
- Minimizes conflict when the system is running NAT-based ILB and
  IP Filter NAT at the same time

It is important to note that unlike IP Filter NAT, the standalone
version is not a full-blown NAT implementation; it is strictly limited
to load balancing functionality.

4. Command-line Interface
----------------------------------
The core functionality of load balancer administration will be
implemented in a library (libilb) for consumption by the CLI (ilbadm)
and 3rd party applications. The CLI will be located in /usr/sbin and
the library in /usr/lib. The CLI will include subcommands to configure
load balancing rules, server groups and health check objects. In
addition, it will include various subcommands to display statistics as
well as configuration details.
The following are the two sets of ILB subcommands:

Configuration subcommands:
o create and destroy load balancing rules
o enable and disable load balancing rules
o create and destroy server groups
o create and destroy server health check objects
o add and remove servers from a server group
o enable and disable servers for load balancing rules

View subcommands:
o view configured load balancing rules
o view server group details
o view packet forwarding statistics
o view the NAT connection table
o view the session persistence table
o view server health check object configuration details
o view server health check probe results

A user must have the "solaris.network.ilb.config" RBAC authorization to
run ILB configuration subcommands. The view subcommands do not require
any authorization. A detailed list of commands and library APIs are
provided in Appendix B and Appendix C respectively.

5. Server health check details
----------------------------------
The ILB project will provide the following types of server health
checks for the user to select from:

o built-in ping probes
o built-in TCP probes
o built-in UDP probes
o user supplied tests to be run as health check probes

By default ILB does not perform any health checks. To enable health
checking, a health check object must be specified for the associated
server group when creating a load balancing rule. The user can
configure only one health check object per virtual service (i.e. per
load balancing rule).

Probes for the configured health check object will run on an
administratively enabled server of a virtual service as long as the
service is enabled. When the service is administratively disabled, the
health check probes stop; upon re-enable, the previous states of the
health check probes are lost. When a server is administratively
disabled, all health check probes for that server stop.

The following user configurable parameters apply when specifying a
health check object.
Note that a health check object can be associated with one or more load
balancing rules:

o hc-test - type of health check probe (ping probe, TCP probe, UDP
  probe or user supplied test).

o hc-interval - minimum interval between consecutive rounds of health
  check events. To avoid synchronization [6], the actual interval is
  randomized over [0.5 * hc-interval, 1.5 * hc-interval].

  For the TCP and user-supplied health check cases, a single health
  check event comprises default ping probes, followed by zero or more
  of the higher level probes (the maximum number of such probes is
  defined by the hc-count value). A maximum of 5 default probes are
  sent in a single health check event. If the default ping probes fail,
  the higher level probes are suppressed, as the server is considered
  unreachable. A user can specify the -n option to turn off the default
  ping probes for these two cases.

  For the UDP health check, a single health check event comprises
  default ping probes, followed by zero or more UDP probes (the maximum
  number of probes is defined by the hc-count value). Note that in this
  case the default ping probes cannot be turned off by the user. If
  these probes fail, the UDP probes are suppressed, as the server is
  considered unreachable.

  For the ping health check case, a single health check event comprises
  1 or more ping probes (the maximum number of these probes is defined
  by hc-count).

o hc-timeout - if a health check probe does not return a result within
  this period, the probe is considered to have failed.

o hc-count - number of consecutive failed probe attempts before a
  server is declared dead.

If a health check object is associated with multiple rules, the probe
results for a server are shared by all those rules. Thus if a server
with IP address a.b.c.d is used in multiple rules using the same health
check object, only one probe is sent to that server at every interval
specified by the hc-interval value.
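The hc-interval randomization described above can be sketched as
follows. This is illustrative only; ilbd's actual timer code is not
shown here.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Draw the delay before the next health check event uniformly from
 * [0.5 * hc_interval, 1.5 * hc_interval), so that probes from many
 * rules and servers do not synchronize into bursts.
 */
static double
next_hc_delay(double hc_interval)
{
	double u = (double)rand() / ((double)RAND_MAX + 1.0); /* [0, 1) */

	return (hc_interval * (0.5 + u));
}
```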
The UDP probe health check method will consist of the following steps:

1. Send ping probes to the server. If no response is received, we
   assume the server is unreachable and thus consider the server's UDP
   service to be down. If the server responds, we proceed to step 2.

2. Send an empty UDP probe and wait for the "hc-timeout" period for an
   ICMP unreachable message. If one does not arrive, the UDP probe is
   considered successful. Otherwise, a maximum of "hc-count" probes are
   sent before the server's UDP service is considered to be down.

6. Using the VRRP project (PSARC 2009/693) for High availability
------------------------------------------------
ILB Phase 1 is not dependent on the VRRP project. This section and
Appendix D are included in this document to provide information on how
ILB plans to use VRRP to solve specific redundancy scenarios when it is
available.

Once the VRRP project (PSARC 2009/693) is delivered, ILB can provide
optional HA capability for an active-standby redundancy configuration.
The active-standby configuration consists of a pair of load balancers,
of which only one is active (the primary load balancer) while the other
stays in standby mode. Should the primary fail, the standby will take
over the primary's job. The VRRP protocol will be used for the
selection of the primary load balancer [5]. Note that ILB will only
provide redundancy for machine failures, and will not handle switch
failures; other existing mechanisms such as link aggregation will be
used to handle switch failures.

In order to make load balancer failover transparent to client
applications, the primary load balancer needs to synchronize its state
(e.g. connection information) with the standby load balancer. This is
needed so that when the primary fails and the standby takes over, the
standby has the state of most connections, and almost all connections
can continue to access the virtual service through the standby.
The ILB project will not deliver this synchronization capability in
Phase 1. Note that HA without synchronization is still valuable: upon
the primary's failure, the standby still gives clients a service to
reconnect to.

To set up the HA capability, the user will have to manually configure
both the primary and the standby via the VRRP CLI, use the export
subcommand of the ILB CLI (see Appendix B) to acquire an editable copy
of the primary's persistent configuration, modify it as necessary and
copy it over to the standby.

Appendix D lists the specific redundancy scenarios that ILB Phase 1
will be able to handle.

7. Other capabilities
-------------------
Other capabilities include the following:

1. Ability for clients to ping the VIP address - The load balancer
   needs to be able to respond to ICMP echo requests to VIPs from
   clients. Both DSR and NAT will provide support for this feature.

2. Ability to add and remove servers without interrupting service -
   This capability allows one to dynamically add servers to and remove
   servers from a server group of an active rule without interrupting
   existing connections established to back-end servers. ILB will
   provide support for this feature for NAT based virtual services.

3. Session persistence - For many applications, it is important that a
   series of connections and/or packets from the same client are sent
   to the same back-end server. Ideally, the addition or removal of a
   back-end server should not interfere with established persistent
   sessions. ILB will provide the admin the capability to configure
   session persistence for a virtual service by specifying the
   "persist=" option. After a persistent mapping is created, subsequent
   requests/packets to this virtual service with a matching client
   source IP address (after the persistence netmask is applied) will be
   forwarded to the same back-end server.

4. Connection draining - ILB will provide support for this capability
   only for servers of NAT-based virtual services.
   This capability prevents new connections from being sent to a server
   that is administratively disabled; existing connections to the
   server continue to function. After all connections to that server
   terminate, the server can be taken down for maintenance. Once the
   server is ready to handle requests again, the admin administratively
   enables the server so that the load balancer can forward new
   connections to it. This allows administrators to take down servers
   for maintenance without disrupting active connections/sessions.

5. Load balancing all ports - ILB will provide the ability to load
   balance all ports on a given IP address across the set of servers,
   without having to set up explicit rules for each port. This feature
   will be available for the NAT and DSR operation modes.

6. Independent ports for virtual services in the same server group -
   For NAT based virtual services, it will be possible to specify
   different destination ports for different servers in the server
   group.

7. Load balancing a simple port range - This capability allows one to
   load balance a range of ports on the VIP to the given server group.
   It is sometimes convenient to conserve IP addresses by load
   balancing different port ranges on the same VIP to different sets of
   back-ends. Both DSR and NAT will provide support for this feature.
   In addition, when session persistence is enabled for NAT based load
   balancing, requests from the same client IP for different ports in
   the range are sent to the same back-end server.

8. Port range shifting and collapsing - These features will be provided
   by the NAT operation mode.

   Port range shifting means the following:

   Rule: VIP(n:N) -> {IP1(n1:N1), IP2(n2:N2), ... }

   When the load balancer gets a packet for port m, where n <= m <= N,
   it will load balance the packet to IP1, IP2 etc., and rewrite the
   destination port of a packet sent to IP1 as n1 + (m - n).

   Port range collapsing means the following:

   Rule: VIP(n:N) -> {IP1:n1, IP2:n2, ...
   }

   When the load balancer gets a packet for port m, where n <= m <= N,
   it will load balance the packet to IP1, IP2 etc., and rewrite the
   destination port of a packet (assuming half-NAT) sent to IP1 as n1.

   Port range shifting and collapsing depend on the port range of a
   server in a rule. If the port range of a server differs from the VIP
   port range, port shifting is applied automatically. If the server
   port range is a single port, port collapsing is applied. Port range
   shifting and collapsing will be provided for NAT based virtual
   services.

8. Architecture
----------------
The following diagram shows the major components of ILB:

        ---------------------
        |ilbadm CLI interface|
        ---------------------
                  |
                  V
   ----------  AF_UNIX sockets  ------
   | libilb |<----------------->|ilbd|
   ----------                   ------
                                   |
                                   | ioctl
                                   V
   ------------------------------------
   |        Kernel ILB Engine         |
   ------------------------------------

The major components are:

ilbadm - This is the CLI of ILB. An admin uses this interface to
configure load balancing rules, server groups and optional health
checks, as well as to view statistics.

libilb - This is the configuration library. The core functionality of
load balancer administration will be implemented in this library for
consumption by ilbadm and 3rd party applications.

ilbd - The ilbd daemon has the following tasks:
o manage the persistent configuration
o serialize access to the kernel ILB module by processing configuration
  and statistics display requests from libilb and feeding them to the
  kernel ILB module for execution
o perform health checks (built-in health checks as well as
  user-supplied test scripts run as health checks) and notify the
  kernel ILB module of server health, so that the load distribution is
  adjusted properly
9. SMF details and storing of ILB configuration
-----------------------------------------------
Since all virtual services are managed by a single ilbd daemon, ILB
will have a single instance in the SMF framework:

    svc:/network/loadbalancer/ilb:default

The ILB service will be dependent on the following services:

    svc:/milestone/name-services
    svc:/network/ipv4-forwarding or svc:/network/ipv6-forwarding

The start method of the SMF manifest will invoke the ilbd daemon. The
stop method will invoke the "/usr/sbin/ilbadm shutdown" command, which
will terminate the ilbd daemon and remove all ILB load balancing rules
from the kernel. The ILB service will use the management authorization
"solaris.smf.manage.ilb".

The persistent configuration of ILB will be saved in SCF. The benefits
of using SCF over a flat text file are:

o no need to write a parser in the ilbd daemon code
o property groups can be changed atomically

Since the ILB project will not deliver any basic configuration, SCF
will not hold any configuration content when the ILB service is enabled
for the first time. Once the service is enabled and the admin uses the
ilbadm CLI to configure virtual services, the configuration will be
implicitly committed to SCF by the ilbd daemon (in fact all
configuration manipulations will be implicitly committed). On
subsequent boots, when the ilbd daemon starts, it will retrieve the
configuration from SCF, provided content is present, and copy it to
memory.

The supported interface to manipulate the ILB configuration is ilbadm.
(SCF does not offer a means of preventing users from using svccfg to
manipulate the configuration, but we strongly discourage doing so in
order to avoid configuration inconsistency.)

The "ilbadm export-config filename" command will export the current
configuration to the user-specified file. This information can then be
used as input for the "ilbadm import-config filename" command; this
command will wipe the existing configuration before import unless
specifically instructed to retain it.
Omission of the filename will mean reading from stdin and writing to
stdout. Note that the output of the export and import commands will not
be equivalent to that of "svccfg import/export", as the latter includes
metadata. If a user wants to import/export the ILB configuration,
ilbadm's export/import subcommands should be used (i.e. the "ilbadm
import-config" command cannot read the output of "svccfg export" and
vice versa).

ILB's rules, server groups and health check objects will be represented
as property groups in SCF. The creation, deletion and modification of
these objects will result in the creation, deletion and modification of
the corresponding property groups. Below is a sample layout of the ILB
property groups/properties:

prop-group name | property name | property type | property value
-------------------------------------------------------------------------
rule123           status          boolean         disabled/enabled
                  vip             net-v4/6-addr   v4/6 IP address
                  port            astring         port range
                  protocol        astring         TCP/UDP...
                  ilb-alg         astring         round-robin...
                  ilb-type        astring         NAT/DSR...
                  healthcheck     astring         health check name
                  drain-time      int             sec
                  nat-timeout     int             sec
                  pers-timeout    int             sec
                  hc-port         astring         ALL/ANY/some-number
                  servergroup     astring         servergroup name
[.... more rules ....]

servergroup123    status          boolean         disabled/enabled
                  server1         astring         IP-addr:port:enable/disable
                  server2         astring         IP-addr:port:enable/disable
[.... more servers ....]

healthcheck123    hc-test         astring         test program
                  hc-timeout      int             timeout value in sec
                  hc-interval     int             interval value in sec
                  hc-count        int             test repetition
[.... more health checks ....]

10. The specifics of the ilbd daemon
---------------------------------
10.1 ilbd daemon internals

The core of the ilbd daemon will be a single-threaded event loop using
the event completion framework; it receives requests from clients via
the libilb functions, handles timeouts, initiates health checks, and
populates the kernel state.
We chose to use the event port framework [2,3] over poll/select for
ease of implementation, including the following reasons:

o Unlike with poll(), one does not need to walk the entire set of file
  descriptors to find out which one(s) had activity. Walking the list
  is an O(N) activity which does not scale well.
o There is no need to handle timers via signals. One can simply
  associate a timer with an event port.

To do health checks, the daemon will create a timer for every health
check probe. Each of these timers will be associated with the event
port. When a timer fires, the daemon will set up a pipe to a separate
process that executes the specific health check probe. This new process
will run with the same user-id as the ilbd daemon and will inherit all
the privileges of the ilbd daemon parent process except the following:

    PRIV_PROC_OWNER, PRIV_PROC_AUDIT

All health checks will be implemented as external methods (binary or
script). The following arguments will be passed to an external method:

    $1  VIP (literal IPv4 or IPv6 address)
    $2  Server IP (literal IPv4 or IPv6 address)
    $3  Protocol (UDP, TCP as a string)
    $4  Numeric port range
    $5  Maximum time (in seconds) the method should wait before
        returning failure. If the method runs longer, it may be killed,
        and the test is considered failed.

Upon success, a health check method will exit with status 0 and print
the RTT it measured to its stdout (the implicit unit is microseconds,
but only the number is printed), for ilbd to consume. If it does not
measure an RTT, it should print "0". Upon failure, a health check
method should exit with status 255 (no output is expected).

By default, user-supplied health check probe processes will run with
the same set of privileges as ILB's built-in probes. If the
administrator has user-supplied health check programs that require a
larger privilege set, he/she will have to implement a setuid program.
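As a concrete illustration of this contract, here is a minimal
user-supplied health check method in C that TCP-connects to the server
and reports the measured RTT. It is only a sketch, not the delivered
ilb_probe; the argument order and the exit-status/stdout convention
come from the text above, everything else (function names, the choice
of a plain connect() probe) is invented for illustration.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* First port of the numeric port range argument ($4), "lo" or "lo-hi". */
static int
parse_first_port(const char *range)
{
	return (atoi(range));	/* atoi() stops at the '-' separator */
}

/* TCP connect probe; on success stores the RTT in microseconds. */
static int
tcp_probe(const char *ip, int port, int timeout_s, long *rtt_us)
{
	struct sockaddr_in sin;
	struct timeval t0, t1;
	struct pollfd pfd;
	socklen_t elen;
	int s, err = 0;

	(void) memset(&sin, 0, sizeof (sin));
	sin.sin_family = AF_INET;
	sin.sin_port = htons((uint16_t)port);
	if (inet_pton(AF_INET, ip, &sin.sin_addr) != 1)
		return (-1);
	if ((s = socket(AF_INET, SOCK_STREAM, 0)) < 0)
		return (-1);
	(void) fcntl(s, F_SETFL, O_NONBLOCK);

	(void) gettimeofday(&t0, NULL);
	(void) connect(s, (struct sockaddr *)&sin, sizeof (sin));
	pfd.fd = s;
	pfd.events = POLLOUT;
	elen = sizeof (err);
	if (poll(&pfd, 1, timeout_s * 1000) != 1 ||
	    getsockopt(s, SOL_SOCKET, SO_ERROR, &err, &elen) != 0 ||
	    err != 0) {
		(void) close(s);
		return (-1);
	}
	(void) gettimeofday(&t1, NULL);
	(void) close(s);
	*rtt_us = (t1.tv_sec - t0.tv_sec) * 1000000L +
	    (t1.tv_usec - t0.tv_usec);
	return (0);
}

/*
 * Method entry point: argv follows the $1..$5 contract above. A real
 * method's main() would just return hc_method(argc, argv).
 * Returns the exit status: 0 (success, RTT printed) or 255 (failure).
 */
static int
hc_method(int argc, char **argv)
{
	long rtt;

	if (argc < 6)
		return (255);
	if (tcp_probe(argv[2], parse_first_port(argv[4]),
	    atoi(argv[5]), &rtt) != 0)
		return (255);		/* failure: no output */
	(void) printf("%ld\n", rtt);	/* success: RTT in microseconds */
	return (0);
}
```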
Each health check will have a timeout, such that if the health check
process hangs, it will be killed after the timeout interval and the
daemon will notify the kernel ILB engine of the server's
unresponsiveness, so that the load distribution can be adjusted
appropriately. If, on the other hand, the health check is successful,
the timeout timer is cancelled. Here is the pseudo code:

    port_create()
    associate socket to obtain requests from libilb
    forever() {
        port_get()
        switch (event type) {
        case new request from client:
            get peer credentials
            allocate data struct to store client info
            associate the socket with port
            trigger request processing event
        case request processing:
            check authorization
            call the service routine to process the request
            apply config change to kernel
            update internal state
            re-associate socket with event port
            return (data and) success/fail status
        case health check interval timer fired:
            posix_spawn(HC test program)
            create timeout timer for this test
            associate timeout timer with port
            associate pipe with port
        case health check test has returned results:
            record RTT if returned
            update kernel if test failed
            cancel associated timeout timer
        case health check test timeout fired:
            kill the HC process
            update kernel with "serverX for load balancing
                rule A is dead"
        }
    }

10.2 IPC details and privileges for the ilbd daemon

We will use an AF_UNIX socket (socket type SOCK_SEQPACKET) for IPC
between libilb and ilbd, as both processes run on the same machine.
Given that the ilbd daemon is a single threaded process that handles
all events (except health checks) via the event port framework, we
chose AF_UNIX sockets over doors to avoid additional complexity
(namely, the additional steps needed to feed information received via
doors into the event port framework). The /var/run/daemon directory
will hold the AF_UNIX rendezvous file.
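The message-oriented nature of SOCK_SEQPACKET (each request or reply is
delivered as one unit, unlike a byte stream) can be demonstrated with a
small sketch. For brevity it uses socketpair() rather than a rendezvous
file under /var/run/daemon, and the request string is made up; it is
not the libilb wire format.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Send one request over an AF_UNIX/SOCK_SEQPACKET pair, have the
 * "server" end echo it back, and return the reply length. Message
 * boundaries are preserved, so each request arrives as one unit.
 */
static ssize_t
seqpacket_roundtrip(const void *req, size_t len, void *reply, size_t rlen)
{
	int fds[2];
	ssize_t n;

	if (socketpair(AF_UNIX, SOCK_SEQPACKET, 0, fds) != 0)
		return (-1);
	/* "client" (libilb side) sends one request message ... */
	if (send(fds[0], req, len, 0) != (ssize_t)len)
		return (-1);
	/* ... "server" (ilbd side) receives exactly that message ... */
	n = recv(fds[1], reply, rlen, 0);
	/* ... and echoes it back as one reply message */
	if (n > 0 && send(fds[1], reply, (size_t)n, 0) != n)
		return (-1);
	n = recv(fds[0], reply, rlen, 0);
	(void) close(fds[0]);
	(void) close(fds[1]);
	return (n);
}
```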
The ilbd daemon will run with the "daemon" user-id with the following
privileges (in addition to the basic ones):

    PRIV_PROC_OWNER, PRIV_NET_ICMPACCESS,
    PRIV_SYS_IP_CONFIG, PRIV_PROC_AUDIT

The aforementioned privileges will be specified in the SMF manifest.
Note that the PRIV_PROC_OWNER privilege is needed to allow the ilbd
daemon process to kill a user-supplied setuid program that is used as a
health check.

Since the ilbd daemon will require RBAC authorization to create,
delete, modify, and read the ILB project related SCF property groups
(see section 9), the following entry will be added to the
/etc/user_attr file:

    daemon::::auths=solaris.smf.manage.ilb,solaris.smf.modify.application

The ILB project will audit administration using the auditing interfaces
defined by PSARC 2000/517.

10.3 Error handling and signal handling

Errors will be reported to syslog. The following signals will be
ignored by ilbd: SIGTTOU, SIGPIPE, SIGSTOP, SIGTSTP, SIGTTIN.

11. ILB kernel components
----------------------------
The ILB code resides in the IP module. It provides two load balancing
mechanisms, stateless DSR and NAT (half and full), for UDP and TCP
traffic. Userland applications can open a socket and issue ioctl()
calls on it to communicate with the ILB code.

The ILB code intercepts incoming packets right before IP decides
whether a packet is destined locally or is to be forwarded; that is,
after the "physical in" and before the "forwarding" Packet Filtering
Hooks processing. If there is a load balancing rule, the ILB code will
be invoked to check whether the packet needs to be load balanced. Note
that this placement of the interception implies that the ILB code
cannot do load balancing for local traffic. We have chosen this design
instead of extending the IP Filter hooks to ensure that the order of
ILB processing and that of IP Filter is correct.
Furthermore, should we in the future need an ILB hook on the transmit
side, that hook would not belong where pfhooks sits on the transmit
side; we would need the transmit hook to be before IRE lookup and
fragmentation, instead of at the bottom of the IP output code (where
the pfhooks transmit hook is).

If an incoming packet matches a load balancing rule, the rule's
algorithm will be used to select a back-end server. If the rule
requires the use of NAT, the header of the packet will be rewritten
with the NAT information. After the server selection and header
rewrite, normal IP incoming packet processing will continue, using the
selected server's IP address as the destination.

If an incoming packet is a fragment destined to a VIP of any load
balancing rule, ILB will drop it. This may limit the deployment of ILB
Phase 1 for certain applications, and we plan to address this as a
post-Phase 1 RFE.

11.1 ICMP processing

The ILB code has some special handling for incoming ICMP packets
destined to one of the load balancing rules' VIPs. If the ICMP packet
is an echo request, the ILB code will reply to the request on behalf of
the back-end servers. Note that a VIP can be used in more than one
rule, and an ICMP echo message does not include enough information for
ILB to decide which rule should handle it, so the ILB code needs to
handle it itself.

If the ICMP message is "destination unreachable: fragmentation
needed," the ILB code checks the payload of the message to find out
whether the message should be forwarded to a back-end server. If the
ICMP message needs to be forwarded, the ILB code will rewrite the ICMP
IP header and the header inside the ICMP message appropriately. This
forwarding is possible for rules using NAT or rules using DSR with
persistence enabled. ILB will drop all other ICMP messages destined to
a VIP.

11.2 New IP ioctl

The project introduces a new ioctl command, SIOCILB. Its
classification is Private.
Userland code, such as ilbd, uses this new ioctl to configure the
kernel ILB component.

11.3 Interaction with other IP technologies

IPMP - The position of the interception point ensures that ILB works
well with IPMP.

IPsec - ILB cannot load balance IPsec encrypted traffic, since ILB
cannot read the transport header.

Packet Filtering Hooks - ILB does not interfere with any registered
hook. For example, it should work well with a firewall module using PF
hooks. But since ILB may modify header information, it can have
unwanted interactions with modules which also modify header
information. Note that this interaction is deterministic, since the
position of the ILB interception is fixed and the possible modification
of a packet can be derived from the ILB rules. A system administrator
needs to be careful in using ILB with this kind of PF hooks module,
such as IP Filter NAT.

One case that requires careful configuration is a box that is
configured to do both stateful filtering that routes packets, and ILB
NAT based load balancing. Current stateful filtering that routes
packets only requires one state entry per "flow" (i.e. TCP connection)
even though IP Filter's NAT may be used on either side. This is because
IP Filter is able to match up packet details in its own NAT table. The
use of ILB NAT will require two state table entries per connection. The
sample configuration is listed in Appendix E (see Case 2).

While it is not feasible to test every possible (though improbable)
deployment scenario that involves packet filtering and ILB, Appendix E
contains the deployment scenarios that we expect to be commonly used.
ILB project testing will include these scenarios to ensure that they
work as expected.

Observability hooks (PSARC 2006/475) - ILB has no interaction with the
observability hooks. All the hooks work as expected.

11.4 kstat

The kernel exports the following kstats. They are Private interfaces to
be used by ILB userland applications.
Global kstat:

Module: ilb
Class: kstat
Name: global
    ip_frag_in:         IP fragments in, matching a VIP
    ip_frag_dropped:    IP fragments dropped
    num_rules:          Total number of rules

Per rule kstat:

Module: ilb
Class: rulestat
Name: <rule name>
    pkt_dropped:            Total number of packets dropped
    bytes_dropped:          Total number of bytes dropped
    pkt_not_processed:      Total number of packets not processed
                            because the rule is disabled
    bytes_not_processed:    Total number of bytes not processed
                            because the rule is disabled
    nomem_pkt_dropped:      Number of packets dropped because of no
                            memory
    nomem_bytes_dropped:    Number of bytes dropped because of no
                            memory
    noport_pkt_dropped:     Number of packets dropped because of no
                            source port to use in full-NAT mode
    noport_bytes_dropped:   Number of bytes dropped because of no
                            source port to use in full-NAT mode
    num_servers:            Number of servers for this rule
    icmp_dropped:           Number of ICMP packets dropped
    icmp_echo_processed:    Number of ICMP echo requests replied to
    icmp_too_big_dropped:   ICMP fragmentation needed messages dropped
    icmp_too_big_processed: ICMP fragmentation needed messages
                            processed

Per server kstat:

Module: ilb
Class: sstat
Name: <server ID>
    ip_address:         IP address of the server
    bytes_processed:    Total number of bytes processed
    pkt_processed:      Total number of packets processed

12. References
---------------
1. http://www.opensolaris.org/os/project/vrrp/vrrp_design.pdf
2. http://developers.sun.com/solaris/articles/event_completion.html
3. Man pages port_get(3C), port_associate(3C), port_create(3C)
4. Man page privileges(5)
5. http://www.ietf.org/rfc/rfc3768.txt
6.
ftp://ftp.ee.lbl.gov/papers/sync_94.ps.Z

********************************************
Appendix A: POC performance results
********************************************

Test Topology
-------------

Ixia mimicking 238 clients --- subnet 1 --- x4200m2 L3/L4 load balancer
(e1000g0 on subnet 1, e1000g1 on subnet 2) --- subnet 2 --- Ixia
mimicking 4 back-end servers

Hardware
--------

DUT: x4200 with e1000g NICs
Ixia details: Ixia 400T 8 port chassis with IxLoad version 3.30.42.143
Traffic: HTTP 1.0/1.1 requests
Page size for concurrent connections and CPS: 1 byte HTML file
Page size for throughput: 64 Kbyte HTML file

Performance Results
=====================

# of CPUs  Mode              CPS      Concurrent    Tput (Mbps)
                                      Connections
===============================================================
4          IPFNAT-RR         34,000   450,000       932
4          ILBNAT-RR         41,500   850,000       920
4          DSR (srcIP-hash)  -        -             2296

********************************************
Appendix B: ilbadm Commands
********************************************

Syntax
------

NAME
    ilbadm - manipulate load balancing rules

SYNOPSIS
    ilbadm create-rule [-e] \
        -i vip=value,port=value[,protocol=value] \
        -m lbalg=value,type=value[,proxy-src=ip-range][,persist=mask] \
        [-h hc-name=value[,hc-port=value]] [-t [conn-drain=N] \
        [,nat-timeout=N][,persist-timeout=N]] -o servergroup=value name
    ilbadm show-rule [-e|-d] [-f |[-p] -o key[,key ...]] [name ...]
    ilbadm delete-rule -a | name ...
    ilbadm enable-rule [name ...]
    ilbadm disable-rule [name ...]
    ilbadm show-statistics [-thAdvi] [-r rule] [-s server] \
        [interval [count]]
    ilbadm create-servergroup [-s server=hostspec[:portspec...]] groupname
    ilbadm delete-servergroup groupname
    ilbadm show-servergroup [-s|-f|[-p] -o field[,field]] [[-v] name]
    ilbadm enable-server server ...
    ilbadm disable-server server ...
    ilbadm show-server [[-p] -o field[,field...]] [rulename ...]
    ilbadm add-server [-e] -s server=value[,value ...
] name
    ilbadm remove-server -s server=value[,value ...] name
    ilbadm create-healthcheck [-n] -h hc-test=value[,hc-timeout=value] \
        [,hc-count=value][,hc-interval=value] hcname
    ilbadm delete-healthcheck hcname ...
    ilbadm show-healthcheck [hcname ...]
    ilbadm show-hc-result [rule-name]
    ilbadm show-nat [count]
    ilbadm show-persist [count]
    ilbadm export-config filename
    ilbadm import-config [-p] filename

NOTE: a server group and a health check must be defined before they can
be used in the "create-rule" subcommand.

DESCRIPTION
    ilbadm manipulates or displays information about ILB rules using the
    subcommands outlined below. Rule names are case insensitive, but case
    is preserved as it is entered. Names are limited to 20 characters.

    Parseable output: all parseable output requires that the fields to be
    printed be given with the -o option. Fields will be printed in the
    same order they are encountered on the command line, separated by ':'
    characters (if there is more than one value). If this character occurs
    in the printed string itself, it will be preceded by a '\'; the same
    is done for the '\' character itself. No headers will be printed for
    parseable output.

    In this implementation, server IDs are generated by the system when a
    server is added, either using the "create-servergroup" or the
    "add-server" subcommand. Server IDs are guaranteed to be unique within
    the server group; since a rule can only be attached to one server
    group, this makes server IDs unique for rules as well. (NOTE: since
    more than one rule can attach to the same server group, the server ID
    alone is not sufficient to identify a rule.) To distinguish server IDs
    from hostnames, server IDs are prefixed with a leading underscore
    ("_").

Subcommands:

    create-rule|create-rl [-e] -i ... -m ... -o ... [-h ...] [-t ...] name
        creates a rule "name" with the given characteristics. The
        arguments to the -i and -m options are both specified as a set of
        "key=value" pairs.
        The following keys and values are valid:

        -i  introduces the matching criteria for incoming packets:
            vip         (virtual) destination IP address
            port[-port] port number or name (eg, "telnet", "dns"); a port
                        can be specified by port number or symbolic name
                        (as in /etc/services); port ranges are also
                        supported (numeric only)
            protocol    "TCP" (default) or "UDP" (see /etc/protocols)

        -m  the keys describing how to handle a packet:
            lbalg       "roundrobin" (default) or "rr" (short form),
                        "hash-ip" or "hip", "hash-ip-port" or "hipp",
                        "hash-ip-vip" or "hipv"
            type        aka topology: "DSR" or "d" (short form),
                        "NAT" or "n", "HALF-NAT" or "h"
            proxy-src   (required for (full) NAT only): specifies the IP
                        (range) to act as the proxy-src address (range)
            persist     (optional) (alias: stickiness) indicates that this
                        rule is to be "persistent" (aka "sticky"); the
                        argument is a netmask in CIDR notation (eg. "/16")

        -t  the keys describing customized timers, in seconds; 0 means to
            use the system default value:
            conn-drain      timeout after which a connection to a server
                            will terminate once the server is removed
                            (default: the connection stays until it is
                            gracefully shut down)
            nat-timeout     NAT entry timeout
            persist-timeout persistent entry timeout

        -o  specifies the destinations among which packets matching the
            criteria specified with -i will be distributed:
            servergroup specifies a single server group as target; the
                        server group must already have been created

        -h  hc-name     specifies the name of a pre-defined health check
                        method
            hc-port     specifies the port(s) for the hc test program to
                        check. The value can be the keyword "ALL", "ANY",
                        or a specific port number within the port range of
                        the server group. If none is specified, the value
                        set with the port keyword will be used for the
                        health check.

        OPTIONS:
        -e  create the rule enabled (default: disabled)

        If "name" already exists, the command will fail. If the given
        tuple matches another rule, the command will also fail.

    delete-rule|delete-rl -a | name ...
        removes all information pertaining to rule "name".
        If "name" doesn't exist, the command will fail.
        -a  delete all rules ("name" will be ignored)

    enable-rule|enable-rl [name ...]
        enables a named rule (or all, if no names are given). Enabling
        rules that are already enabled is a no-op.

    disable-rule|disable-rl [name ...]
        disables a named rule (or all, if no names are given). Disabling
        rules that are already disabled is a no-op.

    show-statistics|show-stats [-thAdvi] [-r rule] [-s server]
                               [interval [count]]
        shows statistics
        -t  print a timestamp with every header
        -d  print delta over the whole interval (default: changes per
            second)
        -A  print absolute numbers (since module initialization/rule
            creation/server addition)
        -r  print statistics for only the given rule; with -i, print a
            line for every server
        -s  print statistics for only the given server; with -i, print a
            line for every rule
        -i  (only valid with -r and -s) itemize info (see those options)
        -v  verbose: shows more details for drops

        While for the most part the behaviour is intuitive and usage can
        be directly adapted from vmstat etc., a few points:
        - headers are printed once for every 10 samples. This is
          hard-coded.
        - timestamps, if chosen, are printed before the header. The format
          is fixed to the system's "date" format for the C locale.
        - currently, addition or removal of a rule is neither detected nor
          indicated.

    show-rule|show-rl [-d|-e] [-f|[-p] -o field[,...]] [name ...]
        prints characteristics of the specified rules, or all, if none is
        specified.
        -o  lists fields to be printed
        -p  print parseable output in the format explained above;
            requires -o
        -f  prints a full list
        -e  print only enabled rules (default: all)
        -d  print only disabled rules
        -o (with or without -p) and -f are mutually exclusive

    show-nat [count]
        Displays NAT table information. If "count" is given, displays
        "count" entries in the NAT table. If no count is given, the whole
        NAT table is displayed. No assumptions should be made about the
        relative positions of elements in consecutive runs of this
        command.
        For example, executing "show-nat 10" twice is not guaranteed to
        show the same 10 items, especially on a busy system.

        Display format:
            T: IP1 > IP2 >>> IP3 > IP4
            T:   The transport protocol used in this entry.
            IP1: The client's IP address and port.
            IP2: The VIP and port.
            IP3: In half-NAT mode, the client's IP address and port. In
                 full-NAT mode, the NAT'ed client's IP address and port.
            IP4: The back-end server's IP address and port.

    show-persist [count]
        Displays persistence table information. If "count" is given,
        displays "count" entries in the table. If no count is given, the
        whole table is displayed. No assumptions should be made about the
        relative positions of elements in consecutive runs of this
        command. For example, executing "show-persist 10" twice is not
        guaranteed to show the same 10 items, especially on a busy system.

        Display format:
            R: IP1 --> IP2
            R:   The rule this persistence entry is tied to.
            IP1: The client's IP address.
            IP2: The back-end server's IP address.

    export-config [filename]
        exports the current configuration in a format fit for re-import
        using 'ilbadm import'. If no filename is given, writes to stdout.

    import-config [-p] [filename]
        reads the configuration contents of the file. By default, this
        overrides the existing configuration. If no filename is given,
        reads from stdin.
        -p  preserve the existing configuration and do an incremental
            import

    create-servergroup|create-sg [-s server=hostspec[:portspec...]]
                                 groupname
        creates a server group. Additional servers can later be added
        using the "add-server" subcommand. Server groups are the only
        entity that can be used during rule creation to indicate back-end
        servers.
        Options:
        -s  specifies a list of servers to add to the server group.
            hostspec: hostname|ip[-ip]
                IPv6 addresses must be enclosed in brackets "[]" to
                distinguish them from ":port"
            portspec: service|port[-port]

    disable-server|disable-srv server ...
        server: serverID, hostname or IP address
        disables one or more servers (ie, tells the kernel not to forward
        traffic to this server). In the current implementation,
        'disable-server' applies to all rules that are attached to the
        server group this server is part of.

    enable-server|enable-srv server ...
        (re)enables disabled server(s). See "disable-server" for further
        details.

    delete-servergroup|delete-sg groupname
        deletes a server group.

    show-server|show-srv [[-p] -o field[,field...]] [rulename ...]
        displays the servers associated with the named rules (or all, if
        no rulename is specified).
        Options:
        -o  print the specified fields
        -p  print fields in parseable format (see above); requires -o

    show-servergroup|show-sg [-f|[-p] -o field[,field]] [-v name]
        lists a server group (or all, if no name is given).
        Options:
        -f  full (the default is names only - currently all is -f)
        -o  print the specified fields
        -p  print fields in parseable format (see above); requires -o
        The options -f and -o (with or without -p) are mutually exclusive.

    add-server|add-srv [-e] -s server=value[,value ...] servergroup
        adds the specified server(s) to a server group. See
        "create-servergroup" for the definition of value.
        -e  add the server and enable it (default: disabled)
        -s  see create-servergroup

    remove-server|remove-srv -s server=value[,value ...] servergroup
        removes server(s) from a server group.
        -s  serverID, hostname or IP address

    create-healthcheck|create-hc [-n] -h \
                hc-test=value[,hc-timeout=value][,hc-count=value] \
                [,hc-interval=value] hcname
        sets up health check information for rules to use. The hc-test is
        performed hc-count times until it succeeds or hc-timeout has
        expired. For this implementation, all servers for a rule are
        checked using the same test.
        hc-test     "ping", "tcp", "udp", or an external method (script,
                    binary, ...)
        hc-timeout  time until a test is considered failed if hc-test
                    never succeeds
                    default value: 5 sec
        hc-count    number of attempts to run hc-test
                    default value: 1
        hc-interval time between two consecutive rounds of health check
                    events
                    default value: 10 sec

        The following arguments are passed to external methods:
        $1  VIP (literal IPv4 or IPv6 address)
        $2  Server IP (literal IPv4 or IPv6 address)
        $3  Protocol (UDP, TCP as a string)
        $4  Numeric port
        $5  Maximum time (in seconds) the method should wait before
            returning failure. If the method runs for longer, it may be
            killed, and the test considered failed.

        External methods should return 0 for success and 255 for failure.
        All other return values are reserved for future use.

        For higher layer health checks (TCP, UDP and external tests), a
        default ping test is performed first. The higher layer test will
        not be performed if the ping fails. The administrator can turn off
        the default ping check for these higher layer health checks
        with -n:
        -n  disable the default ping test for higher layer health check
            tests. This flag cannot be applied when specifying the udp
            probe method.

    delete-healthcheck|delete-hc hcname ...
        deletes the named health check object(s). If a given health check
        object is associated with some enabled rule(s), deletion of that
        health check object will fail.

    show-healthcheck|show-hc [hcname ...]
        shows the health check information of the given health check(s).
        Without a given name, lists all existing health checks.

    show-hc-result|show-hc-res [rule-name]
        shows the health check result for the servers that are associated
        with the given rule-name. If rule-name is not given, the servers
        for all rules are displayed.

    shutdown
        deactivates ilb: removes rules from the kernel, terminates the
        daemon.
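The external method contract above (five positional arguments in, exit status 0 for success and 255 for failure, everything else reserved) can be sketched from the caller's side. This is an illustration only, not ilbd source; the names hc_result_t, hc_interpret_status() and hc_build_argv() are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stddef.h>

/*
 * Interpret the exit status of an external health check method:
 * 0 means the server is alive, 255 means the test failed, and any
 * other value is reserved for future use (treated as indeterminate).
 */
typedef enum {
	HC_SERVER_ALIVE,	/* method returned 0 */
	HC_SERVER_DEAD,		/* method returned 255 */
	HC_RESERVED		/* any other value */
} hc_result_t;

hc_result_t
hc_interpret_status(int exit_code)
{
	if (exit_code == 0)
		return (HC_SERVER_ALIVE);
	if (exit_code == 255)
		return (HC_SERVER_DEAD);
	return (HC_RESERVED);
}

/*
 * Build the argument vector the method receives:
 * $1 VIP, $2 server IP, $3 protocol, $4 numeric port,
 * $5 maximum time in seconds. Returns the argument count.
 */
int
hc_build_argv(char *argv[], const char *method, const char *vip,
    const char *srv_ip, const char *proto, int port, int timeout,
    char portbuf[16], char timebuf[16])
{
	snprintf(portbuf, 16, "%d", port);
	snprintf(timebuf, 16, "%d", timeout);
	argv[0] = (char *)method;
	argv[1] = (char *)vip;	/* $1 */
	argv[2] = (char *)srv_ip;	/* $2 */
	argv[3] = (char *)proto;	/* $3 */
	argv[4] = portbuf;	/* $4 */
	argv[5] = timebuf;	/* $5 */
	argv[6] = NULL;
	return (6);
}
```

A real caller would hand the vector to exec after fork and collect the status with waitpid before interpreting it.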
OUTPUT OF SHOW-XXX SUBCOMMANDS

Example 1: print a list of rules ('$' is the command prompt)

1a. $ ilbadm show-rule -f r1
    rulename: r1
    status: E
    port: 80
    protocol: TCP
    lbalg: hash-ip
    type: NAT
    proxy-src: 47.0.0.0-47.0.0.3
    persist: 255.255.255.127
    hc-name: hc1
    hc-port: --
    conn-drain: 0
    nat-timeout: 120
    persist-timeout: 60
    servergroup: sg1
    vip: 45.0.0.10
    serverlist: _sg1.0,_sg1.1,_sg1.2
    (all but "serverlist" is implemented, btw)

1b. $ ilbadm show-rule r1
    RULENAME STATUS LBALG   TYPE PROTOCOL VIP       PORT
    r1       E      hash-ip NAT  TCP      45.0.0.10 80
    r2       E      hash-ip NAT  TCP      45.0.0.10 81

Example 2: show all health check objects

    $ ilbadm show-healthcheck
    HCNAME TIMEOUT COUNT INTERVAL DEF_PING TEST
    hc1    3       2     8        Y        tcp
    hc2    3       2     8        N        /var/usr-script

Example 3: show statistics for an interval of 3 secs and a count of 2.

    $ ilbadm show-statistics 3 2
    PROCESSED      UNPROCESSED  DROPPED
    packets bytes  pkt  bytes   pkt  bytes
    6       349    0    0       0    0
    2       116    0    0       0    0

Example 4: show healthcheck result

    $ ilbadm show-hc-result rule1
    RULENAME HCNAME SERVERID TEST STATUS       FAIL LAST     NEXT
    rule1    hc1    _sg1:0   tcp  server-alive 3    11:23:30 11:23:40
    rule1    hc1    _sg1:1   tcp  server-dead  4    11:23:30 11:23:40

Example 5: show the NAT table

    $ ilbadm show-nat 5
    UDP: 124.106.235.150.53688 > 85.0.0.1.1024 >>> 82.0.0.39.4127 > 82.0.0.56.1024
    UDP: 71.159.95.31.61528 > 85.0.0.1.1024 >>> 82.0.0.39.4146 > 82.0.0.55.1024
    UDP: 9.213.106.54.19787 > 85.0.0.1.1024 >>> 82.0.0.40.4114 > 82.0.0.55.1024
    UDP: 118.148.25.17.26676 > 85.0.0.1.1024 >>> 82.0.0.40.4112 > 82.0.0.56.1024
    UDP: 69.219.132.153.56132 > 85.0.0.1.1024 >>> 82.0.0.39.4134 > 82.0.0.55.1024

Example 6: show the persistence table

    $ ilbadm show-persist 5
    rule2: 124.106.235.150 --> 82.0.0.56
    rule3: 71.159.95.31 --> 82.0.0.55
    rule3: 9.213.106.54 --> 82.0.0.55
    rule1: 118.148.25.17 --> 82.0.0.56
    rule2: 69.219.132.153 --> 82.0.0.55

Example 7: show servergroups:

7a.
    $ ilbadm show-servergroup -f
    sg1: id:_sg1.2 35.0.0.4:80
    sg1: id:_sg1.1 35.0.0.3:80
    sg1: id:_sg1.0 35.0.0.2:80
    sg2: id:_sg2.3 35.0.0.5:81
    sg2: id:_sg2.2 35.0.0.4:81
    sg2: id:_sg2.1 35.0.0.3:81
    sg2: id:_sg2.0 35.0.0.2:81

7b. $ ilbadm show-servergroup -o all
    SGNAME SERVERID MINPORT MAXPORT IP_ADDR
    sg2    _sg2.0   0       0       ::
    sg2    _sg2.1   0       0       1.1.1.6
    sg3    _sg3.0   9001    9001    1.1.1.1
    sg3    _sg3.1   9001    9001    1.1.1.2
    sg3    _sg3.2   9001    9001    1.1.1.3
    sg3    _sg3.3   9001    9001    1.1.1.4
    sg3    _sg3.4   9001    9001    1.1.1.5
    sg3    _sg3.5   9001    9001    1.1.1.6
    sg3    _sg3.6   9001    9001    1.1.1.11
    sg3    _sg3.7   9001    9001    1.1.1.12
    sg3    _sg3.8   9001    9001    1.1.1.13
    sg4    _sg4.0   9001    9006    1.1.1.1
    sg4    _sg4.1   9001    9006    1.1.1.6

Example 8: show the servers that are associated with a rule:

    $ ilbadm show-server r1
    SERVERID IP_ADDR  PORT HOSTNAME STATUS RULENAME SGNAME
    _sg1.0   35.0.0.2 80   --       D      r1       sg1
    _sg1.1   35.0.0.3 80   my.org   E      r1       sg1
    _sg1.2   35.0.0.4 80   your.org E      r1       sg1

********************************************
Appendix C: libilb APIs
********************************************

Interface specification for libilb
==================================

libilb will provide all functionality for ILB interaction:

- create and destroy ilb rules
- enable and disable rules
- add and remove back-end servers for a given rule
- enable and disable servers
- retrieve the list of rules currently known to the kernel
- provide a walker function that can call a function supplied to the lib
  via pointer for every rule, servergroup and healthcheck
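The walker pattern mentioned in the last bullet can be sketched in miniature: the caller supplies a function pointer that is invoked once per element, plus an opaque "priv" pointer passed through unchanged. All names here (demo_status_t, demo_rule_t, demo_walk_rules) are illustrative stand-ins; the real libilb signatures appear in the "Details" section below.

```c
#include <assert.h>
#include <string.h>
#include <stddef.h>

typedef enum { DEMO_OK = 0, DEMO_ERR } demo_status_t;

typedef struct { const char *name; } demo_rule_t;

/* callback invoked once per rule; a non-OK return stops the walk */
typedef demo_status_t (*demo_walkerfunc_t)(demo_rule_t *, void *);

/* Walk all rules, or just the one matching rulename (NULL = all). */
demo_status_t
demo_walk_rules(demo_rule_t *rules, int nrules,
    demo_walkerfunc_t fn, const char *rulename, void *priv)
{
	demo_status_t rc;
	int i;

	for (i = 0; i < nrules; i++) {
		if (rulename != NULL &&
		    strcmp(rules[i].name, rulename) != 0)
			continue;
		if ((rc = fn(&rules[i], priv)) != DEMO_OK)
			return (rc);	/* propagate callback errors */
	}
	return (DEMO_OK);
}

/* Example callback: count the rules visited via the priv pointer. */
demo_status_t
demo_count_cb(demo_rule_t *r, void *priv)
{
	(void)r;
	(*(int *)priv)++;
	return (DEMO_OK);
}
```

A consumer such as ilbadm would register a print callback instead of a counter and let the walker drive the iteration.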
this list of functionality is NOT part of the library:

- command-line handling
- printout of any data
- UI-specific config file handling or interaction with SMF

suggested API list:
===================

ilb_open()
ilb_close()
ilb_create_rule()
ilb_destroy_rule()
ilb_disable_rule()
ilb_enable_rule()
ilb_walk_rules()
ilb_create_servergroup()
ilb_destroy_servergroup()
ilb_add_server_to_group()
ilb_rem_server_from_group()
ilb_srvID_to_address()
ilb_address_to_srvID()
ilb_walk_servergroups()
ilb_walk_servers()
ilb_create_hc()
ilb_destroy_hc()
ilb_get_hc_info()
ilb_walk_hc()
ilb_walk_hc_rule()
ilb_enable_server()
ilb_disable_server()
ilb_show_nat()
ilb_errstr()
ilb_halt()
ilb_reset_config()

Details:
========

typedef struct {
	int32_t ia_af;		/* AF_INET or AF_INET6 */
	union {
		struct in_addr v4;	/* network byte order */
		struct in6_addr v6;	/* network byte order */
	} _au;
#define	ia_v4	_au.v4
#define	ia_v6	_au.v6
} ilb_ip_addr_t;

/* Supported load balancing algorithm type */
typedef enum {
	ILB_ALG_ROUNDROBIN = 1,
	ILB_ALG_HASH_IP,
	ILB_ALG_HASH_IP_SPORT,
	ILB_ALG_HASH_IP_VIP
} ilb_algo_t;

/* Supported load balancing method */
typedef enum {
	ILB_TOPO_DSR = 1,
	ILB_TOPO_NAT,
	ILB_TOPO_HALF_NAT
} ilb_topo_t;

/* producers of these statuses are libilb and ilbd functions */
typedef enum {
	ILB_STATUS_OK = 0,
	ILB_STATUS_HALT_OK,	/* daemon has ack'd "halt" - no error string */
	ILB_STATUS_INTERNAL,	/* an error internal to the library */
	ILB_STATUS_EINVAL,	/* invalid argument(s) */
	ILB_STATUS_ENOMEM,	/* not enough memory for operation */
	ILB_STATUS_ENOENT,	/* no such/no more element(s) */
	ILB_STATUS_SOCKET,	/* socket related failure */
	ILB_STATUS_CONNECT,	/* connect related failure */
	ILB_STATUS_READ,	/* read related failure */
	ILB_STATUS_WRITE,	/* write related failure */
	ILB_STATUS_TIMER,	/* healthcheck timer error */
	ILB_STATUS_INUSE,	/* item in use, cannot delete */
	ILB_STATUS_EEXIST,	/* scf item exists */
	ILB_STATUS_PERMIT,	/* no scf permit */
	ILB_STATUS_CALLBACK,	/* scf callback error */
	ILB_STATUS_EWOULDBLOCK,	/* operation is blocked - no error string */
	ILB_STATUS_INPROGRESS,	/* operation already in progress */
	ILB_STATUS_SEND,	/* send related failure */
	ILB_STATUS_GENERIC,	/* generic failure - no error string */
	ILB_STATUS_ENOHCINFO,	/* missing healthcheck info */
	ILB_STATUS_INVAL_HCTESTTYPE, /* invalid health check */
	ILB_STATUS_INVAL_CMD,	/* unknown command */
	ILB_STATUS_DUP_RULE,	/* rule name exists */
	ILB_STATUS_ENORULE,	/* rule does not exist */
	ILB_STATUS_MISMATCHSG,	/* addr family mismatch with sgroup */
	ILB_STATUS_MISMATCHH,	/* addr family mismatch with hosts/rule */
	ILB_STATUS_INETNTOP,	/* inet_ntop() failed */
	ILB_STATUS_SGUNAVAIL,	/* cannot find sgroup in sgroup list */
	ILB_STATUS_SGINUSE,	/* server is in use, cannot remove */
	ILB_STATUS_SGEXISTS,	/* server exists */
	ILB_STATUS_SGFULL,	/* cannot add any more servers */
	ILB_STATUS_SGEMPTY,	/* sgroup is empty */
	ILB_STATUS_NAMETOOLONG,	/* a name is longer than allowed */
	ILB_STATUS_CFGAUTH,	/* config authorization denied - no error string */
	ILB_STATUS_BADSG,	/* rule's port range size does not match */
				/* that of the servers */
	ILB_STATUS_INVAL_SRVR,	/* server port is incompatible with */
				/* rule port */
	ILB_STATUS_BADPORT	/* rule's port value does not match */
				/* server's */
} ilb_status_t;

/* incomplete type to enforce the compiler's type checking */
typedef struct ilb_priv *ilb_handle_t;

#define	ILB_NAMESZ	20
#define	ILB_SGNAME_SZ	(ILB_NAMESZ - 5)

typedef enum {
	ILB_HCP_ANY = 0,	/* check any port in servergroup's port range */
	ILB_HCP_ALL,		/* check all ports on servergroup */
	ILB_HCP_FIX		/* check a fixed port */
} ilb_hcport_flag_t;

typedef struct hc_srv {
	char		hcs_ID[ILB_NAMESZ];
	struct in6_addr	hcs_IP;
	int32_t		hcs_fail_cnt;	/* number of tests failed in one run */
	ilb_srv_hc_t	hcs_status;
	int32_t		hcs_rtt;	/* in us, -1 indicates error */
	char		hcs_rule_name[ILB_NAMESZ]; /* rule that uses server */
	time_t		hcs_lasttime;	/* last time test performed */
	time_t		hcs_nexttime;	/* next time to test */
} ilb_hc_srv_t;

/* data struct for user to pass info to lib */
typedef struct hc_info {
	int32_t		hci_cmd;
	char		hci_name[ILB_NAMESZ];
	char		hci_test[MAXPATHLEN];
	int32_t		hci_timeout;
	int32_t		hci_count;
	int32_t		hci_interval;
	boolean_t	hci_def_ping;
	int32_t		hci_flag;
	int32_t		hci_srv_cnt;
	hc_srv_t	hci_srv[HC_MAX_SVRS];
} ilb_hc_info_t;

/* Struct to represent a NAT entry. */
typedef struct {
	uint32_t	nat_proto;
	in6_addr_t	nat_in_local;
	in6_addr_t	nat_in_global;
	in6_addr_t	nat_out_local;
	in6_addr_t	nat_out_global;
	in_port_t	nat_in_local_port;
	in_port_t	nat_in_global_port;
	in_port_t	nat_out_local_port;
	in_port_t	nat_out_global_port;
} ilb_nat_info_t;

typedef struct rule_data {
	char		r_name[ILB_NAMESZ];	/* name of this rule */
	int32_t		r_flags;	/* opt: ILB_FLAGS_RULE_ENABLED etc. */
	ilb_ip_addr_t	r_vip;		/* vip, required for rule creation */
	uint16_t	r_proto;	/* protocol (tcp, udp) */
	in_port_t	r_minport;	/* port this rule refers to */
	in_port_t	r_maxport;	/* if != 0, defines port range */
	ilb_algo_t	r_algo;		/* round-robin, hash-ip, etc. */
	ilb_topo_t	r_topo;		/* dsr, NAT, etc. */
	ilb_ip_addr_t	r_nat_src_start; /* required for NAT */
	ilb_ip_addr_t	r_nat_src_end;	/* required for NAT */
	ilb_ip_addr_t	r_stickymask;	/* netmask for persistence */
	uint32_t	r_conndrain;	/* opt: time for conn. draining (s) */
	uint32_t	r_nat_timeout;	/* opt: timeout for nat connections */
	uint32_t	r_sticky_timeout; /* opt: timeout for persistence */
	in_port_t	r_hcport;	/* opt with HC */
	char		r_sgname[ILB_SGNAME_SZ]; /* this rule's server grp. */
	char		r_hcname[ILB_NAMESZ];	/* optional */
	ilb_hcport_flag_t r_hcpflag;
} ilb_rule_data_t;

/* not all fields are valid in all calls where this is used */
typedef struct server_data {
	ilb_ip_addr_t	sd_addr;	/* a server's ip address */
	in_port_t	sd_minport;	/* port information */
	in_port_t	sd_maxport;	/* ...
					   if != 0, defines a port range */
	uint32_t	sd_flags;	/* enabled, dis- */
	char		sd_srvID[ILB_NAMESZ];	/* "name" for server */
					/* assigned by system, not user */
} ilb_server_data_t;

typedef struct sg_data {
	char	sgd_name[ILB_SGNAME_SZ];
	int32_t	sgd_flags;
	int32_t	sgd_srvcount;	/* not used for SG creation */
} ilb_sg_data_t;

typedef ilb_status_t (* sg_walkerfunc_t)(ilb_handle_t, ilb_sg_data_t *,
    void *);
typedef ilb_status_t (* srv_walkerfunc_t)(ilb_handle_t, ilb_server_data_t *,
    const char *, void *);
typedef ilb_status_t (* rule_walkerfunc_t)(ilb_handle_t, ilb_rule_data_t *,
    void *);
typedef ilb_status_t (* hc_walkerfunc_t)(ilb_handle_t, ilb_hc_info_t *,
    void *);
typedef ilb_status_t (* hc_rulewalkerfunc_t)(ilb_handle_t, ilb_hc_srv_t *,
    void *);

Non-specific return codes:

ILB_STATUS_OK		success
ILB_STATUS_INTERNAL	internal error, talk to support
ILB_STATUS_EINVAL	an argument was invalid
ILB_STATUS_ENOMEM	out of memory
ILB_STATUS_SOCKET	the socket to the daemon went away (ie the daemon
			is probably dead)
ILB_STATUS_PERMIT	permission to modify persistent config in SCF
			denied
ILB_STATUS_INVAL_CMD	unknown command
ILB_STATUS_CFGAUTH	insufficient authorization to alter config

All functions except ilb_open() take an ilb_handle_t as their 1st
argument; this needs to be a valid handle as created by ilb_open(). Other
arguments and specific return values are documented separately for each
function.

=================================================================
SETUP etc.
=================================================================

/*
 * ilb_open() creates a handle which represents the connection
 * to the daemon. This handle must be used in all communication
 * with libilb.
 * If an error occurs when talking to the daemon, the channel is
 * closed; using the handle thereafter will result in an error, and
 * ilb_close() should be called on the handle.
 */
ilb_status_t ilb_open(ilb_handle_t *h)

/*
 * close communication to ilbd, relinquish all associated resources
 */
void ilb_close(ilb_handle_t h)

/*
 * ilb_halt() causes ilbd to remove all rules from the kernel and
 * then to terminate.
 * This does not affect persistent configuration; ie upon restart, ilbd
 * will reconstruct all rules etc. to the state they were in before
 * ilb_halt was called.
 * Returns: ILB_STATUS_HALT_OK  daemon accepted command and will terminate
 */
ilb_status_t ilb_halt(ilb_handle_t h)

/*
 * ilb_reset_config() removes all health checks, servergroups and
 * rules from the running configuration - this includes persistent
 * configuration, which is also cleared.
 */
ilb_status_t ilb_reset_config(ilb_handle_t h)

=================================================================
RULES
=================================================================

/*
 * ilb_create_rule() creates a rule according to the data in the structure
 * pointed to by rd. See definitions above for possible values and
 * optional fields.
 * Arguments: rd ... pointer to an ilb_rule_data_t structure
 * Returns: ILB_STATUS_DUP_RULE:   rule already defined
 *          ILB_STATUS_MISMATCHSG: address family mismatch with
 *                                 servergroup
 */
ilb_status_t ilb_create_rule(ilb_handle_t h, const ilb_rule_data_t *rd)

/*
 * ilb_destroy_rule() destroys the named rule
 * Arguments: rulename ... name of the rule to destroy
 * Returns: ILB_STATUS_ENOENT  rule does not exist
 */
ilb_status_t ilb_destroy_rule(ilb_handle_t h, char *rulename)

/*
 * ilb_disable_rule() disables the named rule
 * Arguments: rulename ... name of the rule to disable
 * Returns: ILB_STATUS_ENOENT  rule does not exist
 */
ilb_status_t ilb_disable_rule(ilb_handle_t h, char *rulename)

/*
 * ilb_enable_rule() enables the named rule
 * Arguments: rulename ...
 *              name of the rule to enable
 * Returns: ILB_STATUS_ENOENT  rule does not exist
 */
ilb_status_t ilb_enable_rule(ilb_handle_t h, char *rulename)

/*
 * the library will fill in the arguments for a given non-null
 * rulename, or for every rule if rulename is NULL.
 * "priv" will be passed into fn as "arg".
 * Arguments: fn ...       function to be called once for every rule
 *            rulename ... name of the rule to "walk", or NULL for all
 *            priv ...     private argument to be passed into fn unchanged
 */
ilb_status_t ilb_walk_rules(ilb_handle_t h, rule_walkerfunc_t fn,
    char *rulename, void *priv)

=================================================================
SERVER GROUPS
=================================================================

/*
 * ilb_create_servergroup() creates a servergroup
 * Arguments: sgname ... name to be given to the servergroup
 * Returns: ILB_STATUS_SGEXISTS: servergroup already exists
 */
ilb_status_t ilb_create_servergroup(ilb_handle_t h, const char *sgname)

/*
 * ilb_destroy_servergroup() destroys a servergroup; this servergroup
 * cannot be attached to any rule at that time.
 * Arguments: sgname ... name of the servergroup to be destroyed
 * Returns: ILB_STATUS_ENOENT  the named servergroup does not exist
 */
ilb_status_t ilb_destroy_servergroup(ilb_handle_t h, const char *sgname)

/*
 * when a server is added to a server group, the system assigns
 * a serverID to this server. Any value in sd_srvID will be ignored.
 * Arguments: sgname ... servergroup to add server to
 *            srv ...    pointer to structure describing server
 * Returns: ILB_STATUS_ENOENT: servergroup does not exist
 *          ILB_STATUS_EINVAL: invalid data in *srv
 */
ilb_status_t ilb_add_server_to_group(ilb_handle_t h, const char *sgname,
    ilb_server_data_t *srv)

/*
 * ilb_rem_server_from_group() removes the server specified by
 * srv->sd_srvID or srv->sd_addr (if no sd_srvID is given) from
 * the servergroup named in sgname.
 *
 * Arguments: sgname ... servergroup to remove server from
 *            srv ...
 *                       pointer to structure describing server (see
 *                       above; applies to ilb_enable_server() and
 *                       ilb_disable_server() also)
 */
ilb_status_t ilb_rem_server_from_group(ilb_handle_t h, const char *sgname,
    ilb_server_data_t *srv)

/*
 * enable a server indicated by *srv.
 * The only legal value for reserved is NULL for this implementation;
 * we foresee that this will contain as-yet-unspecified options
 * or selectors in the future.
 */
ilb_status_t ilb_enable_server(ilb_handle_t h, ilb_server_data_t *srv,
    void *reserved)

/*
 * disable a server indicated by *srv - see comments for
 * ilb_enable_server() for fields and arguments
 */
ilb_status_t ilb_disable_server(ilb_handle_t h, ilb_server_data_t *srv,
    void *reserved)

/*
 * fill in the ip address for the srvID specified in *srv and
 * belonging to servergroup sgname.
 * Arguments: srv ...    pointer to a struct containing a sd_srvID; the
 *                       sd_addr field will be filled in
 *            sgname ... name of the servergroup this server belongs to
 * Returns: ILB_STATUS_ENOENT:    server with serverID was not found
 *          ILB_STATUS_SGUNAVAIL: servergroup was not found
 */
ilb_status_t ilb_srvID_to_address(ilb_handle_t h, ilb_server_data_t *srv,
    const char *sgname)

/*
 * fill in the serverID for the IP address specified in *srv in the
 * servergroup sgname.
 * Arguments: srv ...    pointer to a struct containing a valid sd_addr;
 *                       the sd_srvID field will be filled in
 *            sgname ... name of the servergroup this server belongs to
 * Returns: ILB_STATUS_ENOENT:    server containing sd_addr was not found
 *          ILB_STATUS_SGUNAVAIL: servergroup was not found
 */
ilb_status_t ilb_address_to_srvID(ilb_handle_t h, ilb_server_data_t *srv,
    const char *sgname)

/*
 * call function fn once for every servergroup, or just for the
 * one named.
 * Arguments: fn ...     function to be called by the walker once for
 *                       every servergroup
 *            sgname ... optional; if non-NULL, "walk" just the named
 *                       servergroup (if NULL, walk all servergroups)
 *            priv ...
 *                       private argument passed in unchanged to fn
 * Returns: ILB_STATUS_ENOENT: servergroup sgname not found
 */
ilb_status_t ilb_walk_servergroups(ilb_handle_t h, sg_walkerfunc_t fn,
    const char *sgname, void *priv)

/*
 * call function fn for every server in server group sgname, or
 * for all servergroups if no name is given.
 * Arguments: fn ...     function to be called by the walker once for
 *                       every server
 *            sgname ... optional; if non-NULL, "walk" just the named
 *                       servergroup (if NULL, walk all servergroups)
 *            priv ...   private argument passed in unchanged to fn
 * Returns: ILB_STATUS_ENOENT: servergroup sgname not found
 */
ilb_status_t ilb_walk_servers(ilb_handle_t h, srv_walkerfunc_t fn,
    const char *sgname, void *priv)

=================================================================
HEALTH CHECK
=================================================================

/*
 * ilb_create_hc() creates a health check according to *hc
 * Arguments: hc ... pointer to an ilb_hc_info_t describing the HC
 */
ilb_status_t ilb_create_hc(ilb_handle_t h, ilb_hc_info_t *hc)

/*
 * ilb_destroy_hc() will fail if there is still a rule defined that
 * refers to this HC.
 * Arguments: hcname ... name of an existing HC
 * Returns: ILB_STATUS_ENOENT: specified HC does not exist
 *          ILB_STATUS_EEXIST: there are still rules associated
 *                             with this HC
 */
ilb_status_t ilb_destroy_hc(ilb_handle_t h, const char *hcname)

/*
 * ilb_get_hc_info() fills in HC info
 * Arguments: hcname ... name of the HC
 *            hcp ...    indirect pointer to the data to be filled in
 * Returns: ILB_STATUS_ENOENT: HC not found
 */
ilb_status_t ilb_get_hc_info(ilb_handle_t h, const char *hcname,
    ilb_hc_info_t **hcp)

/*
 * walk HCs and call fn for every HC defined
 * Arguments: fn ...  function to call for every HC
 *            arg ...
 *                    argument to pass unchanged to fn
 */
ilb_status_t ilb_walk_hc(ilb_handle_t h, hc_walkerfunc_t fn, void *arg)

/*
 * walk through all rules that a hc is associated with and, for each
 * rule, return the health check status of its back-end servers.
 */
ilb_status_t ilb_walk_hc_rule(ilb_handle_t h, const char *rulename,
    hc_rulewalkerfunc_t fn, void *arg)

=================================================================
SHOW NAT
=================================================================

ilb_status_t ilb_show_nat(ilb_handle_t h, ilb_nat_info_t info[],
    size_t *num, boolean_t *end);

=================================================================
MISC
=================================================================

/*
 * returns a pointer to a string for the error code passed in
 * Arguments: rc ... error code from the library
 * Returns: string containing text corresponding to rc
 */
const char *ilb_errstr(ilb_status_t rc)

**************************************************
Appendix D: Redundancy scenarios that ILB Phase 1
will be able to support
**************************************************

DSR Topology
===============

Clients on the Internet reach a router (192.168.6.1) on subnet
192.168.6.0/24. LB1 (Primary, eth0 192.168.6.3) and LB2 (Standby, eth0
192.168.6.2) sit on that subnet and hold the VIPs for the virtual
services. Each LB also connects through eth1, via two cross-linked
switches (SWITCH 1 and SWITCH 2), to subnet 10.0.0.0/24, where Server1
and Server2 reside.

Server IP address: 10.0.0.x/24
Default router on servers = 192.168.6.1

All VIPs on the LBs are configured on interfaces facing subnet
192.168.6.0/24. LB1 runs a VRRP instance per VIP.
NAT Topology
============

                Clients in the Internet
                           |
                           |
                       ----------
                      |  ROUTER  |
                       ----------
                           | 192.168.6.1
  ================================================ 192.168.6.0/24
       |                                        |
       | VIPs for virtual services              |
       | eth0 192.168.6.3                       | eth0 192.168.6.2
   --------                                 --------
  |  LB1   |                               |  LB2   |
  | Master |                               | Backup |
   --------                                 --------
       | eth1                                   | eth1
       |                                        |
       |    Floating default gateway 10.0.0.1   |
       |                                        |
   ----------                               ----------
  | SWITCH 1 |-----------------------------| SWITCH 2 |
   ----------                               ----------
         \                                     /
  ================================================ 10.0.0.0/24
              |                       |
           Server1                 Server2

Server IP addresses: 10.0.0.x/24
Default router on the servers: 10.0.0.1
All VIPs on the LBs are configured on the interfaces facing subnet
192.168.6.0/24. LB1 runs a VRRP instance per VIP and one for the
floating default gateway.

Failure scenario 1: LB1 is dead.
Solution: LB2 will detect the failure and take over as the primary
for all the VIPs. It will also take over the 10.0.0.1 address as the
router for the servers.

Failure scenario 2: LB1:eth0 and LB1:eth1 are down.
Solution: LB1 continues to think it is the primary and sends VRRP
advertisements, which never reach LB2, so LB2 becomes primary. LB1
and LB2 are now *both* primary load balancers; this is fine, as
nothing from LB1 will reach the servers. When the links of LB1 are
back up, LB2 will receive LB1's advertisements, relinquish its
position as primary, and become a standby again.
****************************************************
Appendix E: Load balancer with Packet Filtering
deployment scenarios
****************************************************

CASE 1: ILB WITH PACKET FILTERING
Disallow any packets from coming into or leaving the LB EXCEPT for
the following:
  o TCP packets for port 80
  o ssh to the LB
++++++++++++++++++++++++++++++++++++

192.129.84.0/24                             58.0.0.0/24
public subnet                               private subnet
|--------------------R1---------------------------|
               ------
Client1 ---|---e1000g0|  LB  |e1000g1---|-- S1 (58.0.0.100) def rtr = R1
Client2 ---|   (VIP)  |      |          |-- S2 (58.0.0.101) def rtr = R1
               ------
VIP = 129.148.81.24

Packet Filter rules:

block in all
pass in  quick on e1000g1 proto tcp from any to any port = 22
pass out quick on e1000g1 proto tcp from any port = 22 to any
pass in  quick on e1000g0 proto tcp from any to any port = 80
pass out quick on e1000g0 proto tcp from any port = 80 to any
pass out quick on e1000g1 proto tcp from any to any port = 80
pass in  quick on e1000g1 proto tcp from any port = 80 to any

LB rule:

ilbadm create-servergroup -s servers=58.0.0.100,58.0.0.101 fnatsg
ilbadm create-rule -e -i vip=129.148.81.24,proto=tcp,port=80 \
    -m lbalg=hash-IP,type=full-NAT -o servergroup=fnatsg fnatrule

CASE 2: ILB WITH STATEFUL PACKET FILTERING
++++++++++++++++++++++++++++++++++++

192.129.84.0/24                             58.0.0.0/24
public subnet                               private subnet
|--------------------R1---------------------------|
               ------
Client1 ---|---e1000g0|  LB  |e1000g1---|-- S1 (58.0.0.100) def rtr = R1
Client2 ---|   (VIP)  |      |          |-- S2 (58.0.0.101) def rtr = R1
               ------
VIP = 129.148.81.24

IP Filter stateful filtering rules:

block in all
pass in on e1000g0 proto tcp from any to any port = 80 flags S keep state
block out all
pass out on e1000g1 proto tcp from any to any port = 80 flags S keep state

LB rule:

ilbadm create-servergroup -s servers=58.0.0.100,58.0.0.101 fnatsg
ilbadm create-rule -e -i vip=129.148.81.24,proto=tcp,port=80 \
    -m lbalg=hash-IP,type=full-NAT -o servergroup=fnatsg fnatrule

CASE 3: ILB WITH PACKET
FILTERING
Communication between servers on 2 different networks should not
happen through the LB.
++++++++++++++++++++++++++++++++++++

-----------------------------------
192.129.84.0/24          |nic3
                         |
                        ===
|----------------- nic1 |R1| ------------
|                       ===              |
|                                        |
C1-|          ======        58.0.1.0/24  |
   |--e1000g0 |  LB  | e1000g1-----------|--- S1 (58.0.1.100)
C2-|   (VIP1, |      |                        S2 (58.0.1.101)
       VIP2)   ======                         default router = R1
                 |e1000g2
                 |          58.0.2.0/24
                 |-------------------------------------
                                   |nge0
                       S3 (58.0.2.100)   S4 (58.0.2.101)
                       default router = R1

VIP1 = 129.148.81.24
VIP2 = 129.148.81.25

Packet Filter rules on the LB:

block in on e1000g1 from any to 58.0.2.0/24
block in on e1000g2 from any to 58.0.1.0/24

LB rules:

ilbadm create-servergroup -s servers=58.0.1.100,58.0.1.101 fnatsg
ilbadm create-rule -e -i vip=129.148.81.24,proto=tcp,port=80 \
    -m lbalg=hash-IP,type=full-NAT -o servergroup=fnatsg fnatrule

ilbadm create-servergroup -s servers=58.0.2.100,58.0.2.101 dsrsg
ilbadm create-rule -e -i vip=129.148.81.25,proto=tcp,port=80 \
    -m lbalg=hash-IP,type=DSR -o servergroup=dsrsg dsrrule

CASE 4: ILB WITH IP FILTER NAT FOR REDIRECTION
Communications between S1<->S2 and S1<->S3 must happen via the LB,
using VIP2 and IP Filter NAT redirection rules.
+++++++++++++++++++++++++++++++++++++++++++++++++

192.129.84.0/24 (public subnet)
   |
   |
C1-|  (VIP1)  ======  (VIP2)    192.168.1.0/24
   |--e1000g0 |  LB  | e1000g1-------------------- S1 (192.168.1.11)
C2-|          |      |
               ======
                 |e1000g2
                 |          172.16.1.0/24
                 |---------------------------
                      |nge0          |nge0
                 S2 (172.16.1.22)   S3 (172.16.1.23)

Servers S1, S2, and S3 all use R1 as their default router.
VIP1 = 129.148.81.24
VIP2 = 192.168.1.3

IP Filter rules:

rdr e1000g1 192.168.1.3 port 80 -> 172.16.1.22 port 80 tcp round-robin
rdr e1000g1 192.168.1.3 port 80 -> 172.16.1.23 port 80 tcp round-robin

LB rules:

ilbadm create-servergroup -s servers=172.16.1.22,172.16.1.23,192.168.1.11 \
    fnatsg
ilbadm create-rule -e -i vip=129.148.81.24,proto=tcp,port=80 \
    -m lbalg=hash-IP,type=full-NAT -o servergroup=fnatsg fnatrule

***************************************
Appendix F: Load balancer topologies
***************************************

Single legged topology

                                  ---------------
                                 | Load Balancer |
                                  ---------------
                                        |
              --------                  |            ---------
Internet ---- | Router | ---- Local Network ------- | Server1 |
              --------                  |            ---------
                                        |            ---------
                                        +---------- | Server2 |
                                                     ---------

Dual legged topology

              --------                       ---------------
Internet ---- | Router | --- Local Network --| Load Balancer |
              --------                       ---------------
                                                    |
                                              Target Network
                                                |        |
                                           ---------  ---------
                                          | Server1 || Server2 |
                                           ---------  ---------