INTEGRATED LOAD BALANCER DESIGN DOCUMENT [Rev 1.1]

Authors: Sangeeta Misra
         Kacheong Poon
         Michael Schuster

1. Overview
----------------
This document describes the functional components and the overall design of
the ILB project (PSARC 2008/575). The project will deliver the basic
features needed to use Solaris on an x86/SPARC platform as an L3/L4 load
balancer. The project will deliver the following features:

o Stateless DSR (Direct Server Return) and NAT operation modes offering the
  following load balancing algorithms: round-robin, src-IP hash,
  src-IP-port hash, and src-IP-VIP hash. IPv4 and IPv6 support will be
  provided for both operation modes.

o A CLI and a configuration API to configure the various features as well
  as view statistics and configuration details.

o Simple server monitoring features

o High availability between load balancers in an active-standby
  configuration mode via the VRRP protocol (RFC 3768). Note that VRRP will
  be delivered in Solaris as a separate but parallel project [1].

The project includes kernel and userland components. ILB will therefore be
delivered in packages separate from the core stack, with the following
package names:

SUNWilbr     ILB kernel component
SUNWilb      components delivered in /usr, which are:
             o ilbadm
             o libilb
             o ilbd

2. Terms used in this document
-------------------------------
Stateless Direct Server Return - Direct Server Return (DSR) mode refers to
using the load balancer only to load balance incoming requests to the
back-end servers, letting the return traffic from the servers to the
clients bypass the load balancer. With stateless DSR, the load balancer
keeps no state for the packets it processes (load balances), except for
simple statistics.

Server group - A server group comprises a set of back-end servers. If an
ILB user wants to load balance HTTP requests, he/she will configure the
load balancer with a server group consisting of several servers. The load
balancer will balance the HTTP traffic across this set of servers.
Virtual service - A virtual service is what the world sees as VIP:port
(e.g. www.foo.com:80). Although the service is handled by a server group
consisting of several servers, the server group appears to the clients of
the virtual service as a single IP address:port. Note that a single server
can be included in multiple server groups and thus may serve multiple
virtual services.

VIP - The virtual IP address (VIP) is the IP address for the virtual
service.

Load balancing algorithm - The algorithm that the load balancer uses to
select back-end servers from a server group for incoming packets.

Load balancing rule - For the purposes of this document a load balancing
rule is defined by the following parameters:

o IP version: the IP version (IPv4 or IPv6) of a packet
o virtual IP address (VIP)
o transport protocol: TCP or UDP
o port number (or a port range)
o load balancing algorithm
o type of load balancing operation (DSR or NAT)
o a server group, and optional health checks to be executed for each server
  in the server group
o rule name

The load balancer uses the {VIP, transport protocol, port number} values to
determine whether an incoming packet matches a rule. If there is a match,
the load balancer uses the specified load balancing algorithm to select a
server from the server group.

3. Load balancer operation modes
-------------------------------
The ILB project will provide in-kernel implementations of the Direct Server
Return (DSR) and NAT (half and full) operation modes, with support for both
IPv4 and IPv6. Direct Server Return (DSR) mode refers to using the load
balancer (LB) only to load balance incoming requests to the back-end
servers, letting the return traffic from the servers to the clients bypass
the load balancer. NAT-based load balancing involves rewriting header
information, and handles both the request and the response traffic. Phase 1
will support single-legged and dual-legged topologies (see Appendix D).
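The rule matching and server selection just described can be sketched as
follows. This is a minimal illustrative model, not ILB code: the class and
field names are invented, only two of the algorithms are shown, and MD5
merely stands in for whatever hash function the kernel implementation
actually uses.

```python
import hashlib
from dataclasses import dataclass, field

@dataclass
class Rule:
    """Hypothetical model of a load balancing rule."""
    vip: str
    protocol: str             # "TCP" or "UDP"
    ports: range              # matched port range, e.g. range(80, 81)
    algorithm: str            # "round-robin" or "hash-IP"
    servers: list = field(default_factory=list)
    _rr_next: int = 0         # round-robin cursor

    def matches(self, dst_ip, proto, dst_port):
        # A packet matches a rule on the {VIP, transport protocol, port}
        # triple, as described in section 2.
        return (dst_ip == self.vip and proto == self.protocol
                and dst_port in self.ports)

    def select_server(self, src_ip):
        if self.algorithm == "round-robin":
            server = self.servers[self._rr_next % len(self.servers)]
            self._rr_next += 1
        else:  # "hash-IP": the same client IP always maps to one server
            h = int(hashlib.md5(src_ip.encode()).hexdigest(), 16)
            server = self.servers[h % len(self.servers)]
        return server

rule = Rule(vip="10.0.0.1", protocol="TCP", ports=range(80, 81),
            algorithm="hash-IP", servers=["192.168.1.1", "192.168.1.2"])
assert rule.matches("10.0.0.1", "TCP", 80)
# hash-IP is deterministic per client:
assert rule.select_server("172.16.0.9") == rule.select_server("172.16.0.9")
```

The source-IP hash is what makes DSR workable without per-connection state:
every packet from a given client independently hashes to the same back-end.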
As part of proof-of-concept work we implemented DSR and half-NAT in the
kernel and compared the performance of our half-NAT load balancer with that
of IP Filter (to ensure that ours does not perform worse). The comparative
performance results are listed in Appendix A. After careful review of both
implementations, we decided to use the standalone NAT load balancer version
because it met the following criteria better than IP Filter's
implementation:

- Lightweight code, containing only the NAT-based load balancing feature,
  that can easily be extended with load balancing algorithms as requested
  by customers
- Fits well with the rest of the ILB code, so that the load balancing
  algorithms can be shared by DSR and NAT
- Minimizes conflict when the system is running NAT-based ILB and IP Filter
  NAT at the same time.

It is important to note that, unlike IP Filter NAT, the standalone version
is not a full-blown NAT implementation; it is strictly limited to load
balancing functionality.

4. Command-line interface
----------------------------------
The core functionality of load balancer administration will be implemented
in a library (libilb) for consumption by the CLI (ilbadm) and 3rd-party
applications. The location of the CLI will be /usr/sbin/ilbadm. The
location of the API will be /usr/lib/libilb. The CLI will include commands
to configure load balancing rules, server groups and health checks. In
addition, it will include various commands to display statistics as well as
configuration details. Users will require privileges to invoke the
configuration commands; the view commands can be invoked by regular users.
Configuration commands:
o create and destroy load balancing rules
o add and remove servers from a server group

View commands:
o view configured load balancing rules
o view packet forwarding statistics
o view the NAT connection table
o view health check results

A detailed list of commands is provided in Appendix B.

5. Server monitoring details
----------------------------------
The ILB project will offer an optional server monitoring feature and will
provide the following types of health checks:

o ping
o TCP
o UDP probes
o user-supplied tests to be run as health checks (the test can be a binary
  or a shell script)

The health checks are specified for the associated server group when
creating a load balancing rule. Only one health check can be configured per
load balancing rule. The following user-configurable parameters apply to
health check configuration:

o hc-test     - type of health check
o hc-timeout  - time after which a health check is considered to have
                failed if it has not completed
o hc-interval - interval between consecutive health checks. Note that the
                implementation will randomize the interval between
                0.5 * hc-interval and 1.5 * hc-interval to avoid
                synchronization [6]
o hc-count    - number of consecutive failed health checks before the
                server is considered to be down.

6. High availability and redundancy capability
------------------------------------------------
ILB will provide an optional HA capability in an active-standby redundancy
configuration. The active-standby configuration consists of a pair of load
balancers of which only one is active at a time (the primary load balancer)
while the other stays in standby mode. Should the primary fail, the standby
takes over the primary's job. The VRRP protocol is used to select the
primary load balancer [5]. Note that ILB will only provide redundancy for
machine failures; it will not handle switch failures. Existing mechanisms
such as link aggregation can be used to handle switch failures.
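The hc-interval randomization described in section 5 amounts to drawing
each probe delay uniformly from [0.5 * hc-interval, 1.5 * hc-interval]. A
sketch (illustrative only; ilbd's actual timers are C code driven by event
ports):

```python
import random

def next_check_delay(hc_interval):
    """Delay before the next health check probe, drawn uniformly from
    [0.5 * hc_interval, 1.5 * hc_interval] so that probes against many
    servers do not synchronize."""
    return random.uniform(0.5 * hc_interval, 1.5 * hc_interval)

# With hc-interval = 10s, every delay falls in [5s, 15s].
delays = [next_check_delay(10.0) for _ in range(1000)]
assert all(5.0 <= d <= 15.0 for d in delays)
```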
In order to make load balancer failover transparent to client applications,
the primary load balancer needs to synchronize its state (e.g. connection
information) with the standby load balancer. This is needed so that when
the primary fails and the standby takes over, the standby has the state of
most connections, and almost all connections can continue to access the
virtual service through it. The ILB project will not deliver this
synchronization capability. Note that HA without synchronization is still
valuable: upon the primary's failure, it gives clients a service to
reconnect to.

To set up the HA capability, the user will have to manually configure both
the primary and the standby via the VRRP CLI, use the export subcommand of
the ILB CLI (see Appendix B) to acquire an editable copy of the primary's
persistent configuration, modify it as necessary, and copy it over to the
standby.

7. Other capabilities
-------------------
Other capabilities include the following:

1. Ability for clients to ping the VIP address - The load balancer needs to
   be able to respond to ICMP echo requests to VIPs from clients. Both DSR
   and NAT will provide support for this feature.

2. Ability to add and remove servers from a server group without
   interrupting service - This capability allows one to dynamically add and
   remove servers from the server group of an active rule without
   interrupting existing connections established to back-end servers. NAT
   will provide support for this feature.

3. Session persistence - For many applications, it is important that a
   series of connections from the same client is sent to the same back-end
   server. Ideally, the addition or removal of a back-end server shouldn't
   interfere with established persistent sessions.
ILB will provide the admin with the capability to configure the following
   session persistence (also called "stickiness") mechanisms for the
   NAT-based load balancing operation mode:

   o src-IP sticky (Layer 3 stickiness): stick a client to a server based
     on the client IP address.
   o src-IP,dst-port sticky (Layer 4 stickiness): stick a client to a
     server based on both the client IP address and the server destination
     port number.

4. Connection draining - ILB will provide support for this feature only for
   servers of NAT-based virtual services that have session persistence
   enabled. This feature allows the administrator to mark a back-end server
   for draining. No new connections will be sent to this server, except for
   connections from clients with established session persistence to that
   server. As the session persistence timers expire, all clients will
   gradually be migrated off the selected server, which can then be taken
   down for maintenance. Once the server is ready to handle requests again,
   the admin will turn the feature off for the server so that the load
   balancer can forward new connections to it. This allows administrators
   to take down servers for maintenance without disrupting active
   connections/sessions.

5. Load balancing all ports - ILB will provide the ability to load balance
   all ports on a given IP address across the set of servers, without
   having to set up explicit rules for each port. This feature will be
   available for the NAT and DSR operation modes.

6. Independent ports for virtual services in the same pool - For NAT, it
   should be possible to specify different destination ports for different
   servers in the pool.

7. Load balancing a simple port range - This capability allows one to load
   balance a range of ports on the VIP to the given server group. It is
   sometimes convenient to conserve IP addresses by load balancing
   different port ranges on the same VIP to different sets of back-ends.
   Both DSR and NAT will provide support for this feature.
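The src-IP sticky mechanism can be sketched as a persistence table
consulted before the normal load balancing algorithm. Everything here is
illustrative; the class name, the timeout handling, and the fallback
selector are assumptions, not ILB's actual data structures:

```python
import itertools
import time

class StickyTable:
    """Sketch of Layer 3 (src-IP) stickiness: once a client is mapped to a
    back-end server, later connections reuse that mapping until the
    persistence entry times out."""
    def __init__(self, select, timeout=60.0):
        self._select = select    # fallback load balancing algorithm
        self._timeout = timeout
        self._map = {}           # client src IP -> (server, last-used time)

    def server_for(self, src_ip):
        now = time.monotonic()
        entry = self._map.get(src_ip)
        if entry is not None and now - entry[1] < self._timeout:
            server = entry[0]    # sticky hit: reuse the previous server
        else:
            server = self._select(src_ip)
        self._map[src_ip] = (server, now)   # refresh the persistence timer
        return server

rr = itertools.cycle(["serverA", "serverB"])   # stand-in round-robin
table = StickyTable(lambda ip: next(rr))
first = table.server_for("203.0.113.5")
assert table.server_for("203.0.113.5") == first   # sticks to one back-end
```

A src-IP,dst-port sticky table would simply key the map on the
(client IP, destination port) pair instead of the client IP alone.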
In addition, when session persistence is enabled for NAT-based load
   balancing, requests from the same client IP for different ports in the
   range should be sent to the same back-end server.

8. Port range shifting and collapsing - These features will be provided by
   the NAT operation mode.

   Port range shifting means the following:

   Rule: VIP(n:N) -> {IP1(n1:N1), IP2(n2:N2), ... }

   When the load balancer gets a packet with destination port m, where
   n <= m <= N, it will load balance the packet to IP1, IP2 etc., and
   re-write the destination of the packet sent to IP1 as IP1, port
   n1 + (m - n); that is, the offset into the port range is preserved.

   Port range collapsing means the following:

   Rule: VIP(n:N) -> {IP1:n1, IP2:n2, ... }

   When the load balancer gets a packet with destination port m, where
   n <= m <= N, it will load balance the packet to IP1, IP2 etc., and
   re-write the destination of the packet (assuming half-NAT) sent to IP1
   as IP1, port n1; that is, the whole range collapses onto a single port.

8. Architecture
----------------
The following diagram shows the major components of ILB:

     ---------------------
     |ilbadm CLI interface|
     ---------------------
               |
               V
     ----------  AF_UNIX sockets  -------
     | libilb |<------------------>|ilbd |
     ----------                    -------
                                      ^
                                      | ioctls
                                      V
     ------------------------------------
     |        Kernel ILB Engine         |
     ------------------------------------

The major components are:

ilbadm - This is the CLI of ILB. An admin will use this interface to
         configure load balancing and optional health checks, as well as
         view statistics.

libilb - This is the configuration library.

ilbd   - The ILB daemon has the following tasks:
         o manage the persistent configuration
         o serialize access to the kernel ILB module by processing
           configuration and statistics display requests from libilb and
           feeding them to kernel ILB for execution
         o perform health checks (built-in health checks as well as
           user-supplied test scripts) and notify the kernel ILB module of
           server health so that the load distribution is adjusted
           properly.

9. The specifics of the ilbd daemon
---------------------------------
The ilbd source code will reside in the /usr/src/cmd/cmd-inet/usr.lib
directory.
9.1 IPC details and privileges for the ilbd daemon

We will use an AF_UNIX socket (socket type SOCK_SEQPACKET) for IPC between
libilb and ilbd, as both processes run on the same machine. A subset of
ilbadm commands (specifically the configuration commands) will require
privileges, while others (the statistics and configuration display
commands) will not. The /var/run directory will hold the AF_UNIX rendezvous
files.

We propose that the project implement an "ilbadm" uid. The ilbd daemon will
be run by the "ilbadm" user with the PRIV_SYS_IP_CONFIG privilege and will
use ioctls to communicate with the kernel. The kernel should check the
ioctl credential to make sure it has PRIV_SYS_IP_CONFIG before servicing
it. Since the persistent config files can only be modified by the daemon,
the files will be owned by user "ilbadm" and will reside in the /etc/ilbadm
directory.

The ILB project will audit administration using the auditing interfaces
defined by PSARC 200/517.

9.2 ilbd daemon internals

The core of the ilbd daemon will be a single-threaded event loop using the
event completion framework; it will receive requests from libilb, handle
timeouts, perform health checks, and populate the kernel state. We chose
the event ports framework [2,3] over poll/select for ease of
implementation, for reasons including:

o Unlike with poll(), one does not need to walk the entire set of file
  descriptors to find out which one(s) had activity. Walking the list is an
  O(N) activity which does not scale well as N gets large.
o The necessity to handle timers via signals goes away; one can simply
  associate a timer with an event port.

To perform health checks, the daemon will create a timer for every health
check (this means that if there are 100 servers and the load balancer is
configured to run 3 health checks per server, a total of 300 timers will be
created). Each of these timers will be associated with the event port.
When a timer goes off, the daemon will open a pipe to a separate process
(using popen) to perform the specific health check. All health checks will
be implemented as external methods (binaries or scripts) executed by ilbd
as separate processes. The following arguments will be passed to external
methods:

$1 VIP (literal IPv4 or IPv6 address)
$2 Server IP (literal IPv4 or IPv6 address)
$3 Protocol (UDP, TCP as a string)
$4 Numeric port
$5 Maximum time (in seconds) the method should wait before returning
   failure. If the method runs for longer, it may be killed, and the test
   considered failed.

Return values: the process writes the RTT to stdout, or 0 if it does not
calculate it. A return value of 255 signifies failure.

To keep things simple, all the health checks that ILB provides will be run
with a specific set of privileges (one of them being PRIV_NET_ICMPACCESS,
to allow the ICMP echo health check). By default, user-supplied health
checks will also run with the same set of privileges. If an administrator
has a user-supplied script that requires a larger privilege set, he/she
will have to run it with setuid explicitly.

Each health check will have a timeout: if the health check process hangs,
it will be killed after the timeout interval and the daemon will notify the
kernel ILB engine of the server's unresponsiveness, so that the load
distribution can be adjusted appropriately. If, on the other hand, the
health check is successful, the timeout timer is cancelled.
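An external health check method, then, is just a program honoring the
argument and exit-code contract above. The following is a sketch of what a
user-supplied method might look like; it is not shipped ILB code, it
assumes Python is available on the system, and it implements only a TCP
connect probe (a real method could equally be a shell script or a C
binary):

```python
#!/usr/bin/env python3
# Sketch of an external health check method. Arguments ($1..$5):
#   VIP, server IP, protocol, numeric port, max wait in seconds.
# Prints the measured RTT (or 0) on stdout; exits 255 on failure.
import socket
import sys
import time

def tcp_check(server_ip, port, max_time):
    """TCP-connect probe: return the connect RTT in seconds, or None on
    failure (refused, unreachable, or timed out)."""
    start = time.monotonic()
    try:
        with socket.create_connection((server_ip, port), timeout=max_time):
            return time.monotonic() - start
    except OSError:
        return None

def main(argv):
    vip, server_ip, proto, port, max_time = argv[1:6]
    if proto.upper() != "TCP":
        print(0)          # this sample does not measure RTT for UDP/ping
        return 0
    rtt = tcp_check(server_ip, int(port), float(max_time))
    if rtt is None:
        return 255        # 255 signifies failure, per the contract
    print(rtt)
    return 0

if __name__ == "__main__" and len(sys.argv) >= 6:
    sys.exit(main(sys.argv))
```

ilbd would run such a method via popen, read the RTT from the pipe, and
treat exit status 255 (or the kill-on-timeout path) as a failed check.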
Here is the pseudo code:

    port_create()
    associate periodic timers for each health check with port
    associate socket to obtain requests from libilb
    forever() {
        port_get()
        switch (event type) {
        case data on socket:
            apply config change to kernel
            update internal state
            re-associate socket with event port
        case periodic timer for HC:
            FILEp = popen(HC test program)
            create timeout timer for this test
            associate timeout timer with port
            port_associate(fileno(FILEp))
        case return value for HC test:
            record RTT
            cancel associated timeout timer
        case timeout:
            kill the HC process
            update kernel with "serverX for load balancing rule A is dead"
        }
    }

9.3 Error handling and monitoring

Errors will be reported to syslog. In addition, the "monitor" option of the
ilbadm command can be used to monitor the ilbd daemon's execution of events
and its communication with the kernel. Note that one does not require
privileged access to run the "monitor" option. By default the output of the
"ilbadm monitor" command will be appended to a file. The verbosity of the
output can be dialed up with the -d option (useful for debugging purposes).

9.4 Signal handling

The ilbd daemon will handle SIGALRM and SIGTERM via event ports.

10. ILB kernel components
----------------------------
The ILB code resides in the IP module. It provides two load balancing
mechanisms, stateless DSR and NAT (half and full), for UDP and TCP traffic.
Userland applications can open a socket() and issue ioctl() calls on it to
communicate with the ILB code.

The ILB code intercepts incoming packets right before IP decides whether a
packet is destined locally or is to be forwarded. This is after the
"physical in" and before the "forwarding" Packet Filtering Hooks
(PSARC 2005/33) processing. If there is a load balancing rule, the ILB code
will be invoked to check whether the packet needs to be load balanced. Note
that the placement of the interception implies that the ILB code cannot
load balance local traffic.
We have chosen this design instead of extending the IP Filter hooks to
ensure that the ordering of ILB processing and IP Filter processing is
correct. Furthermore, should we in the future need an ILB hook on the
transmit side, that hook wouldn't belong where pfhooks sits on the transmit
side; we'd need the transmit hook to be before IRE lookup and
fragmentation, instead of at the bottom of the IP output code (where the
pfhooks transmit hook is).

If an incoming packet matches a load balancing rule, the rule's algorithm
will be used to select a back-end server. If the rule requires the use of
NAT, the header of the packet will be re-written with the NAT information.
After the server selection and header re-write, normal IP incoming packet
processing will continue, using the selected server's IP address as the
destination.

If an incoming packet is a fragment destined to the VIP of any load
balancing rule, ILB will drop it. Handling fragments is a potential RFE for
future phases of this project.

10.1 ICMP processing

The ILB code has some special handling for incoming ICMP packets destined
to one of the load balancing rules' VIPs. If the ICMP packet is an echo
request, the ILB code will reply to the request on behalf of the back-end
servers. Note that a VIP can be used in more than one rule, and an ICMP
echo message does not include enough information for ILB to decide which
rule should handle it, so the ILB code handles it itself.

If the ICMP message is "destination unreachable: fragmentation needed," the
ILB code checks the payload of the message to find out whether the message
should be forwarded to a back-end server. If the ICMP message needs to be
forwarded, the ILB code will re-write the ICMP IP header and the header
inside the ICMP message appropriately. This forwarding is possible for
rules using NAT or rules using DSR with persistence enabled. ILB will drop
all other ICMP messages destined to a VIP.

10.2 ioctl() interface

The ioctl() interface is Private to this project. Details TBD.

10.3
Interaction with other IP technologies

IPMP - The position of the interception point ensures that ILB works well
with IPMP.

IPsec - ILB cannot load balance IPsec-encrypted traffic, since ILB cannot
read the transport header.

Packet Filter Hooks - ILB does not interfere with any registered hook. For
example, it should work well with a firewall module using PF hooks. But
since ILB may modify header information, it can have unwanted interactions
with modules which also modify header information. Note that this
interaction is deterministic, since the position of the ILB interception is
fixed and the possible modification of a packet can be derived from the ILB
rules. A system administrator just needs to be careful when using ILB with
this kind of PF hooks module, such as IP Filter NAT.

11. References
---------------
1. http://www.opensolaris.org/os/project/vrrp/vrrp_design.pdf
2. http://developers.sun.com/solaris/articles/event_completion.html
3. Man pages port_get(3C), port_associate(3C), port_create(3C)
4. Man page privileges(5)
5. http://www.ietf.org/rfc/rfc3768.txt
6.
ftp://ftp.ee.lbl.gov/papers/sync_94.ps.Z

Appendix A: POC performance results
----------------------------------------

Test topology: an Ixia chassis mimicking 238 clients connects over subnet 1
to e1000g0 of the DUT (an x4200m2 acting as the L3/L4 LB); the DUT's
e1000g1 connects over subnet 2 to Ixia ports mimicking 4 back-end servers.

Hardware
---------
DUT: x4200 with e1000g NICs
Ixia details: Ixia 400T 8-port chassis with IxLoad version 3.30.42.143
Traffic: HTTP 1.0/1.1 requests
Page size for concurrent connections and CPS: 1-byte HTML file
Page size for throughput: 64-Kbyte HTML file

Performance Results
=====================
# of CPUs  Mode             CPS      Concurrent    Tput
                                     Connections   (Mbps)
========================================================
4          IPFNAT-RR        34,000   450,000       932
4          ILBNAT-RR        41,500   850,000       920
4          DSR(srcIP-hash)  -        -             2296

Appendix B: ILB Commands
--------------------------
NAME
    ilbadm - manipulate load balancing rules

SYNOPSIS
    ilbadm create-rule [-e] \
        -i vip=value,port=value[,protocol=value][,ipversion=value] \
        [,interface=ifname] \
        -m lbalg=value,type=value \
        [-h hc-name=value] -o servergroup=value name
    ilbadm list-rules [-a] [-p|-f] [-o key[,key ...]] [name ...]
    ilbadm destroy-rule -a | name ...
    ilbadm enable-rules [-t] [name ... ]
    ilbadm disable-rules [-t] [name ... ]
    ilbadm show-statistics [-thaAd] [-r rulename] [interval [count]]
    ilbadm show-nat [opts ...]
    ilbadm create-servergroup [-s server=hostspec[:portspec...]] groupname
    ilbadm destroy-servergroup groupname
    ilbadm list-servergroup [-s|-f|[-p] -o field[,field]] [name]
    ilbadm enable-server [-t] -s server=value[,value]
    ilbadm disable-server [-t] -s server=value[,value]
    ilbadm add-server -s server=value[,value ... ] name
    ilbadm remove-server -s server=value[,value ...
] name
    ilbadm create-healthcheck hc-test=value[,hc-timeout=value] \
        [,hc-count=value][,hc-interval=value][,hc-port=value] \
        hcname
    ilbadm destroy-healthcheck hcname
    ilbadm export-rules [filename]
    ilbadm import-rules [filename]
    ilbadm export-servergroups [filename]
    ilbadm import-servergroups [filename]
    ilbadm monitor filename

DESCRIPTION
    ilbadm manipulates or displays information about ILB rules using the
    subcommands outlined below.

    Rule names are case insensitive, but case is preserved as it is
    entered. Names are limited to 80 characters.

    Parsable output: all parsable output requires that the fields to be
    printed be given with the -o option. Fields will be printed in the same
    order they are encountered on the command line, separated by ':'
    characters (if there is more than one value). If this character occurs
    in the printed string itself, it will be preceded by a '\'; the same is
    done for the '\' character itself. No headers will be printed for
    parsable output.

    The synopses below contain only short options; long options are shown
    in the explanations.

Global Options:
    -t  causes the modification (rule creation ...) to be temporary: the
        modification will not be reflected in persistent storage, i.e. it
        will not persist across reboots/restarts of the daemon.

Subcommands:
    create-rule [-e] -i <match criteria> -m <handling> \
        -o <destination> [-h <hc-name>] name

        creates a rule "name" with the given characteristics. The match
        criteria and the handling keys are both specified as sets of
        "key=value" pairs. The following keys and values are valid:

        -i  introduces the matching criteria for incoming packets:
            vip         (virtual) destination IP address
            port[-port] port number or name (e.g., "telnet", "dns"); a
                        port can be specified by port number or symbolic
                        name (as in /etc/services); port ranges are also
                        supported (numeric only)
            protocol    the protocol: "TCP" (default), "UDP"
                        (see /etc/protocols)
            ipversion   IP version: "IPv4" (default), "IPv6" or "both" for
                        an unspecified vip.
            interface   optional, for the case when the interface to
                        "watch" for the VIP cannot be derived by the system

        -m  the keys describing how to handle a packet:
            lbalg       "round-robin" (default), "hash-IP",
                        "hash-IP-port", "hash-IP-VIP"
            type        aka topology: "DSR", "half-NAT", "NAT"

        -o  specifies the destinations among which packets matching the
            criteria specified with -i will be distributed:
            servergroup specify a single server group as target. The
                        server group must already have been created. If -t
                        is not used, the servergroup must also have been
                        created without -t.

        -h hc-name  specifies the name of a pre-defined health check method

        OPTIONS:
        -e  create the rule enabled (default: disabled)

        If "name" already exists, the command will fail. The command will
        also fail if a rule exists that matches the given vip.

    destroy-rule -a | name ...
        removes all information pertaining to rule "name". If "name"
        doesn't exist, the command will fail.
        -a  destroy all rules (name will be ignored)

    enable-rules [-t] [name ... ]
        enables the named rules (or all, if no names are given). Enabling
        rules that are already enabled is a no-op.

    disable-rules [-t] [name ... ]
        disables the named rules (or all, if no names are given). Disabling
        rules that are already disabled is a no-op.

    show-statistics [-thaAd] [-r rulename] [interval [count]]
        shows statistics (see examples below)

    show-nat [[-p] -o field[,field ...]] [count]
        displays NAT information (options, format TBD)

        displays "count" lines of NAT information, or a default number of
        lines if no count is given (currently 20). Specifying 0 for count
        means "all". Specifying an offset will cause the display to start
        at the specified position in the list. This offset should be less
        than the current number of elements in the table, or nothing will
        be printed. No assumptions should be made about the relative
        positions of elements in consecutive runs of this command, i.e.
        executing "show-nat 10" twice is not guaranteed to show the same 10
        items, especially on a busy system.
        -o  specifies which fields to print; legal values: in_local,
            in_global, out_local, out_global
        -p  print fields in a parsable manner (requires -o)

    list-rules [-f] [-d|-e] [[-p] -o field[,...]] [name ...]
        prints characteristics of the specified rules, or of all rules if
        none is specified.
        -o  lists fields to be printed.
        -p  print parsable output in the format explained above;
            requires -o
        -f  prints a full list.
        -e  print only enabled rules (default: all)
        -d  print only disabled rules
        -p and -f are mutually exclusive.
        For an example of the output, see examples, below.

    export-rules [filename]
        exports the complete set of rules in a way that can be re-imported
        using import-rules. Format TBD.

    import-rules [filename]
        reads rulesets from filename (or stdin) and applies them. The
        format (TBD) used will be the one created by export-rules.
        NOTE: existing rules are not destroyed first, so if a "clean slate"
        is required, rules need to be destroyed first.

    create-servergroup [-s server=hostspec[:portspec...]] \
        [-i interface=name|proxy-src=src] groupname
        creates a server group. Additional servers can later be added using
        the "add-server" subcommand. An optional server-facing interface
        can also be specified if desired. Server groups are the only entity
        that can be used during rule creation to indicate back-end servers.
        Options:
        -s  specifies a list of servers to add to the servergroup.
            hostspec: hostname|ip[-ip]
                IPv6 addresses must be enclosed in brackets "[]" to
                distinguish them from ":port"
            portspec: service|port[-port]
        -i  adds incoming options
            name: interface name (e.g. "e1000g0")
            src:  ip[-ip] (NAT only): src IP address to replace incoming
                packets' src address, or a range of addresses (if a second
                ip is given)
        For both IP and port ranges, the second value must be greater than
        the first, with IP addresses being interpreted as notated
        MSB-first. Ranges aren't supported when using hostnames.

    disable-server [-t] -s server=hostspec[:portspec ...]
        disables the given servers *for all servergroups*, i.e., if a given
        server's details are found in more than one servergroup, every one
        of these servergroups will be affected.
        -t  temporarily disable a list of servers
        -s  server (list)
            hostspec: ip|hostname (see "create-servergroup" for IPv6
                syntax rules)
            portspec: port#|service
            This is reduced from what can be given for servers with
            "create-servergroup" - we believe it only makes sense to
            disable one server at a time, or even only one port at a time,
            if one is given.
        This information is not persistent across reboots. To permanently
        remove a server from a servergroup, use "remove-server".

    enable-server [-t] -a|-s server=value[,value]
        (re)enables a disabled server with the given value. See
        "disable-server" for what information goes into "value". If no port
        is specified, all ports for the given server are enabled. See
        "disable-server" above for details on options.

    destroy-servergroup groupname
        destroys a server group.

    list-servergroup [-f|[-p] -o field[,field]] [name]
        lists a servergroup (or all, if no name is given)
        Options:
        -f  full (the default is names only)
        -o  print the specified fields
        -p  print fields in parsable format (see above); requires -o
        The options -f and -o (with or without -p) are mutually exclusive.

    add-server -s server=value[,value ...] servergroup
        adds the specified server(s) to servergroup. See
        "create-servergroup" for the definition of value and -s.

    remove-server -s server=value[,value ...] servergroup
        removes the specified server(s) from servergroup. See
        "create-servergroup" for the definition of -s.

    export-servergroups [filename]
    import-servergroups [filename]
        these subcommands behave in a fashion analogous to export-rules and
        import-rules, respectively.

    create-healthcheck hc-test=value[,hc-timeout=value][,hc-count=value] \
        [,hc-interval=value][,hc-port=value] hcname
        sets up health check information for rules to use. The hc-test is
        performed up to hc-count times until it succeeds or hc-timeout has
        expired.
        For this implementation, all servers for a rule are checked using
        the same test.
        hc-test      "PING", "TCP", or an external method (script,
                     binary ...)
        hc-timeout   time until a test is considered failed if hc-test
                     does not succeed. Optional; default TBD
        hc-count     number of attempts to run hc-test. Optional;
                     default TBD
        hc-interval  time between two tests (must be greater than
                     hc-timeout * hc-count)
        hc-port      Optional. Port to use for the test. When not used,
                     ilbd will determine which port to use.

        The following arguments are passed to external methods:
        $1 VIP (literal IPv4 or IPv6 address)
        $2 Server IP (literal IPv4 or IPv6 address)
        $3 Protocol (UDP, TCP as a string)
        $4 Numeric port
        $5 Maximum time (in seconds) the method should wait before
           returning failure. If the method runs for longer, it may be
           killed, and the test considered failed.

        External methods should return 0 for success and 255 for failure.
        All other return values are reserved for future use.

    destroy-healthcheck hcname

    monitor filename
        causes monitoring information to be appended to the named file.
        Use '-' for stdout.

Examples:

example: round-robin all DNS traffic
    ilbadm create-servergroup -s servers=dnsserver1,dnsserver2 dnsgroup
    ilbadm create-rule -e -i proto=UDP,ipversion=ipv4,vip=1.2.3.4,port=DNS \
        -m lbalg=round-robin,type=DSR \
        -o servergroup=dnsgroup dnsrule

example: add a server to the servergroup defined above:
    ilbadm add-server -s server=dnsserver22 dnsgroup

example: distribute http traffic between 4 servers
    ilbadm create-servergroup -s servers=webserv1,webserv2,webserv3 webgroup
    ilbadm add-server -s servers=webserv4 webgroup
    ilbadm create-rule -i port=80,vip=15.192.0.0,ipversion=IPv4 \
        -m lbalg=hash-IP-port,type=NAT \
        -o servergroup=webgroup webrule

example: prepare two sets of rules (notice there's an overlap here -
perhaps because 10.1.1.3 is a bigger box than the other ones):
    ilbadm create-servergroup -s servers=10.1.1.0,10.1.1.2,10.1.1.3 \
        websg
    ilbadm create-servergroup -s servers=10.1.1.3,ftpserv.our.org \
        ftpgroup
    ilbadm create-rule -e -i port=http -m lbalg=hash-IP-port,type=NAT \
        -o servergroup=websg webrule
    ilbadm create-rule -i port=ftp -m lbalg=hash-IP-port,type=NAT \
        -o servergroup=ftpgroup ftprule
    ilbadm create-rule -e -i port=ftp-data -m lbalg=hash-IP-port,type=NAT \
        -o servergroup=ftpgroup ftpdatarule

Example: print a list of rules ('$' prompt added for readability)

    $ ilbadm list-rules
    rule4
    rule3
    RULE-all

    $ ilbadm list-rules -f
    RULE     ACT IPv. PROTO VIP     PORT ALGORITHM  TYPE S.GROUP
    rule4    Y   IPv4 tcp   1.2.3.4 ftp  roundrobin DSR  ftpgroup
    rule3    N   IPv6 tcp   2003::1 ftp  roundrobin DSR  ftpgroup6
    RULE-all Y   IPv6 tcp   2002::1 http roundrobin DSR  webgrp_v6

In the following example, long lines are wrapped for easier reading:

Example: export rules (the output can be fed back to import-rules)

    $ ilbadm export-rules
    create-rule -e ipversion=IPv4,protocol=tcp,VIP=1.2.3.4,port=ftp \
        -m algorithm=roundrobin,type=DSR \
        -o servergroup=ftpgroup rule4
    create-rule ipversion=IPv6,protocol=tcp,VIP=2003::1,port=ftp \
        -m algorithm=roundrobin,type=DSR \
        -o servergroup=ftpgroup6 rule3
    create-rule -e ipversion=IPv6,protocol=tcp,VIP=2002::1,port=http \
        -m algorithm=roundrobin,type=DSR \
        -o servergroup=webgrp_v6 RULE-all

NAME
    ilbadm show-statistics

DESCRIPTION
    We define this set of kstats for the ilb project:

    module: "ilb"
    instance: 0

    class: "kstat"
    name: "global"
    statistic: "num_rules"

    class: "rulestat"
    name: <rule name>
    statistics: "create_time", "num_servers", "bytes_dropped",
                "pkt_dropped", "ip_frag_in", "ip_frag_dropped"

    class: "serverstat"
    name: <server name>
    statistics: "bytes_processed", "pkt_processed"

NAME
    ilbadm show-statistics

SYNOPSIS
    ilbadm show-statistics [-thaAd] [-r rule] [interval [count]]

    -t  print a timestamp with every header
    -d  print delta over the whole interval (default: changes per
        second)
    -a  print absolute numbers as well as deltas
    -A  print only absolute numbers (since module initialisation);
        if both -a and -A
are given, the last one takes precedence.

DESCRIPTION
    While for the most part the behaviour of show-statistics is
    intuitive and usage can be directly adapted from vmstat etc., a
    few points:

    - headers are printed once for every 10 samples. This is
      hard-coded.
    - timestamps, if chosen, are printed before the header. The
      format is fixed to the system's "date" format for the C locale.
    - currently, addition or removal of a rule is neither detected
      nor indicated.

EXAMPLES
    $ ilbadm show-statistics 1
         pkts      not               bytes      not
    processed   proc'd  dropped  processed   proc'd  dropped
          232       16        0      10286      738        0
            0        0        0          0        0        0
            0        0        0          0        0        0

    $ ilbadm show-nat
    inside:global    local            outside:local    global
    171.16.68.5:80   10.10.10.1:80    171.16.68.1:80   171.16.68.1:80
    171.16.68.5      10.10.10.1       ---              ---

The following passage is quoted from
http://www.cisco.com/en/US/tech/tk648/tk361/technologies_tech_note09186a0080094837.shtml:

==== begin quote

Cisco defines these terms as:

    * Inside local address: The IP address assigned to a host on the
      inside network. This is the address configured as a parameter
      of the computer OS or received via dynamic address allocation
      protocols such as DHCP. The address is likely not a legitimate
      IP address assigned by the Network Information Center (NIC) or
      service provider.

    * Inside global address: A legitimate IP address assigned by the
      NIC or service provider that represents one or more inside
      local IP addresses to the outside world.

    * Outside local address: The IP address of an outside host as it
      appears to the inside network. Not necessarily a legitimate
      address, it is allocated from an address space routable on the
      inside.

    * Outside global address: The IP address assigned to a host on
      the outside network by the host owner. The address is allocated
      from a globally routable address or network space.

These definitions still leave a lot to be interpreted. For this
example, this document redefines these terms by first defining local
address and global address.
Keep in mind that the terms inside and outside are NAT definitions.
Interfaces on a NAT router are defined as inside or outside with the
NAT configuration commands, ip nat inside and ip nat outside. Networks
to which these interfaces connect can then be thought of as inside
networks or outside networks, respectively.

    * Local address: A local address is any address that appears on
      the inside portion of the network.

    * Global address: A global address is any address that appears on
      the outside portion of the network.

==== end quote

Appendix C: Redundancy scenarios that ILB will be able to handle
----------------------------------------------------------------------

DSR Topology
============

                   Clients in the Internet
                              |
                              |
                         ----------
                         | ROUTER |
                         ----------
                              | 192.168.6.1
    ===================================================== 192.168.6.0/24
          |                                     |
          |      VIPs for virtual services      |
          | eth0 192.168.6.3                    | eth0 192.168.6.2
      ---------                             ---------
      |  LB1  |                             |  LB2  |
      |Primary|                             |Standby|
      ---------                             ---------
          |                                     |
          |                                     |
      ----------                           ----------
      |SWITCH 1|---------------------------|SWITCH 2|
      ----------                           ----------
               \                           /
    ===================================================== 10.0.0.0/24
          |                                     |
       Server1                               Server2

    Server IP addresses: 10.0.0.x/24
    Default router on servers = 192.168.6.1

All VIPs on the LBs are configured on interfaces facing subnet
192.168.6.0/24. LB1 runs a VRRP instance per VIP.
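One consequence of the stateless DSR algorithms is worth noting here:
server selection is a deterministic function of packet fields, so a
standby configured with the same rules maps each flow to the same
back-end server. A rough sketch of the idea in Python (illustrative
only - the function name and the crc32 hash are stand-ins, not the
in-kernel algorithm):

```python
import zlib

# Illustrative sketch only: the real ILB hash algorithms are in-kernel
# and not specified in this document; crc32 is a stand-in hash function.
def select_server(servers, src_ip, src_port=None):
    """Pick a back-end server as a pure function of packet fields
    (source IP, optionally source port), in the spirit of the
    src-IP-hash and hash-IP-port algorithms."""
    key = src_ip if src_port is None else "%s:%d" % (src_ip, src_port)
    return servers[zlib.crc32(key.encode()) % len(servers)]

servers = ["10.0.0.1", "10.0.0.2"]
# The same client always maps to the same server, on either LB:
print(select_server(servers, "192.168.6.50"))
```

Because no per-connection state is involved, nothing needs to be
synchronised between LB1 and LB2 for existing DSR flows to survive a
takeover.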
NAT Topology
============

                   Clients in the Internet
                              |
                              |
                         ----------
                         | ROUTER |
                         ----------
                              | 192.168.6.1
    ================================== 192.168.6.0/24
          |                                     |
          |      VIPs for virtual services      |
          | eth0 192.168.6.3                    | eth0 192.168.6.2
      --------                              --------
      | LB1  |                              | LB2  |
      |Master|                              |Backup|
      --------                              --------
          |                                     |
          |  Floating default gateway 10.0.0.1  |
          |                                     |
      ----------                           ----------
      |SWITCH 1|---------------------------|SWITCH 2|
      ----------                           ----------
               \                           /
    ===================================================== 10.0.0.0/24
          |                                     |
       Server1                               Server2

    Server IP addresses: 10.0.0.x/24
    Default router on servers = 10.0.0.1

All VIPs on the LBs are configured on interfaces facing subnet
192.168.6.0/24. LB1 runs a VRRP instance per VIP and for the floating
default gateway.

Failure scenario 1: LB1 is dead
Solution: LB2 will detect the failure and take over as the primary for
all the VIPs.

Failure scenario 2: LB1:eth0 and LB1:eth1 are down
Solution: LB1 continues to think it is the primary and sends VRRP
advertisements, which never reach LB2. LB2 becomes primary. So now LB1
and LB2 are *both* primary load balancers; this is fine, as nothing
from LB1 will reach the servers. When the links of LB1 are back up,
LB2 will receive the advertisements again, relinquish its role as
primary, and become a standby again.

Appendix D: Load balancer topologies
---------------------------------------

Single legged topology

                              ---------------
                              |Load Balancer|
                              ---------------
                                     |
             --------                |             ---------
 Internet----|Router|------- Local Network --------|Server1|
             --------                |             ---------
                                     |             ---------
                                     --------------|Server2|
                                                   ---------

Dual legged topology

             --------                      ---------------
 Internet----|Router|---- Local Network ---|Load Balancer|
             --------                      ---------------
                                                  |
                                           Target Network
                                              |        |
                                        ---------  ---------
                                        |Server1|  |Server2|
                                        ---------  ---------
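The takeover behaviour in both failure scenarios above is governed by
VRRP's advertisement timing (RFC 3768): a backup declares the master
down after missing advertisements for Master_Down_Interval seconds. A
small illustration of that arithmetic (the function name is ours; the
formula is from RFC 3768, section 6.1):

```python
def master_down_interval(adv_interval, priority):
    """Seconds a VRRP backup waits without hearing advertisements
    before transitioning to master (RFC 3768, section 6.1).

    adv_interval: the master's advertisement interval, in seconds
    priority:     the backup router's own priority (1-254)
    """
    skew_time = (256 - priority) / 256.0   # higher priority -> shorter wait
    return 3 * adv_interval + skew_time

# With the protocol default of 1-second advertisements and a backup
# priority of 100, LB2 takes over about 3.6 seconds after LB1 (or its
# links toward LB2) fail:
print(master_down_interval(1, 100))   # 3.609375
```

The skew term ensures that, when several backups exist, the
highest-priority one times out first and wins the election.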