Enforcing Bandwidth Limit for DomU

1. Overview

After Crossbow's integration into Nevada, it is possible to enforce a
bandwidth limit for a domU from dom0 by setting the bandwidth limit
property of a back-end NIC device attached to the domU.

Currently, Crossbow supports three network resource limit properties:
bandwidth limit, CPU list (fanout), and priority. After matching them
against the virtual NIC device properties supported by xend, 'bandwidth
limit' is the only property clearly supported by both Crossbow and xend.
So, the current plan is to leverage Crossbow's bandwidth limit capability
to enforce a bandwidth limit for each virtual NIC device in a domU from
within the Solaris dom0.

In this design, we plan to support several ways to specify the bandwidth
limit for a specific virtual NIC device attached to a domU:

  + Through the domU configuration file:
      - XML format used by virsh
      - simple '.py' format used by xm
  + Through the command line:
      - 'virsh attach-interface'
      - 'xm network-attach'
      - 'virt-install'

The code flow by which bandwidth information is passed and processed is
shown at a high level in the graph below:

    XML file --> virsh ---+
                          |
                          v
    virt-install ----> libvirt --> virtd
                          ^          |
                          |          | (XML->SXP)
    virsh ----------------+          |
    attach-interface                 v
                                   xend --> xenstored --> vif-vnic/ --> dladm
    '.py' file ---+                  ^                    vif-dedicated
                  |                  |
                  v                  |
                  xm ----------------+
                                     ^
                                     |
    xm ------------------------------+
    network-attach

In the upper half of the graph, bandwidth limit information is represented
in XML format and is converted to SXP format by virtd before it is handed
to xend. Xend then translates it into the xenstore data format, which is
read by the vif-vnic/vif-dedicated scripts; these issue the appropriate
dladm command with the correct bandwidth information. In the lower half of
the graph, the xm command collects the bandwidth information specified in
'.py' format and hands it to xend as is.

I will discuss each step in more detail in the sections below.

2. Bandwidth limit expressed in xend string and xenstore string

The bandwidth limit property of a virtual NIC device is supported in xend
under the name 'rate'. Its value must match the regular expression defined
in xen/xend/server/netif.py:

    "^([0-9]+)([GMK]?)([Bb])/s(@([0-9]+)([mu]?)s)?$"

So, a string like "100Mb/s@10us" means 100 megabits per second with a 10us
sample period. We will refer to this string as the xend string in the
discussion below.

Xend converts the xend string to 'amount,period' format and writes it to
the corresponding virtual NIC device entry in xenstore
(rate = 'amount,period'). The conversion is done by parseRate() in
xen/xend/server/netif.py. We will refer to the string that shows up in
xenstore as the xenstore string in later sections.

In the xenstore string, amount is in bytes with 0xffffffff as the maximum,
and period is in microseconds, also with 0xffffffff as the maximum. When
period is 0, we grant unlimited bandwidth to the corresponding NIC device.

Note that both the xend string and the xenstore string are defined by the
upstream Xen code from the Xen community. There is no plan to change the
format of either of them, since they are not Solaris-specific definitions.
In Solaris, they will be converted to the format specific to Solaris
(dladm, to be specific), which is described in the next section.

3. Conversion of xenstore string for dladm

3.1 Format conversion

With the post-Crossbow dladm command, a bandwidth limit is specified as an
integer with one of the scale suffixes (K, M, or G for Kbps, Mbps, or
Gbps). The minimum bandwidth supported is 1.2Mbps.

Thus, with 'rate = "amount,period"' in xenstore, the bandwidth limit should
be set as:

    ( amount * 8 / period )

with the 'M' suffix on the dladm command line. Since amount is in bytes and
period is in microseconds, this expression is in bits per microsecond,
which is the same as Mbps. If the result of the above calculation is <= 1
(that is, at most 1M), we will set the bandwidth limit to '1200K', which is
the minimum bandwidth currently supported by Crossbow.

Note: if 'period' is 0, we will not set a bandwidth limit on the NIC
device.

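The conversion in 3.1 can be sketched in a few lines of Python. This is
only an illustration of the arithmetic, not the planned implementation
(which will live in the vif-vnic and vif-dedicated shell scripts). The
function name xenstore_rate_to_maxbw is hypothetical, and the use of
truncating integer division is an assumption based on dladm taking an
integer value.

```python
def xenstore_rate_to_maxbw(rate):
    """Convert a xenstore 'amount,period' string to a dladm maxbw value.

    Hypothetical helper for illustration. Returns None when period is 0,
    meaning that no bandwidth limit should be set on the link.
    """
    amount_str, period_str = rate.split(",")
    amount = int(amount_str)  # bytes allowed per period
    period = int(period_str)  # period length in microseconds

    if period == 0:
        # Unlimited bandwidth: do not set maxbw at all.
        return None

    # bytes * 8 = bits; bits per microsecond is the same as Mbps.
    # Integer division is an assumption (dladm expects an integer).
    mbps = amount * 8 // period

    if mbps <= 1:
        # Clamp to the minimum bandwidth supported by Crossbow.
        return "1200K"
    return "%dM" % mbps
```

For instance, a xenstore string of '125,10' (125 bytes every 10
microseconds) gives 125 * 8 / 10 = 100, so the value handed to dladm would
be '100M'.
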
3.2 Doing conversion in vif-vnic and vif-dedicated

There are currently two shell scripts responsible for setting up the
back-end NIC device for a domU from within dom0 by issuing the appropriate
dladm command. They are /usr/lib/xen/scripts/vif-vnic and
/usr/lib/xen/scripts/vif-dedicated. We discuss the changes to each of them
here.

3.2.1 Process xenstore string in vif-vnic

Vif-vnic is used when you want to share the back-end NIC device between
multiple domUs. A VNIC is set up over the NIC and attached to the domU, so
that the domU actually uses the VNIC instead of using the NIC device
directly.

Currently, the VNIC is created in vif-vnic by issuing the /usr/lib/vna
command. Since vna does not support setting the bandwidth property of a
VNIC, we need to switch from vna to dladm completely in vif-vnic.

We need to read the xenstore string and do the conversion in vif-vnic.
Then the VNIC is created by issuing the appropriate dladm command, and the
'maxbw' property of the VNIC is set at the same time to the converted
string to specify the bandwidth limit for the VNIC.

3.2.2 Process xenstore string in vif-dedicated

Vif-dedicated is used when you want to dedicate a NIC device to a domU.
The NIC device is attached to the domU directly and should not be attached
to other domUs. Note that, besides a physical NIC device, the NIC can also
be a pre-defined VNIC device in dom0.

As in vif-vnic, the xenstore string is read and converted in
vif-dedicated. The result is then set as the 'maxbw' property of the NIC
device by issuing the appropriate dladm command, which specifies the
bandwidth limit for the NIC.

4. Communicate bandwidth limit to xend

For xend to understand the user-provided bandwidth limit, convert it
correctly into the xenstore string, and write it to xenstore, we need to
make sure that the user-provided bandwidth limit is properly converted to
the xend string. This means that most of the work described below is
converting the user string to the xend string.
We plan to support several ways for the end user to specify the bandwidth
(that is, to provide the user string), which are discussed below.

4.1 In '.py' file

The current '.py' configuration file format already supports specifying a
bandwidth limit for a specific virtual NIC device. The bandwidth can be set
via the 'rate' property when defining a virtual NIC device in the '.py'
file, as below:

    vif = [ 'bridge=bge0,rate=100Mb/s' ]

So, in this case, the xend string format is used to specify the bandwidth
limit. The xend string ('100Mb/s'), along with the other information in the
'.py' file, is collected by the xm command and copied to xend as is.

To be specific, make_config() in xen/xm/create.py is responsible for
collecting all configuration information in the '.py' file. In particular,
configure_vifs() is called to parse the virtual NIC device configuration,
where the bandwidth limit is set. The collected configuration information
is then passed to make_domain(), which is responsible for copying it to
xend by calling create() in xen/xend/XendDomainInfo.py via the xend server
proxy.

No work is required in this case.

4.2 In 'xm network-attach' command line

This is also supported by the current implementation of the xm command.
When hotplugging a virtual NIC device using 'xm network-attach', the user
can specify the rate property as:

    # xm network-attach domu rate=100Mb/s

Again, the xend string format is used directly in this case, and the xend
string ('100Mb/s') is also copied to xend as is.

To be specific, xm_network_attach() in xen/xm/main.py is responsible for
collecting the command line arguments and copying them to xend by calling
device_create() in xen/xend/XendDomainInfo.py via the xend server proxy.

No work is required in this case, either.

4.3 In XML file

4.3.1 SXP format

Although XML is the standard format for defining the configuration of a
domU in libvirt, xend does not understand it. Xend uses SXP format
internally to express and define the configuration.
'xm list -l <domain>' can be used to dump the configuration of a domU in
SXP format.
For managed domUs, all of the configuration files are saved in SXP format
under /var/lib/xend/domains, which is the default domain configuration
store path defined in xen/xend/XendOptions.py.

Since xend already supports specifying a bandwidth limit by setting the
'rate' property of a virtual NIC device, the format of the bandwidth limit
in an SXP file is already defined, as below:

    (device
        (vif
            (bridge nge0)
            (rate 100Mb/s)          <== bandwidth limit
            (mac 00:16:3e:6a:8a:e8)
        )
    )

From the above example, we can see that the bandwidth limit in SXP format
is expressed in xend string format.

4.3.2 XML format

Unfortunately, the current implementation of virt-install/virsh/libvirt
does not support specifying a bandwidth limit for a virtual NIC device in
XML format. We have to define the format and change the code so that they
understand it.

To insert bandwidth information into the XML file, we create a new element,
"networkresource", inside the "interface" element. Inside
"networkresource", we provide the bandwidth by setting a "cappedbandwidth"
element with three attributes, "unit", "period" and "value", that together
express the bandwidth limit. For example (100Mb/s):

    <networkresource>
        <cappedbandwidth unit='megabit' period='second' value='100'/>
    </networkresource>

The supported units are 'gigabit', 'megabit' and 'kilobit'. The supported
periods are 'second', 'millisecond' and 'microsecond'. The value is an
integer expressing the amount of data, in the given unit, allowed to be
transferred in the specified period of time.

More network resource limits can easily be added later, if needed, by
adding more elements inside the "networkresource" element in this format.

4.3.3 Convert between XML and SXP

Since xend only understands SXP format, virtd already converts between XML
format and SXP format as appropriate. So, to support specifying bandwidth
in XML format, we just need to add the parsing and conversion code to the
existing virtd implementation.

4.3.3.1 From XML to SXP

Both 'virsh create' and 'virsh define' take an XML-formatted file as input.
The XML file content is passed to virtd, which does all the real work,
including converting the XML format to SXP format and making the
appropriate RPC call into xend to pass it the configuration in SXP format.

In either case, the conversion is done in virtd (src/xml.c) in
virDomainParseXMLDesc(), which calls virDomainParseXMLIfDesc() to convert
the virtual NIC device part from XML to SXP.

So, we need to modify virDomainParseXMLIfDesc() so that virtd understands
our new element, "networkresource", and converts it to SXP format for xend
to parse.

4.3.3.2 From SXP to XML

'virsh dumpxml' is used to dump a domU configuration in XML format. Again,
the real work is done by virtd, which gets the domU configuration from xend
in SXP format via the appropriate RPC call, then converts it into XML
format before returning it to virsh.

Since SXP is a xend-specific format, the conversion is actually handled by
the xend-specific implementation in libvirt, in xend_parse_sexp_desc() in
xend_internal.c.

So, we need to modify xend_parse_sexp_desc() so that it understands the
xend string in SXP format and converts it to XML format for 'virsh dumpxml'
to work properly.

4.4 In 'virsh attach-interface' command line

Since a virtual NIC device can be added via 'virsh attach-interface', we
also want to support specifying the bandwidth limit in this way.

4.4.1 Command line syntax for 'virsh attach-interface'

The first thing to do is to allow the bandwidth to be specified on the
'virsh attach-interface' command line. We need to extend the existing
command line syntax as below:

    virsh attach-interface [--target ] [--mac ] [--script