1. Introduction
------------------
There is a need for an interface that provides visibility into network
traffic in the light of Crossbow artifacts and features like virtual NICs,
interrupt vs. polling modes, hardware lanes, software lanes, fanout etc.
This document details the design choices we make while introducing new
kstat counters to provide this visibility. These kstat counters would in
turn be consumed by dlstat(1M) and flowstat(1M).

A reader interested only in accessing fine granularity network stats can
skip section 3 on Design Considerations without loss of continuity.

2. Terminology
-------------------
Ring: Hardware ring (queue) on a physical NIC.

Receiver Software lane (rx_slane): corresponds to software classified SRS.
When a mac client has dedicated hardware rx ring(s), the software lane sees
only local traffic. Otherwise, traffic seen by it also includes traffic
received over the wire that is delivered to the mac client after software
classification.

Receiver Hardware lane (rx_hlane): corresponds to hardware classified SRS,
includes hardware ring, kernel thread, DMA channel etc. A rx_hlane is
always associated with a unique rx ring.

Transmitter Software lane (tx_slane): corresponds to tx_srs. When a mac
client has dedicated tx ring(s), the software lane sees only local traffic.
Otherwise, traffic seen by it also includes traffic transmitted over the
wire.

Transmitter Hardware lane (tx_hlane): corresponds to tx soft ring, includes
hardware ring, kernel thread, DMA channel etc. A tx_hlane is always
associated with a unique tx ring.

Fanout: software fanout on top of SRS, has significance only on the Rx
side, corresponds to rx soft ring.

Data-links vs. mac clients:
    o Physical NIC nxge1 is a data-link while plumbed nxge1 is a primary
      mac client.
    o ldom/xen clients are (non-primary) mac clients of the underlying
      data-link; they are not data-links since they do not perform mac
      registration.
    o Vnic is a (non-primary) mac client on top of a physical NIC (may use
      one or more of the underlying NIC's rings), as well as a data-link
      (does mac_register).
    o Aggr is a (non-primary) mac client on top of two or more physical
      NICs, as well as a data-link (does mac_register). Moreover, each link
      that is part of the aggregation has a mac client and data-link of
      its own.

3. Design considerations
------------------------------
kstat's stat interface will distinguish between data-links and mac clients.

In case of a physical NIC, the number of exposed rings is known at
registration time and stays the same throughout the lifetime of the
physical NIC (till mac_unregister). These rings in turn could be
exchanged/shared between different mac clients at various points in time.

In case of a vnic/aggr, the number of rx and/or tx rings known at
registration time might change over the lifetime of that vnic/aggr. For
example, a new NIC could be added to an aggr, or one may deliberately
choose to assign more rings to a particular vnic after it is plumbed.
Moreover, the number of rings assigned to a vnic may also change with the
addition or deletion (or unplumb) of other mac clients configured on the
same physical NIC.

We want to maintain per hardware lane statistics for mac clients. Since the
number of hardware lanes of a mac client can change over its lifetime, we
do not know how many data points we need to maintain at the time of mac
client open. The number can even grow arbitrarily large (e.g. adding a new
NIC to an already plumbed aggr). Thus, we can't do kstat_create just once
for a mac client but need to do it as and when needed. Introduction of a
ring or instantiation of every lane or fanout is accompanied by creation of
the corresponding kstat.

A summary view kstat counter should be maintained for the number of rx/tx
software/hardware lanes, rx/tx rings etc.

3.a Receive side statistics

A mac client has one or more Rx SRSes.
A Rx SRS either has a dedicated hardware ring associated with it or is
software classified. We instantiate a new kstat when a Rx SRS is created
to account for the traffic passing through that SRS. This kstat is
destroyed along with the SRS.

Traffic flowing into a Rx SRS could be fanned out across multiple soft
rings. We instantiate a new kstat when a soft ring is created to account
for the traffic passing through that soft ring. This kstat is destroyed
along with the soft ring.

Fanout soft rings are created in groups of 3 viz. tcp, udp, oth. However,
that split is an implementation detail and we do not expose it through
kstat.

If a mac client does not have a dedicated hardware rx ring associated with
it, packets seen by its software classified SRS include both local traffic
and traffic received from the wire. A separate kstat counter needs to be
maintained for local traffic.

When a rx ring (or group of rings) is shared by two or more mac clients,
provide an interface to query an individual mac client's contribution to
the traffic as seen by a particular ring.

3.b Transmit side statistics

We maintain a soft ring for each of the dedicated hardware tx rings
associated with a mac client. Create a kstat during creation of each of
these soft rings. If a mac client does not have a dedicated hardware ring,
maintain kstats in its SRS.

If a mac client does not have a dedicated hardware tx ring associated with
it, packets sent by its SRS include both local traffic and traffic sent on
the wire. A separate kstat counter needs to be maintained for local
traffic.

When a tx ring (or group of rings) is shared by two or more mac clients,
provide an interface to query an individual mac client's contribution to
traffic sent through a particular ring.

3.c kstat nomenclature

A new kstat takes four arguments viz. module, instance, name, class.
While kstat for data-link has to deal with hardware rings, kstat for mac
clients has to deal with software/hardware lanes as well as software
fanout.
Moreover, not all mac clients have an instance number [Ref. Appendix 1].
Thus, we choose to represent these statistics as:
    o mac module statistics: module = mac
    o single instance of mac module: instance = 0
    o network statistics: class = net

We capture the distinction between hardware vs. software lanes as well as
data-link vs. mac clients in the remaining field viz. 'name'. We rely
heavily on the support for wildcarding that is already built into the
kstat command [Ref. Examples section below].

3.d kstats for flows

Internally, flows are also implemented with SRSes. While we support
software fanout for flows, we do not support assigning dedicated hardware
resources to flows as yet (Crossbow Phase I, integrated in snv_105).

As explained above, kstats are created at the creation of every SRS and
soft ring and destroyed with it. Thus, fine granularity kstats for flows
do not require additional effort.

4. Examples
---------------
Let us consider a system with nxge card nxge1 (10GbE/8 rings) with rings
distributed as:
    o plumbed nxge1 using 4 rings
    o ldom client vsw1 configured on nxge1: taking up 2 rings
    o vnic1 configured on nxge1: taking up 2 rings

4.1 The name nxge1

  a. kstats for data-link nxge1:

        #kstat -n nxge1_rx_ring3                # Hardware receive ring #3
        module: mac                             instance: 0
        name:   nxge1_rx_ring3                  class:    net
                ipkts                           0
                ibytes                          0

        #kstat -n nxge1_tx_ring2                # Hardware transmit ring #2
        module: mac                             instance: 0
        name:   nxge1_tx_ring2                  class:    net
                opkts                           0
                obytes                          0

        #kstat mac:0:nxge1_rx_ring*
        OR
        #kstat -n nxge1_rx_ring*                # All hardware receive rings
        module: mac                             instance: 0
        name:   nxge1_rx_ring0                  class:    net
                ipkts                           0
                ibytes                          0

        name:   nxge1_rx_ring1                  class:    net
                ipkts                           0
                ibytes                          0
                . . .
                . . .
        name:   nxge1_rx_ring7                  class:    net
                ipkts                           0
                ibytes                          0

     kstat -n nxge1*ring* will display both tx and rx hardware rings for
     NIC nxge1.

  b. kstats for mac client nxge1:

      i. Rx: One software lane for software classified SRS.
        #kstat -n nxge1_rx_slane0               # Software rx lane
        module: mac                             instance: 0
        name:   nxge1_rx_slane0                 class:    net
                ipkts                           0
                ibytes                          0

      ii. Rx: One hardware lane for each hardware ring assigned to mac
          client

        #kstat -n nxge1_rx_hlane2               # Hardware rx lane #2
        module: mac                             instance: 0
        name:   nxge1_rx_hlane2                 class:    net
                ipkts                           0
                ibytes                          0
                poll                            0
                intr                            0
                chain<10                        0
                chain10-50                      0
                chain>50                        0

          kstat -n nxge1_rx_hlane* would list statistics for all the
          hardware rx lanes as well as fanout (Ref: 4.1.b.iii below).

          kstat -n nxge1_*hlane* would list statistics for all the hardware
          lanes (both rx and tx) as well as fanout (Ref: 4.1.b.iii below).

      iii. Rx side: Software fanout

        #kstat -n nxge1_rx_hlane2_fanout3       # H/w rx lane 2's fanout #3
        module: mac                             instance: 0
        name:   nxge1_rx_hlane2_fanout3         class:    net
                ipkts                           0
                ibytes                          0

          kstat -n nxge1_rx_hlane2_fanout* would list statistics for all
          fanouts of hardware lane 2, while kstat -n nxge1_rx_slane0_fanout*
          will give the fanouts of the software lane.

4.2 The name vsw1 denotes mac client only

  a. Since there is no data-link with name vsw1,
     kstat -m mac -n vsw1*ring* will not yield anything.

  b. kstat for mac client vsw1 (2 hardware lanes, one for each physical
     ring)
     Similar to 4.1.b, kstat names would be:
        o vsw1_rx_slane0,
        o vsw1_rx_hlane{0,1},
        o vsw1_rx_hlane{0,1}_fanout{0,1,...},
        o vsw1_rx_slane0_fanout{0,1,...}

4.3 The name vnic1 may denote a data-link or a mac client.

  a. kstat for data-link vnic1 (not supported in first putback)
     Similar to 4.1.a, kstat names would be:
        o vnic1_rx_ring{0,1},
        o vnic1_tx_ring{0,1}

  b. kstat for mac client vnic1
     Similar to 4.1.b, kstat names would be:
        o vnic1_rx_slane0,
        o vnic1_rx_hlane{0,1},
        o vnic1_rx_hlane{0,1}_fanout{0,1,...},
        o vnic1_rx_slane0_fanout{0,1,...}

Note that vnic data-link statistics are identical to that vnic's mac
client statistics. When rings are added to or removed from a vnic mac
client, they are added to or removed from the corresponding data-link as
well.
We support the vnic data-link kstat view in order to be consistent with the
primary NIC data-link view.

4.4 Let us consider an aggr example:

Say aggr1 is configured on top of nxge1 (10GbE/8 rings) and nxge2
(10GbE/8 rings). We note that once the logic to implement the above design
is in place, no additional piece of logic is necessary to make the aggr
case work.

The plumbed aggr1 will have:
    o data-link nxge1 (Ref. 4.1.a)
    o mac client aggr1-nxge1 (Ref. 4.4.c, similar to 4.1.b)
    o data-link nxge2 (Ref. 4.1.a)
    o mac client aggr1-nxge2 (Ref. 4.1.b)
    o data-link aggr1 (Ref. 4.4.a, similar to 4.1.a)
    o mac client aggr1 (Ref. 4.4.b, similar to 4.1.b)

  a. kstat for data-link aggr1 (not supported in first putback)
     Similar to 4.1.a, kstat names would be:
        o aggr1_rx_ring{0,...,15}
        o aggr1_tx_ring{0,...,15}

  b. kstat for mac client aggr1
     Similar to 4.1.b, kstat names would be:
        o aggr1_rx_slane0,
        o aggr1_rx_hlane{0,...,15},
        o aggr1_rx_hlane{0,...,15}_fanout{0,1,...},
        o aggr1_rx_slane0_fanout{0,1,...}

  c. kstat for mac client aggr1-nxge1
     Similar to 4.1.b, kstat names would be:
        o aggr1_nxge1_rx_slane0,
        o aggr1_nxge1_rx_hlane{0,...,7},
        o aggr1_nxge1_rx_hlane{0,...,7}_fanout{0,1,...},
        o aggr1_nxge1_rx_slane0_fanout{0,1,...}

4.5 Let us consider a flow (say flow1) configured on top of some physical
    NIC.

  a. ring kstats (not supported, flows can't have dedicated resources
     today)
     Similar to 4.1.a, kstat names would be:
        o flow1_rx_ring{0,1},
        o flow1_tx_ring{0,1}

  b. lane kstats for flows
     Similar to 4.1.b, kstat names would be:
        o flow1_rx_slane0,
        o flow1_rx_slane0_fanout{0,1,...}
        o flow1_rx_hlane{0,1}, (Not supported today)
        o flow1_rx_hlane{0,1}_fanout{0,1,...} (Not supported today)

Note that since our design maintains kstats within SRSes and soft rings,
once we support dedicated hardware resources for flows, no additional logic
would be necessary for supporting kstat counters like flow1_rx_hlane{0,1}.

5. Scope of first putback
------------------------------
Scope of first putback includes introduction of statistics for all the
examples detailed in the Examples section above (unless specifically marked
otherwise).

Scope of first putback also involves writing dlstat(1M) and flowstat(1M)
for consuming the aforementioned statistics.

6. Follow-up work
------------------
The following is identified as follow-up work:
    o A kstat interface to query an individual mac client's contribution
      to traffic received by / sent through a particular ring when a rx/tx
      ring (or group of rings) is shared by two or more mac clients.
    o Split rx/tx software lane counter into: local traffic vs. software
      classified traffic.
    o Remove link statistics from dls. Currently, dladm show-link -s is the
      only consumer for this. That interface will be EOL'ed in the near
      future.
    o Extend the mac_srs macro, add new mdb macros for querying ring and
      fanout information. Provide a way to tie kstat names to physical
      addresses to query a particular data structure.
    o First putback supports lane stats for vnic and aggr i.e.
      kstat -n vnic1_*lane* for all lane stats of vnic1. It can be extended
      to support vnic1_*ring* to list only the stats for rings dedicated to
      vnic1. Note that this involves supporting a new interface only.
      Vnic1 ring stats will not differ from vnic1 lane stats.

Appendix 1
--------------
In case of nxge1, we can perhaps use name as nxge and instance as 1 even
for the mac client kstats. However, not all mac clients do mac_register and
get an instance number. For example, the ldom/vnet client does a mac_open
but never does mac_register.

Moreover, consider an aggr, say aggr1, aggregating bge0 and nxge0. Two mac
clients viz. aggr1-bge0 and aggr1-nxge0 are created. It is impossible to
define an instance number that can distinguish between these two mac
clients.
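
Illustrative sketch (not part of the design): since the kstat 'name' field
carries all the structure (section 3.c), the naming scheme and the wildcard
queries from section 4 can be modelled in a few lines of Python. The helper
names ring_name, lane_name, select and example_names are hypothetical, as is
the choice of two fanouts per lane. Python's fnmatch module is used only as a
stand-in for the wildcard matching of the kstat command; nothing here talks
to the kstat framework.

```python
from fnmatch import fnmatchcase


def ring_name(link, direction, ring):
    """Data-link kstat name for a hardware ring, e.g. nxge1_rx_ring3."""
    if direction not in ("rx", "tx"):
        raise ValueError("direction must be 'rx' or 'tx'")
    return f"{link}_{direction}_ring{ring}"


def lane_name(client, direction, kind, lane, fanout=None):
    """Mac client kstat name for a lane, optionally for one of its fanouts."""
    if direction not in ("rx", "tx"):
        raise ValueError("direction must be 'rx' or 'tx'")
    if kind not in ("slane", "hlane"):
        raise ValueError("kind must be 'slane' or 'hlane'")
    name = f"{client}_{direction}_{kind}{lane}"
    if fanout is not None:
        # Fanout has significance only on the Rx side (section 2).
        if direction != "rx":
            raise ValueError("fanout exists only on the Rx side")
        name += f"_fanout{fanout}"
    return name


def select(names, pattern):
    """Return the names matching a shell-style wildcard pattern."""
    return [n for n in names if fnmatchcase(n, pattern)]


def example_names():
    """Names for nxge1 as in section 4.1: 8 rings, 4 hardware lanes.

    Assumption: each rx hardware lane has two fanouts; the real count
    depends on the system.
    """
    names = []
    for direction in ("rx", "tx"):
        names += [ring_name("nxge1", direction, r) for r in range(8)]
    names.append(lane_name("nxge1", "rx", "slane", 0))
    for lane in range(4):
        names.append(lane_name("nxge1", "rx", "hlane", lane))
        names += [lane_name("nxge1", "rx", "hlane", lane, f) for f in range(2)]
        names.append(lane_name("nxge1", "tx", "hlane", lane))
    return names


if __name__ == "__main__":
    for name in select(example_names(), "nxge1_rx_hlane2*"):
        print(name)
```

As in 4.1.b.ii, the pattern nxge1_rx_hlane* in this model selects the
hardware rx lanes together with their fanouts, because the fanout names
extend the lane names.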