ANNEX - BPF Interaction with MAC
================================

Whilst BPF (Berkeley Packet Filter) is considered a mature technology in
the open source community, embedding it in Solaris is new. To help users
understand how BPF fits into the Solaris kernel, this document describes
how it works.

Overview
~~~~~~~~

BPF provides the means to capture packets, per network interface. Where a
NIC has multiple modes in which it can present the network data, BPF
allows for the selection of the corresponding data link type.

BPF presents itself to applications as a character device driver that is
opened and from which packets are read. It is necessary to tell BPF which
network interface you wish to receive packets from. Only one network
interface at a time may be selected as the source for packets.

Once the buffer becomes full, a given timeout expires, or a packet arrives
and immediate delivery has been requested, the driver indicates to any
sleeping applications that they can now read packets from the buffer.

The number of bytes, from each packet, that are copied into the buffer is
controlled by the application. Thus an application may choose to receive
entire packets (by specifying a very large packet capture size, say 128k)
or just enough to get all of the network and transport headers (128
bytes).

If the buffers used by BPF inside the kernel become full, packets will be
"lost" - no details recorded - even though they are received by the
system.

BPF also provides applications with the ability to send packets, using the
write(2) system call. Packets to be sent out must be fully formed: BPF
does no address resolution, checksum calculation, etc, before passing the
packet down to the network interface.

BPF, MAC and packet capture
~~~~~~~~~~~~~~~~~~~~~~~~~~~

BPF predominantly makes use of the mac layer in Solaris to implement the
actual packet "sniffing."
When first loaded, BPF calls dls_set_bpfattach(), a function in the mac
module, to set the attach/detach functions to call. The same function is
used when the BPF kernel module is unloading, to stop the mac module
calling into BPF.

When dls_set_bpfattach() is first called to set the attach function, it
walks through the network interfaces already registered with
dls_devnet_create() and calls the BPF attach function for each one. The
interface name and datalink type (DLT) provided by the attach call are
recorded by BPF in an internal list.

It should be remembered that Solaris allows network interfaces to be
configured and active, as per the output of "dladm show-link", even if
they're not used by or visible to IP ("ifconfig -a" does not list them).

BPF is then ready to start capturing packets.

To start capturing packets, an application must first open the /dev/bpf
device and then issue an ioctl to set the target interface for packet
capture. Setting the interface requires that the network interface is
known to the datalink layer and that an identifier has been assigned to it
in dls, which BPF looks up using dls_mgmt_get_linkid(). The returned
linkid must be amongst those that BPF has learned about through the attach
callback (see above). With this, BPF is then able to open the interface
with a call to mac_open() and call mac_promisc_add() to start receiving
packets. For each mac interface that is opened using mac_open(), a client
handle is also acquired, using mac_client_open(), to support the sending
of packets using write(2) on the device via mac_tx().

When the network interface is first selected for packet capture, only
local packets are received: by default, the network interface is not
placed into promiscuous mode. If the application wishes to capture all
packets on the given link using promiscuous mode, another BPF ioctl must
be issued that enables promiscuous mode.
It is not possible, with BPF, to turn promiscuous mode off once it has
been enabled. The file descriptor must be closed and a new one created.

Receiving a packet
~~~~~~~~~~~~~~~~~~

Packets make their way into BPF via the callback registered with
mac_promisc_add(). mac_promisc_add() manages the physical state of the
network interface (does it need to be in promiscuous mode, etc) and the
correct delivery of packets to the various callbacks registered with it.
Packets are delivered to BPF via mac_promisc_dispatch(), which is called
from the mac layer (e.g. mac_rx()) when callbacks have been added to the
network interface.

When packets first arrive in BPF, they're examined to see if the entire
packet is in a single buffer (dblk_t) or if it is split over multiple
buffers. This analysis tells the packet capture code, which is executed
next, whether it can optimise its behaviour for single-buffer work.

The packet is passed into bpf_filter(), which executes the filter program
against the packet. The return value from this function is the number of
bytes to capture. A return value of 0 means that the packet did not match
the filter.

If the packet does match the filter then BPF will attempt to copy the
requested number of bytes into an internal buffer.

BPF keeps two buffers: an active buffer and a hold buffer. The active
buffer is where it is currently saving packets and the hold buffer is the
one from which an application reads packets. Both buffers start out empty.
If the active buffer fills before the application asks for packet data,
BPF switches the roles of the buffers around, making the full buffer the
hold buffer and the previously empty buffer the new active buffer. If an
application tries to read data and the hold buffer is empty but the active
buffer is not empty, the application will sleep unless it has enabled
non-blocking IO or immediate delivery of packets using BIOCIMMEDIATE.
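
The capture path described above can be illustrated with a small toy model.
This is only a sketch in Python, not the kernel code: the class and method
names (CaptureBuffers, capture, read, rotate) are hypothetical, the per-packet
headers and alignment used by real BPF are omitted, and a read that would
sleep in the kernel simply returns an empty result here. The snaplen argument
stands in for the value returned by bpf_filter().

```python
class CaptureBuffers:
    """Toy model of BPF's active/hold buffer pair (not the kernel code)."""

    def __init__(self, bufsize):
        self.bufsize = bufsize
        self.active = bytearray()  # packets are currently saved here
        self.hold = bytearray()    # the application reads from here
        self.dropped = 0           # packets lost because no space was left

    def rotate(self):
        """Swap roles: the active buffer becomes the hold buffer."""
        self.hold, self.active = self.active, bytearray()

    def capture(self, packet, snaplen):
        """Save up to snaplen bytes of packet; snaplen == 0 means no match."""
        if snaplen == 0:
            return False
        # Assumption of this model: never store more than one buffer's worth.
        data = packet[:min(snaplen, self.bufsize)]
        if len(self.active) + len(data) > self.bufsize:
            if self.hold:
                # Hold buffer has not been read yet: the packet is lost.
                self.dropped += 1
                return False
            self.rotate()
        self.active += data
        return True

    def read(self, immediate=False):
        """Return the hold buffer's contents and empty it.

        With immediate=True (modelling BIOCIMMEDIATE) a non-empty active
        buffer is rotated into place instead of making the caller wait.
        An empty result stands in for "the application would sleep".
        """
        if not self.hold and self.active and immediate:
            self.rotate()
        data = bytes(self.hold)
        self.hold = bytearray()
        return data


bufs = CaptureBuffers(bufsize=8)
bufs.capture(b"abcdef", 4)   # filter asked for 4 bytes: active holds "abcd"
bufs.capture(b"123456", 6)   # does not fit: buffers rotate, then it is saved
first = bufs.read()          # drains the hold buffer ("abcd")
second = bufs.read(immediate=True)  # rotates and drains the active buffer
```

In this model a packet is only counted as dropped when the active buffer
cannot take it and the hold buffer is still waiting to be read, which mirrors
the "lost" packets mentioned in the overview.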

Aside: BSD vs Solaris mac
-------------------------

In BSD, BPF receives packets per network interface and then walks through
all of the filters attached to that network interface to see which
descriptors are interested in the packet.

The behaviour for Solaris changes slightly. Instead of having a single
function call registered for BPF with mac_promisc_add() that then walks
through all of the descriptors on a network interface, mac_promisc_add()
is called for each descriptor directly. To do otherwise would require BPF
to know how many times mac_promisc_add() had been called for each
promiscuous mode supported by mac. BPF would need this so that it could
correctly put the physical interface in promiscuous mode. In effect, work
that mac does to manage the physical interface being in promiscuous mode
would need to be duplicated in BPF.

Nothing comes for free and the catch here is that when there are multiple
descriptors active on a network interface, the loop to process each
descriptor results in more function calls being made, possibly costing a
small amount of performance.
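
The difference in call pattern can be sketched with a toy model. This is an
illustration in Python only, under the assumption that each registered
callback costs one call from the link layer per packet; Link, Descriptor,
attach_bsd_style and attach_solaris_style are hypothetical names, and
Link.promisc_add merely stands in for mac_promisc_add().

```python
class Descriptor:
    """Stand-in for an open BPF descriptor."""

    def __init__(self, name):
        self.name = name
        self.seen = []

    def deliver(self, packet):
        self.seen.append(packet)


class Link:
    """Stand-in for a mac link that dispatches to registered callbacks."""

    def __init__(self):
        self.callbacks = []
        self.calls = 0  # callbacks invoked by the link layer itself

    def promisc_add(self, callback):
        self.callbacks.append(callback)

    def dispatch(self, packet):
        for callback in self.callbacks:
            self.calls += 1
            callback(packet)


def attach_bsd_style(link, descriptors):
    """One callback per interface, which walks every descriptor itself."""
    def walk(packet):
        for d in descriptors:
            d.deliver(packet)
    link.promisc_add(walk)


def attach_solaris_style(link, descriptors):
    """One callback per descriptor, registered directly with the link."""
    for d in descriptors:
        link.promisc_add(d.deliver)


bsd_link, sol_link = Link(), Link()
bsd_descs = [Descriptor("d%d" % i) for i in range(3)]
sol_descs = [Descriptor("d%d" % i) for i in range(3)]
attach_bsd_style(bsd_link, bsd_descs)
attach_solaris_style(sol_link, sol_descs)
for pkt in (b"p1", b"p2"):
    bsd_link.dispatch(pkt)
    sol_link.dispatch(pkt)
# bsd_link.calls is 2 (one per packet); sol_link.calls is 6 (one per
# packet per descriptor). Every descriptor sees both packets either way.
```

Both arrangements deliver the same packets to every descriptor; what changes
is which layer runs the loop, and therefore which layer has to keep track of
how many listeners need the interface in promiscuous mode.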