ANNEX - BPF Interaction with MAC
================================
Whilst BPF (Berkeley Packet Filter) is considered a mature technology in
the open source community, embedding it in Solaris is new. To help users
understand how BPF fits into the Solaris kernel, this document describes
how it works.

Overview
~~~~~~~~
BPF provides the means to capture packets, per network interface. Where
a NIC has multiple modes in which it can present the network data, BPF
allows for the selection of the corresponding data link type.

BPF presents itself to applications as a character device driver that
is opened and from which packets are read. It is necessary to tell BPF
which network interface you wish to receive packets from. Only one
network interface at a time may be selected as the source for packets.

When the buffer becomes full, a given timeout expires, or a packet
arrives while immediate delivery is enabled, the driver indicates to
any sleeping applications that they can now read packets from the
buffer. The number of bytes copied into the buffer from each packet
is controlled by the application. Thus an application may choose to
receive entire packets (by specifying a very large packet capture
size, say 128k) or just enough to get all of the network and transport
headers (say 128 bytes). If the buffers used by BPF inside the kernel
become full, packets will be "lost": no details are recorded, even
though the packets are received by the system.
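
The capture-size and wake-up rules above can be sketched in C as follows.
This is a minimal model, not the driver's code: the structure and all of
its field names are hypothetical and exist only for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-descriptor state; names are illustrative only. */
struct capture_state {
	size_t	snaplen;	/* max bytes wanted from each packet */
	size_t	buf_used;	/* bytes currently stored in the buffer */
	size_t	buf_size;	/* total capacity of the buffer */
	bool	immediate;	/* immediate delivery requested */
	bool	timed_out;	/* the read timeout has expired */
};

/* Bytes of a packet that would be copied: the smaller of the two. */
static size_t
capture_len(const struct capture_state *cs, size_t pktlen)
{
	return (pktlen < cs->snaplen ? pktlen : cs->snaplen);
}

/* True when a sleeping reader should be told that data is ready. */
static bool
should_wake(const struct capture_state *cs, bool packet_arrived)
{
	if (cs->buf_used >= cs->buf_size)
		return (true);		/* buffer is full */
	if (cs->timed_out)
		return (true);		/* timeout expired */
	return (cs->immediate && packet_arrived);
}
```

With a capture size of 128, a 1500 byte packet contributes 128 bytes to
the buffer while a 60 byte packet contributes all 60.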

BPF also provides applications with the ability to send packets, using
the write(2) system call. Packets to be sent out must be fully formed:
BPF does no address resolution, checksum calculation, etc, before
passing the packet down to the network interface.
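
As an illustration of what "fully formed" means, the sketch below builds
a complete Ethernet frame in a caller-supplied buffer. It assumes the
descriptor is bound to an Ethernet link; build_ether_frame() and the
FRAME_* constants are hypothetical names, not part of BPF. The payload
must already carry any checksums it needs, since BPF will not add them.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define	FRAME_ADDR_LEN	6	/* bytes in an Ethernet address */
#define	FRAME_HDR_LEN	14	/* dst + src + 2 byte type field */

/*
 * Assemble dst, src, type and payload into `out`. Returns the frame
 * length, or 0 if the frame does not fit in `outlen` bytes.
 */
static size_t
build_ether_frame(uint8_t *out, size_t outlen,
    const uint8_t dst[FRAME_ADDR_LEN], const uint8_t src[FRAME_ADDR_LEN],
    uint16_t ethertype, const uint8_t *payload, size_t paylen)
{
	if (outlen < FRAME_HDR_LEN || paylen > outlen - FRAME_HDR_LEN)
		return (0);

	memcpy(out, dst, FRAME_ADDR_LEN);
	memcpy(out + FRAME_ADDR_LEN, src, FRAME_ADDR_LEN);
	out[12] = (uint8_t)(ethertype >> 8);	/* network byte order */
	out[13] = (uint8_t)(ethertype & 0xff);
	if (paylen > 0)
		memcpy(out + FRAME_HDR_LEN, payload, paylen);

	return (FRAME_HDR_LEN + paylen);
}
```

The resulting buffer and length are what an application would hand to
write(2) on a BPF descriptor once an interface has been selected.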

BPF, MAC and packet capture
~~~~~~~~~~~~~~~~~~~~~~~~~~~
BPF predominately makes use of the mac layer in Solaris to implement
the actual packet "sniffing." When first loaded, BPF calls a function
in the mac module to set the attach/detach functions to call. This
function is called dls_set_bpfattach(). It is also used when the BPF
kernel module is unloading to stop the mac module calling into BPF.

When dls_set_bpfattach() is first called to set the attach function,
it walks through the network interfaces already registered with
dls_devnet_create() and calls the BPF attach function for each one.
The interface name and datalink type (DLT) provided by the attach
call are recorded by BPF in an internal list. It should be remembered
that Solaris allows network interfaces to be configured and active,
as shown by "dladm show-link", even if they are not used by or visible
to IP (that is, "ifconfig -a" does not list them).

BPF is then ready to start capturing packets.

To start capturing packets, an application must first open the /dev/bpf
device and then issue an ioctl to set the target interface for packet
capture. Setting the interface requires that the network interface is
known to the datalink layer and that an identifier has been assigned
to it in dls using dls_mgmt_get_linkid(). The returned linkid must be
amongst those known to be used by interfaces through the attach
callback (see above). With this, BPF is then able to open the interface
with a call to mac_open() and call mac_promisc_add() to start receiving
packets. For each mac interface that is opened using mac_open(), a
client handle is also acquired, using mac_client_open(), to support
the sending of packets using write(2) on the device via mac_tx(). When
the network interface is first selected for packet capture, only local
packets are received: by default, the network interface is not placed
into promiscuous mode.

If the application wishes to capture all packets on the given link using
promiscuous mode, another BPF ioctl must be issued that enables promiscuous
mode. It is not possible, with BPF, to turn promiscuous mode off, once
it has been enabled. The file descriptor must be closed and a new one
created.
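
The ordering rules in this section can be modelled as a small state
machine. The sketch below is a simplified model under stated
assumptions, not the driver's implementation: the structures and the
model_*() functions are hypothetical, the interface list stands in for
the one built by the attach callback, and rejecting a promiscuous
request before an interface is selected is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record for an interface seen by the attach callback. */
struct known_link {
	const char	*name;	/* interface name, e.g. "net0" */
	int		dlt;	/* datalink type; 1 is DLT_EN10MB */
};

/* Hypothetical per-descriptor state. */
struct bpf_desc_model {
	const struct known_link	*bound;	/* NULL until selected */
	bool			promisc;
};

/* Select an interface; fails if it was never reported by attach. */
static int
model_set_interface(struct bpf_desc_model *d,
    const struct known_link *links, size_t nlinks, const char *name)
{
	size_t i;

	for (i = 0; i < nlinks; i++) {
		if (strcmp(links[i].name, name) == 0) {
			d->bound = &links[i];
			return (0);
		}
	}
	return (-1);
}

/* Enable promiscuous mode. There is deliberately no "disable". */
static int
model_set_promisc(struct bpf_desc_model *d)
{
	if (d->bound == NULL)
		return (-1);	/* assumption: needs an interface first */
	d->promisc = true;
	return (0);
}
```

The absence of a function to clear the promiscuous flag mirrors the
text: the only way back is to close the descriptor and open a new one.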

Receiving a packet
~~~~~~~~~~~~~~~~~~
Packets make their way into BPF via the callback registered with
mac_promisc_add(). This function manages the physical state of the
network interface (does it need to be in promiscuous mode, etc) and
the correct delivery of packets to the various callbacks registered
with it. Packets are delivered to BPF via mac_promisc_dispatch(),
which is called from the mac layer (e.g. mac_rx()) when callbacks
have been added to the network interface.

When packets first arrive in BPF, they are examined to see if the
entire packet is in a single buffer (dblk_t) or if it is split over
multiple buffers. This analysis tells the packet capture code, which
is executed next, whether it can optimise its behaviour for
single-buffer work.
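
A check of this kind might look like the sketch below. The structure is
a minimal stand-in for a STREAMS message block, reusing only the
b_cont, b_rptr and b_wptr field names; it is not the kernel's
definition, and chain_len() is a hypothetical helper.

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimal stand-in for a message block; not the kernel structure. */
struct msgb_model {
	struct msgb_model	*b_cont;	/* next block, or NULL */
	unsigned char		*b_rptr;	/* first valid byte */
	unsigned char		*b_wptr;	/* one past last valid byte */
};

/*
 * Return the total packet length and report, via `single`, whether
 * all of the data lives in one block.
 */
static size_t
chain_len(const struct msgb_model *mp, bool *single)
{
	size_t len = 0;

	*single = (mp != NULL && mp->b_cont == NULL);
	for (; mp != NULL; mp = mp->b_cont)
		len += (size_t)(mp->b_wptr - mp->b_rptr);

	return (len);
}
```

When `single` is true, later code can read packet bytes straight from
b_rptr without walking the chain for each access.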

The packet is passed into bpf_filter(), which executes the filter
program against the packet. The return value from this function is
the number of bytes to capture. 0 is considered to mean that the
packet did not match the filter. If the packet does match the
filter then BPF will attempt to copy the requested number of bytes
into an internal buffer. BPF keeps two buffers: an active buffer
and a hold buffer. The active buffer is where it is currently saving
packets and the hold buffer is the one from which an application
reads packets. Both buffers start out empty. If the active buffer
fills before the application asks for packet data, BPF switches the
roles of the buffers around, making the full buffer the hold and the
previously empty buffer the new active buffer. If an application tries
to read data and the hold buffer is empty but the active buffer is
not empty, the application will sleep unless it has enabled non-blocking
IO or immediate delivery of packets using BIOCIMMEDIATE.
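
The active/hold scheme can be modelled in a few lines of C. This is a
simplified sketch: all names are hypothetical, the per-packet header
that precedes each stored packet is omitted, and rotating the buffers
at read time when immediate delivery is enabled is an assumption based
on the traditional BSD behaviour. The `caplen` argument plays the role
of the bpf_filter() return value.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define	MODEL_BUFSIZE	64

struct bpf_bufs_model {
	unsigned char	store[2][MODEL_BUFSIZE];
	int		active;		/* index of the buffer being filled */
	size_t		alen;		/* bytes in the active buffer */
	size_t		hlen;		/* bytes in the hold buffer */
	unsigned long	dropped;	/* packets lost for lack of space */
};

/* Swap roles: the active buffer becomes hold, the other is reused. */
static void
model_rotate(struct bpf_bufs_model *b)
{
	b->active ^= 1;
	b->hlen = b->alen;
	b->alen = 0;
}

/* Store up to caplen bytes of pkt. Returns true if it was saved. */
static bool
model_capture(struct bpf_bufs_model *b, const unsigned char *pkt,
    size_t pktlen, size_t caplen)
{
	if (caplen == 0)
		return (false);		/* filter did not match */
	if (caplen > pktlen)
		caplen = pktlen;
	if (caplen > MODEL_BUFSIZE)
		caplen = MODEL_BUFSIZE;

	if (b->alen + caplen > MODEL_BUFSIZE) {
		if (b->hlen != 0) {
			b->dropped++;	/* both buffers in use: lost */
			return (false);
		}
		model_rotate(b);
	}
	memcpy(&b->store[b->active][b->alen], pkt, caplen);
	b->alen += caplen;
	return (true);
}

/*
 * Copy the hold buffer to `out` (at least MODEL_BUFSIZE bytes).
 * A return of 0 means the caller would sleep, or fail with
 * non-blocking IO enabled.
 */
static size_t
model_read(struct bpf_bufs_model *b, bool immediate, unsigned char *out)
{
	size_t n;

	if (b->hlen == 0 && immediate && b->alen != 0)
		model_rotate(b);
	n = b->hlen;
	memcpy(out, b->store[b->active ^ 1], n);
	b->hlen = 0;
	return (n);
}
```

Note that a packet is only counted as dropped when the active buffer
cannot take it and the hold buffer has not yet been read.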

Aside: BSD vs Solaris mac
-------------------------
In BSD, BPF receives packets per network interface and then walks
through all of the filters attached to that network interface to see
which descriptors are interested in the packet. The behaviour in
Solaris differs slightly. Instead of having a single function call
registered for BPF with mac_promisc_add() that then walks through
all of the descriptors on a network interface, mac_promisc_add()
is called for each descriptor directly. To do otherwise would
require BPF to know how many times mac_promisc_add() had been called
for each promiscuous mode supported by mac. This would be required
by BPF so that it could correctly put the physical interface in
promiscuous mode. In effect, work that mac does to manage the physical
interface being in promiscuous mode would need to be duplicated in BPF.
Nothing comes for free and the catch here is that when there are multiple
descriptors active on a network interface, the loop to process each
descriptor results in more function calls being made, possibly costing
a small amount of performance.
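
The difference in call counts can be illustrated with a toy model. The
sketch below only counts function calls; every name in it is
hypothetical and it makes no claim about the real cost of either
design.

```c
#include <stddef.h>

struct call_counts {
	unsigned	callbacks;	/* calls from the layer below */
	unsigned	filters;	/* filter program executions */
};

static void
run_filter(struct call_counts *c)
{
	c->filters++;
}

/* BSD style: one callback per interface walks every descriptor. */
static void
bsd_style_deliver(struct call_counts *c, size_t ndesc)
{
	size_t i;

	c->callbacks++;
	for (i = 0; i < ndesc; i++)
		run_filter(c);
}

/* One callback registered per descriptor, as described above. */
static void
per_desc_callback(struct call_counts *c)
{
	c->callbacks++;
	run_filter(c);
}

/* Solaris style: the layer below loops over registered callbacks. */
static void
solaris_style_deliver(struct call_counts *c, size_t ndesc)
{
	size_t i;

	for (i = 0; i < ndesc; i++)
		per_desc_callback(c);
}
```

For a single packet and three descriptors, the first model makes one
callback and three filter calls, while the second makes three of each.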