PF Packet - Background
======================

To provide a Linux-compatible alternative to DLPI for packet capture,
PF_PACKET is being implemented for OpenSolaris. The goal is to remain as
compatible with the Linux programming APIs as possible.

Both the performance of DLPI and its relatively obscure method of use and
programming have led to Solaris being shunned as a platform for building
devices that rely on high speed packet capture.

Limits
------

Introducing new behaviour for OpenSolaris is not acceptable; however, it is
not possible to provide all of the Linux interfaces. Where support of a
particular Linux interface is not possible, an error will be returned.

Introduction
============

On Linux, the PF_PACKET feature uses a socket to provide the means to do
both packet capture and sending of raw packets. This means that developers
who are used to the BSD socket APIs are not faced with a steep learning
curve, as they might be if they wanted to use DLPI on Open/Solaris.

As Linux does not have an interface dedicated to packet capture, such as
BPF, PF_PACKET has been extended there with concepts such as ring buffers to
improve its performance for this task. Given the delivery of BPF and a
review of how this part of the PF_PACKET interface is used, this project
does not intend to deliver those extensions. A review of configure scripts
from packages such as libpcap indicates that this is not likely to cause a
problem for compiling 3rd party packages. Should this become a problem,
attending to it in an RFE can be investigated.

Interaction with other features
===============================

Vanity Naming
~~~~~~~~~~~~~

Link names passed through to the PF_PACKET socket module are resolved into
link IDs using dls.

snoop
~~~~~

This case has no interaction with snoop.

libpcap
~~~~~~~

This case has no interaction with libpcap (PSARC/2008/288).

libnet
~~~~~~

This case will update libnet in SFW (PSARC/2008/409) to use the PF_PACKET
interfaces introduced by this project.

Privilege
~~~~~~~~~

Both SOCK_RAW and SOCK_DGRAM PF_PACKET sockets require that the caller hold
the "net_rawaccess" privilege in order to create them.

Zones
~~~~~

Given that zones do not have the "net_rawaccess" privilege, they are unable
to open PF_PACKET sockets, even if they have an exclusive instance of IP.
Thus Linux branded zones will currently not be able to create PF_PACKET
sockets.

kstats
~~~~~~

A single set of statistics covering the operation of the entire PF_PACKET
socket module will be exported via kstats. The kstats will be installed
under a new module name, "pfpacket", with the name "global" and class
"misc". Although per-socket statistics are available via the getsockopt
interface, these will not be present in kstats.

The following statistics will be exported via kstats:
-----------------------------------------------------
    macHeaderFail - mac_header_info failed to analyse the packet
    badProtocol   - protocol mismatch between the packet and its socket
    allocbFail    - failed to allocate an mblk_t for sending a packet
    recvOk        - successful delivery of the packet up to sockfs
    recvFail      - failed to deliver the packet up to sockfs

Interaction with network library calls
======================================

socket(3SOCKET) interaction
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Socket Family
-------------

A PF_PACKET socket is created by calling socket(3SOCKET) with the address
family set to PF_PACKET. The specific value for this protocol family will
be mapped to the PF_PACKET socket module via the /etc/sock2path file. Two
additions are required for sock2path, one for PF_PACKET/SOCK_DGRAM and one
for PF_PACKET/SOCK_RAW.

Socket Type
-----------

The PF_PACKET socket supports two types of sockets: SOCK_RAW and SOCK_DGRAM.

SOCK_RAW provides the application with a raw copy of the packet from the
network interface to which it has been bound.
When sending a packet with a SOCK_RAW PF_PACKET socket, the kernel will
assume that the application has supplied the appropriate link layer header
in the data to be sent out.

SOCK_DGRAM provides a cooked view of packets, where the link layer header
is removed from the view of the application. When sending packets using a
PF_PACKET socket created with SOCK_DGRAM, it is not necessary to supply a
MAC header.

Older Linux kernels supported a different type of socket with PF_PACKET,
SOCK_PACKET. Although Linux kernels today still support SOCK_PACKET, apart
from a reduced code path length on receive and send, it offers no
functional advantage: it remains just a stripped down SOCK_RAW socket. This
project will not support SOCK_PACKET sockets.

Socket Protocol
---------------

The protocol argument to socket(3SOCKET) is used by PF_PACKET when sending
packets if the caller has not supplied a 'struct sockaddr_ll' with the
packet being sent.

bind(3SOCKET) interaction
~~~~~~~~~~~~~~~~~~~~~~~~~

Applications that have been written to use PF_PACKET sockets expect to
build sockaddr_ll structures for use with bind(3SOCKET). Open/Solaris does
not currently contain a definition of this structure, so it needs to be
introduced. Given that the format of this structure is under the control of
an external entity, it is not possible to classify its contents any higher
than Volatile.

struct sockaddr_ll
------------------

The expected use of this structure is as follows:

    sll_family   - will always be AF_PACKET
    sll_protocol - MAC layer protocol number, e.g.
                   0x800 for IP over Ethernet
    sll_ifindex  - network interface index from SIOCGIFINDEX
    sll_pkttype  - indicates the type of the packet (see below)
    sll_hatype   - Data Link Type (ATM, FDDI, Ethernet, etc)
    sll_halen    - length of the hardware address stored at sll_addr
    sll_addr     - hardware address

setsockopt(3SOCKET) interaction
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Three new options to setsockopt() are introduced with this case, each one
being available at the SOL_PACKET level of the PF_PACKET socket. They are
as follows:

    PACKET_ADD_MEMBERSHIP  - option is a 'struct packet_mreq'
    PACKET_DROP_MEMBERSHIP - option is a 'struct packet_mreq'
    PACKET_AUXDATA         - when enabled, allows recvmsg() and sendmsg()
                             to be used on the socket to retrieve and send
                             extra information.

struct packet_mreq
------------------

    mr_ifindex - network interface index from SIOCGIFINDEX
    mr_type    - promiscuous mode to turn on/off
    mr_alen    - length of hardware address at mr_address
    mr_address - link layer multicast address to bind to

New SOL_SOCKET options
----------------------

Additionally, this case introduces two new options to SOL_SOCKET,
SO_ATTACH_FILTER and SO_DETACH_FILTER. In implementing these options, this
case would like to set aside the range 0x40000000 - 0x7fffffff for
assigning option numbers. The existing options, such as SO_LOOPBACK, are
all individual bits, despite never being used together - i.e. code is
always written to set each option individually. Taking this range reduces
the number of options-as-flags by two but provides a dramatic increase in
available option numbers. SO_ATTACH_FILTER will be defined as 0x40000001
and SO_DETACH_FILTER as 0x40000002. 0x40000000 is a reserved value.

    SO_ATTACH_FILTER - attach a BPF based filter program to a PF_PACKET
                       socket
    SO_DETACH_FILTER - remove any attached BPF filter program from a socket

getsockopt(3SOCKET) interaction
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Only one new option is added to getsockopt(), PACKET_STATISTICS, for the
SOL_PACKET level of the PF_PACKET socket.
The structure used with this socket option is "struct tpacket_stats" and it
has the following fields:

    tp_packets - count of the packets that passed the filter, including
                 those dropped due to lack of buffer space.
    tp_drops   - count of the packets dropped because of the lack of
                 buffer space. This does not include any packets that are
                 dropped by the mac layer or the NIC itself.

ioctl(2) interaction
~~~~~~~~~~~~~~~~~~~~

Existing
--------

All of the existing ioctls used by this case are commented as being
"obsolete" in . For the purposes of Linux source code compatibility, these
interfaces must not be removed, even though they are obsolete. These ioctls
are marked as obsolete because their use has been retired in favour of a
new breed that uses "struct lifreq" instead of the old "struct ifreq".

The list of ioctls is:

    SIOCGIFINDEX
    SIOCGIFFLAGS
    SIOCSIFFLAGS
    SIOCGIFMTU

Solaris ioctls
--------------

In Solaris the above ioctls that are marked as obsolete (but left around
for compatibility purposes) have been "replaced" by four new ioctls that
use a "struct lifreq" instead of a "struct ifreq". These ioctls will be
supported by this case so that developers can take full advantage of our
interfaces.

The Solaris ioctls and their obsolete companions are:

    SIOCGLIFINDEX (for SIOCGIFINDEX)
    SIOCGLIFFLAGS (for SIOCGIFFLAGS)
    SIOCSLIFFLAGS (for SIOCSIFFLAGS)
    SIOCGLIFMTU   (for SIOCGIFMTU)

New
---

Two new interfaces need to be added, SIOCGIFHWADDR and SIOCGSTAMP.

SIOCGIFHWADDR retrieves the hardware address of the interface specified in
the "struct ifreq" passed in.

SIOCGSTAMP ....

At present there are no plans for a SIOCGLIFHWADDR as this ioctl does not
exist on Linux.

This project will only support these two ioctls on PF_PACKET sockets:

    SIOCGIFHWADDR
    SIOCGSTAMP

Volo
====

This project uses the socket upcall and downcall interfaces introduced by
the Volo project.

Imported Interfaces
~~~~~~~~~~~~~~~~~~~

+-------------------------------------+
| Interface name                      |
+-------------------------------------+
|                                     |
+-------------------------------------+
| struct modlsockmod                  |
| smod_reg_t                          |
| sock_downcalls_t                    |
| sock_lower_handle_t                 |
| sock_upcalls_t                      |
+-------------------------------------+

Crossbow
========

Implementing the PF_PACKET socket module in the kernel requires importing
various interfaces that are not yet Committed interfaces.

Imported Interfaces
~~~~~~~~~~~~~~~~~~~

+-------------------------------------+
| Interface name                      |
+-------------------------------------+
| mac_addr_len                        |
| mac_client_open                     |
| mac_client_close                    |
| mac_close                           |
| mac_header                          |
| mac_multicast_add                   |
| mac_multicast_remove                |
| mac_open_by_linkid                  |
| mac_promisc_add                     |
| mac_promisc_remove                  |
| mac_sdu_get                         |
| mac_tx                              |
| mac_unicast_primary_get             |
+-------------------------------------+
| mac_client_handle_t                 |
| mac_client_promisc_type_t           |
| mac_handle_t                        |
+-------------------------------------+
| MAC_ADDRTYPE_MULTICAST              |
| MAC_ADDRTYPE_BROADCAST              |
| MAC_ADDRTYPE_UNICAST                |
| MAC_CLIENT_PROMISC_ALL              |
| MAC_CLIENT_PROMISC_FILTERED         |
| MAC_CLIENT_PROMISC_MULTI            |
| MAC_DROP_ON_NO_DESC                 |
+-------------------------------------+
|                                     |
|                                     |
|                                     |
|                                     |
+-------------------------------------+

Exported Interfaces
===================

+-------------------------------------------+-------------------------+
| Interface name                            | Status                  |
+-------------------------------------------+-------------------------+
| /usr/kernel/socketmod/sockpfp             | Project Private         |
| /usr/kernel/socketmod/amd64/sockpfp       | Project Private         |
| /usr/kernel/socketmod/sparcv9/sockpfp     | Project Private         |
+-------------------------------------------+-------------------------+
|                                           | Committed               |
+-------------------------------------------+-------------------------+
| PF_PACKET                                 | Committed               |
| SOL_PACKET                                | Committed               |
| SO_ATTACH_FILTER                          | Committed               |
| SO_DETACH_FILTER                          | Committed               |
| SIOCGIFHWADDR                             | Committed               |
| SIOCGSTAMP                                | Committed               |
| PACKET_STATISTICS                         | Committed               |
| PACKET_ADD_MEMBERSHIP                     | Committed               |
| PACKET_DROP_MEMBERSHIP                    | Committed               |
| PACKET_AUXDATA                            | Committed               |
| PACKET_HOST                               | Committed               |
| PACKET_BROADCAST                          | Committed               |
| PACKET_MULTICAST                          | Committed               |
| PACKET_OTHERHOST                          | Committed               |
| PACKET_OUTGOING                           | Committed               |
+-------------------------------------------+-------------------------+
| struct packet_mreq                        | Committed               |
| struct sock_filter                        | Committed               |
| struct sock_fprog                         | Committed               |
| struct sockaddr_ll                        | Committed               |
| struct tpacket_auxdata*                   | Committed               |
| struct tpacket_hdr*                       | Committed               |
| struct tpacket_stats                      | Committed               |
| struct tpacket2_hdr*                      | Committed               |
+-------------------------------------------+-------------------------+

* - The definitions of the structures that this project has built for
    compatibility with Linux are based on man pages, written descriptions
    of the interfaces used by Linux, and source code in libpcap. In some
    cases, uses of the structures were hard to find and/or incomplete.
    Whilst the descriptions laid the groundwork for understanding their
    makeup, they did not provide enough detail to guarantee 100% field
    name alignment with Linux, so the definitions may need to be corrected
    by a future bug fix or patch that improves interface compatibility.

sockaddr_ll
-----------

    struct sockaddr_ll {
        uint16_t sll_family;
        uint16_t sll_protocol;
        uint32_t sll_ifindex;
        uint16_t sll_pkttype;
        uint8_t  sll_hatype;
        uint8_t  sll_halen;
        uint8_t  sll_addr[8];
    };

The list of values expected to be used with sll_pkttype is:

    PACKET_HOST      - received packet is unicast and destined for this
                       host
    PACKET_BROADCAST - received packet is a broadcast packet
    PACKET_MULTICAST - received packet is a multicast packet
    PACKET_OTHERHOST - received packet is unicast and not destined for
                       this host
    PACKET_OUTGOING  - packet is being sent out by this system (no
                       distinction is made between routed, bridged, or
                       locally generated packets.)

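As an illustration of how an application might prepare a sockaddr_ll for
bind(3SOCKET), the following is a minimal sketch only. The structure is
copied locally from the layout listed above so that the sketch builds
without this project's headers; the example_ names are hypothetical and are
not interfaces delivered by this case. It also assumes the Linux convention
that sll_protocol is stored in network byte order, which this document does
not state.

```c
#include <arpa/inet.h>	/* htons() */
#include <stdint.h>
#include <string.h>

/* Local copy of the sockaddr_ll layout listed above (hypothetical name). */
struct example_sockaddr_ll {
	uint16_t sll_family;
	uint16_t sll_protocol;
	uint32_t sll_ifindex;
	uint16_t sll_pkttype;
	uint8_t  sll_hatype;
	uint8_t  sll_halen;
	uint8_t  sll_addr[8];
};

/*
 * Fill in the fields an application would set before bind(): the address
 * family, the MAC layer protocol and the interface index. All remaining
 * fields are zeroed.
 *
 * Assumption: sll_protocol is kept in network byte order, as on Linux.
 */
static void
example_fill_bind_addr(struct example_sockaddr_ll *sll, uint16_t family,
    uint16_t ethertype, uint32_t ifindex)
{
	memset(sll, 0, sizeof (*sll));
	sll->sll_family = family;
	sll->sll_protocol = htons(ethertype);
	sll->sll_ifindex = ifindex;
}
```

A caller would be expected to pass AF_PACKET as the family, 0x800 for IP
over Ethernet, and an interface index obtained from SIOCGIFINDEX, and then
hand the structure to bind(3SOCKET) cast to a 'struct sockaddr *'.
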
packet_mreq
-----------

    struct packet_mreq {
        uint32_t mr_ifindex;
        uint16_t mr_type;
        uint16_t mr_alen;
        uint8_t  mr_address[9];
    };

The list of values expected to be used with mr_type is:

    PACKET_MR_ALLMULTI
    PACKET_MR_MULTICAST
    PACKET_MR_PROMISC

    typedef enum tpkt_status_e {
        TP_STATUS_KERNEL,
        TP_STATUS_USER,
        TP_STATUS_COPY,
        TP_STATUS_LOSING,
        TP_STATUS_CSUMNOTREADY
    } tpkt_status_t;

    struct tpacket_auxdata {
        tpkt_status_t tp_status;
        uint32_t      tp_len;
        uint32_t      tp_snaplen;
        uint16_t      tp_macoff;
        uint16_t      tp_netoff;
        uint16_t      tp_vlan_vci;
    };

    struct tpacket_hdr {
        uint64_t tp_status;
        uint32_t tp_len;
        uint32_t tp_snaplen;
        uint16_t tp_macoff;
        uint16_t tp_netoff;
        uint32_t tp_sec;
        uint32_t tp_usec;
    };

    struct tpacket2_hdr {
        tpkt_status_t tp_status;
        uint32_t      tp_len;
        uint32_t      tp_snaplen;
        uint16_t      tp_macoff;
        uint16_t      tp_netoff;
        uint32_t      tp_sec;
        uint32_t      tp_nsec;
        uint16_t      tp_vlan_tci;
    };

    struct tpacket_stats {
        uint16_t tp_packets;
        uint16_t tp_drops;
    };

    struct sock_filter {    /* Fields named from bpf_insn */
        uint16_t code;
        uint8_t  jt;
        uint8_t  jf;
        uint32_t k;
    };

    struct sock_fprog {
        uint16_t           len;
        struct sock_filter *filter;
    };
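
To show how sock_filter and sock_fprog fit together, the following is a
minimal sketch of a classic BPF program that accepts only IP over Ethernet
frames. The structures are copied locally from the layouts listed above, and
every example_ and EX_ name is hypothetical. The small evaluator handles
only the three opcodes used here and exists purely to illustrate the
instruction semantics in user space; it is not the filter engine used by
this project.

```c
#include <stddef.h>
#include <stdint.h>

/* Local copies of the layouts listed above (hypothetical names). */
struct example_sock_filter {
	uint16_t code;
	uint8_t  jt;
	uint8_t  jf;
	uint32_t k;
};

struct example_sock_fprog {
	uint16_t len;
	struct example_sock_filter *filter;
};

/* Classic BPF opcode values for the three instructions used below. */
#define	EX_BPF_LDH_ABS	0x28	/* BPF_LD | BPF_H | BPF_ABS */
#define	EX_BPF_JEQ_K	0x15	/* BPF_JMP | BPF_JEQ | BPF_K */
#define	EX_BPF_RET_K	0x06	/* BPF_RET | BPF_K */

/* Accept frames whose EtherType is 0x0800, reject everything else. */
static struct example_sock_filter example_ipv4_only[] = {
	{ EX_BPF_LDH_ABS, 0, 0, 12 },		/* A = 16 bits at offset 12 */
	{ EX_BPF_JEQ_K,   0, 1, 0x0800 },	/* equal: next, else skip 1 */
	{ EX_BPF_RET_K,   0, 0, 0xffff },	/* accept up to 65535 bytes */
	{ EX_BPF_RET_K,   0, 0, 0 },		/* reject the packet */
};

static struct example_sock_fprog example_prog = {
	sizeof (example_ipv4_only) / sizeof (example_ipv4_only[0]),
	example_ipv4_only
};

/*
 * Evaluate a program built from the three opcodes above against a packet.
 * Returns the number of bytes to keep, or 0 if the packet is rejected.
 */
static uint32_t
example_run(const struct example_sock_fprog *prog, const uint8_t *pkt,
    size_t len)
{
	uint32_t a = 0;		/* the BPF accumulator */
	uint16_t pc;

	for (pc = 0; pc < prog->len; pc++) {
		const struct example_sock_filter *in = &prog->filter[pc];

		switch (in->code) {
		case EX_BPF_LDH_ABS:
			if ((size_t)in->k + 2 > len)
				return (0);	/* load past end of packet */
			a = ((uint32_t)pkt[in->k] << 8) | pkt[in->k + 1];
			break;
		case EX_BPF_JEQ_K:
			/* Offsets are relative to the next instruction. */
			pc += (a == in->k) ? in->jt : in->jf;
			break;
		case EX_BPF_RET_K:
			return (in->k);
		default:
			return (0);	/* opcode outside this sketch */
		}
	}
	return (0);
}
```

With the real definitions, a program of this shape would be passed to
setsockopt() at the SOL_SOCKET level with SO_ATTACH_FILTER, and removed
again with SO_DETACH_FILTER.
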