Solaris Open Fabrics Architecture
==================================

1.0 Overview
=============

The Solaris Open Fabrics User Verbs (OFUV) project provides Open Fabrics
(OF) user-land functionality on Solaris. This is provided by:

  a) ported OF libraries (libibverbs, libmthca, libmlx4, libibcommon &
     librdmacm), and
  b) new Solaris kernel modules that interface into the libibverbs &
     librdmacm libraries using the OF defined user/kernel interface and
     translate OF API calls into IBTF calls.

The Solaris Open Fabrics Architecture also provides support for the OF
kernel RDMA CM & OF kernel verbs APIs (initially only those required by
RDSv3). These kernel APIs are provided by the 'sol_ofs' (Solaris OF
Services) kernel module.

The Solaris OF architecture will be delivered in two phases. Phase 1 is
derived from the existing kernel modules and interfaces that were
developed for the OFUV project, with 'sol_ofs' being extended to add
kernel verbs support. Phase 2 will see some internal rearchitecture
('sol_uverbs' modified to become a client of 'sol_ofs') and the addition
of 'sol_umad'.

Phase 1
-------

                      +----------+             +----------+
                      |librdmacm |             |libibverbs|
                      +----+-----+             +----+-----+
Userland                   |                        |
---------------------------+------------------------+
Kernel                     |                        |
                           |                        |
+------------------+   +---+----+    <(1)>     +----+-------+
|  OF kernel ULP   |   |sol_ucma+--------------+ sol_uverbs +
+---+--------------+   +---+----+              +---------+--+
    |                      ||
    | (2)                   |
    +-----------+           |
    |ib_verbs.h |           |
+---+------------------------------+                     |
|             sol_ofs              |<-                   |
+-----------------+----------------+                     |
                  |                                      |
                  |                                      |
+-----------------+--------------------------------------+-------+
|                              IBTF                              |
+----------------------------------------------------------------+

Legend :
    (1) - sol_uverbs2ucma.h
    (2) - This is the OFED defined interface with a single Solaris
          specific addition of a pointer to a dev_info_t added to the
          struct ib_client.

Notes
-------

1. 'sol_ucma' provides the interface to the user-level RDMA-CM library.
   It calls the 'sol_ofs' misc module for CM functionality.

2. 'sol_uverbs' provides the interface to the user-level verbs library.

4. 'sol_ofs' is a misc module that provides the services defined in
   'rdma_cm.h' and 'ib_verbs.h'. It also exports utility functions for
   linked lists, user object resource management and debug logging.

   The 'sol_ofs' module contains a generic sol_cma.c and the transport
   specific sol_ib_cma.c and sol_iw_cma.c (future) source files. The
   interface between the generic and the transport specific portions of
   sol_cma is project private.

5. 'sol_ofs' acts as a "proxy" IBTF client for 'sol_ofs' clients (RDSv3
   etc.); 'sol_uverbs' is an IBTF client in its own right.

   Each time a 'sol_ofs' client registers with 'sol_ofs'
   (ib_register_client), 'sol_ofs' registers that client with IBTF.
   However, 'sol_ofs' specifies the 'sol_ofs' IBTF async handler as the
   async callback on IBTF registration, so all IBTF callbacks are caught
   and handled by 'sol_ofs' (more details in section 1.1.1 In-Kernel
   Verbs API). 'sol_ofs' also stores the ibt_clnt_hdl_t returned by IBTF
   in a per 'sol_ofs' client structure (sol_ofs_client_t). The handle is
   not returned to 'sol_ofs' clients.

   This means that all 'sol_ofs' clients are children of ibnex and
   clients of IBTF. However, they do not consume any IBTF callbacks
   themselves. The callbacks are consumed by 'sol_ofs' on their behalf
   and translated into OFED kernel verbs KPI callbacks.

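The proxy arrangement in note 5 can be illustrated with a small user-space
sketch. It is not the 'sol_ofs' implementation: mock_ibt_attach(),
mock_ibt_deliver(), proxy_register_client(), the MOCK_* codes and the
demo client are all hypothetical stand-ins invented for this illustration,
and the real IBTF types and signatures differ. It only shows the idea that
the proxy keeps the IBTF handle to itself and forwards attach/detach
events to the client's 'add'/'remove' callbacks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for IBTF; not the real IBTF types or signatures. */
enum mock_async_code { MOCK_HCA_ATTACH, MOCK_HCA_DETACH };
typedef void (*mock_async_handler_t)(void *clnt_private,
    enum mock_async_code code, void *hca_private);

struct mock_ibt_clnt {
	mock_async_handler_t	handler;
	void			*clnt_private;
};
typedef struct mock_ibt_clnt *mock_ibt_clnt_hdl_t;

/* Mock "attach": remember the async handler and the private pointer. */
static mock_ibt_clnt_hdl_t
mock_ibt_attach(mock_async_handler_t handler, void *clnt_private)
{
	struct mock_ibt_clnt *c = malloc(sizeof (*c));

	if (c != NULL) {
		c->handler = handler;
		c->clnt_private = clnt_private;
	}
	return (c);
}

/* Mock of the framework delivering an async event to a registration. */
static void
mock_ibt_deliver(mock_ibt_clnt_hdl_t hdl, enum mock_async_code code,
    void *hca_private)
{
	assert(hdl != NULL);
	hdl->handler(hdl->clnt_private, code, hca_private);
}

/* OF-side structures, reduced to what the sketch needs. */
struct ib_device { unsigned long long device_guid; };
struct ib_client {
	const char *name;
	void (*add)(struct ib_device *);
	void (*remove)(struct ib_device *);
};

typedef struct sol_ofs_client_s {
	struct ib_client	*ofs_of_client;
	mock_ibt_clnt_hdl_t	ofs_ibt_hdl;	/* never handed to the client */
} sol_ofs_client_t;

/* The proxy's own handler: every framework callback lands here first. */
static void
proxy_async_handler(void *clnt_private, enum mock_async_code code,
    void *hca_private)
{
	sol_ofs_client_t *ofs = clnt_private;
	struct ib_device *dev = hca_private;

	if (code == MOCK_HCA_ATTACH)
		ofs->ofs_of_client->add(dev);
	else if (code == MOCK_HCA_DETACH)
		ofs->ofs_of_client->remove(dev);
}

/* Register on the client's behalf; the handle stays inside the proxy. */
static sol_ofs_client_t *
proxy_register_client(struct ib_client *client)
{
	sol_ofs_client_t *ofs = malloc(sizeof (*ofs));

	if (ofs == NULL)
		return (NULL);
	ofs->ofs_of_client = client;
	ofs->ofs_ibt_hdl = mock_ibt_attach(proxy_async_handler, ofs);
	if (ofs->ofs_ibt_hdl == NULL) {
		free(ofs);
		return (NULL);
	}
	return (ofs);
}

/* A demo client (hypothetical) that only counts its callbacks. */
static int demo_adds, demo_removes;
static void demo_add(struct ib_device *d) { (void)d; demo_adds++; }
static void demo_remove(struct ib_device *d) { (void)d; demo_removes++; }
static struct ib_client demo_client = { "demo_ulp", demo_add, demo_remove };
```

Registering demo_client and then delivering MOCK_HCA_ATTACH through the
mock reaches demo_add() without the client ever seeing the handle, which
is the property note 5 describes.
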
Phase 2 ------- +----------+ +----------+ +-----------+ |librdmacm | |libibverbs| | libibumad | +----+-----+ +----+-----+ +-----+-----+ Userland | | | ---------------------------+------------------------+------------------+----- Kernel | | | <(2)> | | | +------------------+ +---+----+ <(1)> +----+-------+ +----+-----+ | OF kernel ULP | |sol_ucma+--------------+ sol_uverbs + + sol_umad | +---+--------------+ +---+----+ +-------+----+ +-+----+---+ | || | | | | +-----------+ | ib_verbs.h | | | | | (3) | | (3) |ib_verbs.h | | | | +---+-----------------------+--------------------------+------------+--+ | | sol_ofs | | +------------------------------+---------------------------------------+ | | | | | +---------------------+------------------+ +----------+ | IBTF | | IBMF | +----------------------------------------+ +----------+ Legend : (1) - sol_uverbs2ucma.h (2) - ib_user_mad.h (3) - This is the OFED defined interface with a single Solaris specific addition of a pointer to a dev_info_t added to the struct ib_client. Notes ------ 1. 'sol_uverbs' uses the kernel verbs interface. 2. 'sol_umad' only uses the device query verbs. 1.1 APIs -------- 1.1.1 In-Kernel Verbs API This will be the OF 'ib_verbs.h' but some OF structs will have Solaris extensions added for use by 'sol_ofs'. 
Initial details:

    struct ib_client {
        char        *name;
        void        (*add)   (struct ib_device *);
        void        (*remove)(struct ib_device *);
        dev_info_t  *dip;               /* IBTF */
    };

    struct ib_device {
        ib_guid_t       device_guid;
        ibt_hca_hdl_t   device_hca_hdl; /* IBTF */
        char            name[IB_DEVICE_NAME_MAX];
        u8              node_type;
    };

    struct ib_pd {
        struct ib_device    *pd_devicep;
        ibt_pd_hdl_t        *pd_ibt_pd_hdl; /* IBTF */
    };

    struct ib_cq {
        struct ib_device    *device;
        ib_comp_handler     comp_handler;
        void                (*event_handler)(struct ib_event *, void *);
        void                *cq_context;
        int                 cqe;
        ibt_cq_hdl_t        ibt_cq;     /* IBTF */
    };

    struct ib_qp_init_attr {
        void                (*event_handler)(struct ib_event *, void *);
        void                *qp_context;
        struct ib_cq        *send_cq;
        struct ib_cq        *recv_cq;
        struct ib_srq       *srq;
        struct ib_qp_cap    cap;
        enum ib_sig_type    sq_sig_type;
        enum ib_qp_type     qp_type;
        uint8_t             port_num;   /* special QP types only */
    };

    struct ib_qp {
        struct ib_device    *qp_dev;
        struct ib_pd        *pd;
        struct ib_cq        *send_cq;
        struct ib_cq        *recv_cq;
        struct ib_srq       *srq;
        void                (*event_handler)(struct ib_event *, void *);
        void                *qp_context;
        uint32_t            qp_num;
        enum ib_qp_type     qp_type;
        ibt_qp_hdl_t        *qp_ibt_qp; /* IBTF */
    };

The OF ib_register_client() routine is implemented in 'sol_ofs'.
ib_register_client() would allocate a sol_ofs_client_t struct, defined
as:

    typedef struct sol_ofs_client_s {
        struct ib_client    *ofs_of_client;
        ibt_clnt_modinfo_t  ofs_ibtf_client;
        ibt_clnt_hdl_t      ofs_ibt_hdl;
        /* other implementation members, list links etc. */
    } sol_ofs_client_t;

'ofs_of_client' is set to point to the 'client' arg of
ib_register_client(). The ibt_clnt_modinfo_t fields are filled out as
follows:

    mi_ibt_version   = IBTI_V3;
    mi_clnt_class    = IBT_GENERIC;
    mi_async_handler = the 'sol_ofs' async handler
    mi_clnt_name     = 'client.name'

'sol_ofs' maintains a list of sol_ofs_client_t structs, one for each
'sol_ofs' client that registers with it.

ib_register_client() would call ibt_attach() to register the 'sol_ofs'
client (e.g. RDS etc.) with IBTF.
It also sets the "clnt_private" pointer in the ibt_attach() call to the
address of the newly allocated sol_ofs_client_t. IBTF returns this
pointer to 'sol_ofs' when it calls the 'sol_ofs' async handler, so
'sol_ofs' can then access the relevant 'sol_ofs' client's 'add'/'remove'
struct ib_client callbacks for IBTF HCA DR events.

After ib_register_client() calls ibt_attach(), it calls
ibt_get_hca_list() and, for each HCA found, allocates an 'ib_device'
struct, calls ibt_open_hca() to get an ibt_hca_hdl, and puts the
ibt_hca_hdl and the HCA guid in the 'ib_device' struct ('device_hca_hdl'
and 'device_guid'). 'sol_ofs' also calls ibt_set_hca_private() to set the
address of the newly allocated 'ib_device' struct as the private data of
the ibt_hca_hdl_t. 'sol_ofs' then calls the ULP 'add' callback (a member
of the 'ib_client' struct passed to ib_register_client()), passing the
newly created ib_device struct as an argument.

For each OF ib_* call, 'sol_ofs' calls the corresponding IBTF function
and stores the ibt handle in the OF object structure. For example, the
'sol_ofs' implementation of ib_create_qp() first allocates an 'ib_qp'
struct, then calls ibt_alloc_qp() and stores the ibt_qp_hdl in the
'qp_ibt_qp' member of the 'ib_qp' struct. 'sol_ofs' also sets the
ibt_qp_hdl private data to the address of the allocated 'ib_qp' using
the ibt_set_qp_private() function. The async event handler is passed
into 'sol_ofs' via the 'ib_qp_init_attr' and is stored in the
'event_handler' member of the 'ib_qp' struct. When 'sol_ofs' gets an
IBTF async event on an ibt_qp_hdl, it uses ibt_get_qp_private() to get
the associated 'struct ib_qp', and from that the 'event_handler' that
was passed into 'sol_ofs' when the QP was created. 'sol_ofs' then calls
that handler.

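The private-data round trip described above can be sketched in user-space
C. This is an illustration under stated assumptions, not the 'sol_ofs'
code: struct mock_ibt_qp and the mock_set_qp_private() /
mock_get_qp_private() helpers are hypothetical stand-ins for the IBTF QP
handle and its private-data calls, sketch_bind_qp() and sketch_qp_async()
are invented names, and struct ib_event is reduced to two fields (the OF
header lays it out differently).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an IBTF QP handle with one private pointer. */
struct mock_ibt_qp { void *qp_private; };
typedef struct mock_ibt_qp *mock_ibt_qp_hdl_t;

static void
mock_set_qp_private(mock_ibt_qp_hdl_t hdl, void *priv)
{
	hdl->qp_private = priv;
}

static void *
mock_get_qp_private(mock_ibt_qp_hdl_t hdl)
{
	return (hdl->qp_private);
}

/* Subset of OF event names used in this document; values illustrative. */
enum ib_event_type { IB_EVENT_QP_FATAL, IB_EVENT_COMM_EST };

struct ib_qp;
struct ib_event {			/* simplified for the sketch */
	enum ib_event_type	event;
	struct ib_qp		*qp;
};

struct ib_qp {				/* reduced to the relevant members */
	void			(*event_handler)(struct ib_event *, void *);
	void			*qp_context;
	mock_ibt_qp_hdl_t	qp_ibt_qp;
};

/*
 * Creation side: keep the handle in the OF object and the OF object in
 * the handle's private data, so either one can be found from the other.
 */
static void
sketch_bind_qp(struct ib_qp *qp, mock_ibt_qp_hdl_t hdl,
    void (*handler)(struct ib_event *, void *), void *context)
{
	qp->event_handler = handler;
	qp->qp_context = context;
	qp->qp_ibt_qp = hdl;
	mock_set_qp_private(hdl, qp);
}

/*
 * Async side: only the handle is known. Recover the ib_qp from the
 * private data and call the handler the client supplied at creation.
 */
static void
sketch_qp_async(mock_ibt_qp_hdl_t hdl, enum ib_event_type type)
{
	struct ib_qp *qp = mock_get_qp_private(hdl);
	struct ib_event ev;

	assert(qp != NULL);
	if (qp->event_handler == NULL)
		return;
	ev.event = type;
	ev.qp = qp;
	qp->event_handler(&ev, qp->qp_context);
}

/* Demo handler (hypothetical): records the event, bumps a counter. */
static enum ib_event_type demo_last_event;

static void
demo_qp_handler(struct ib_event *ev, void *context)
{
	demo_last_event = ev->event;
	*(int *)context += 1;
}
```

The same pattern applies to the HCA and CQ handles: the OF object is
stored as the handle's private data at creation time and looked up again
when a callback arrives carrying only the handle.
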
For completion events, the completion event handler is passed into
'sol_ofs' in the ib_create_cq() call as follows:

    struct ib_cq *ib_create_cq(struct ib_device *device,
        ib_comp_handler comp_handler,
        void (*event_handler)(struct ib_event *, void *),
        void *cq_context, int cqe, int comp_vector)

'sol_ofs' allocates an 'ib_cq' struct, calls ibt_alloc_cq(), and stores
the ibt_cq_hdl in the 'ibt_cq' member of the 'ib_cq' struct. It then
sets the ibt_cq_hdl private data to the address of the 'ib_cq' struct
using ibt_set_cq_private(). Both the passed in async handler
('event_handler') and completion handler ('comp_handler') are also
stored in the 'ib_cq' struct. 'sol_ofs' calls ibt_set_cq_handler() to
set the IBTF CQ handler to the 'sol_ofs' IBTF CQ handler. When the
'sol_ofs' CQ handler is called it is passed the ibt_cq_hdl. 'sol_ofs'
calls ibt_get_cq_private() to get the 'struct ib_cq', gets from that the
'comp_handler' passed in on CQ creation, and calls it.

The sol_ofs module converts IBTF async events to OF async events as
shown in the following table. It then invokes the appropriate event
handler by getting the private data from the corresponding handle in
the ibt_async_event_t. The other IBTF async events (except for
IBT_ATTACH_EVENT and IBT_DETACH_EVENT) should not be propagated to
sol_ofs clients, but handled internally by sol_ofs instead.

In the case of IBT_ATTACH_EVENT, sol_ofs calls client->add(device),
using the sol_ofs_client passed as the client private data. In the case
of IBT_DETACH_EVENT, sol_ofs calls client->remove(device) instead.
sol_ofs clients should deal with these callbacks appropriately.

+-------------------------------+-------------------------------+-------------+
| IBTF async events             | OF async events               |event handler|
+-------------------------------+-------------------------------+-------------+
|IBT_EVENT_PATH_MIGRATED        |IB_EVENT_PATH_MIG              |     QP      |
|IBT_EVENT_SQD                  |IB_EVENT_SQ_DRAINED            |     QP      |
|IBT_EVENT_COM_EST              |IB_EVENT_COMM_EST              |     QP      |
|IBT_ERROR_CATASTROPHIC_CHAN    |IB_EVENT_QP_FATAL              |     QP      |
|IBT_ERROR_INVALID_REQUEST_CHAN |IB_EVENT_QP_REQ_ERR            |     QP      |
|IBT_ERROR_ACCESS_VIOLATION_CHAN|IB_EVENT_QP_ACCESS_ERR         |     QP      |
|IBT_ERROR_PATH_MIGRATE_REQ     |IB_EVENT_PATH_MIG_ERR          |     QP      |
|IBT_ERROR_CQ                   |IB_EVENT_CQ_ERR                |     CQ      |
|IBT_EVENT_LIMIT_REACHED_SRQ    |IB_EVENT_SRQ_LIMIT_REACHED     |     SRQ     |
|IBT_EVENT_EMPTY_CHAN           |IB_EVENT_QP_LAST_WQE_REACHED   |     QP      |
|IBT_ERROR_CATASTROPHIC_SRQ     |IB_EVENT_SRQ_ERR               |     SRQ     |
+-------------------------------+-------------------------------+-------------+

1.1.2 In-kernel RDMA-CM API

This is defined by the header file:

    /usr/include/rdma/rdma_cm.h

In phase 1 it contains the following sol_ofs project private extensions
for use by sol_ucma only:

    i.  rdma_map_id2clnthdl(struct rdma_cm_id *, void *ib_client_hdl,
            void *iw_client_hdl);
    ii. rdma_map_id2qphdl(struct rdma_cm_id *, void *qp_hdl);

These won't be necessary once phase 2 is implemented, when 'sol_uverbs'
is a 'sol_ofs' client that allocates a QP using ib_create_qp() and does
not call directly into IBTF.

1.1.3 sol_uverbs

sol_uverbs exports the following functions:

    sol_uverbs_get_clnt_hdl();
    sol_uverbs_uqpid2qphdl();
    sol_uverbs_disable_uqp_modify();

'sol_uverbs2ucma.h' contains the function pointer definitions for the
above functions. This header file enables 'sol_ucma' to access the
'sol_uverbs' functions using ddi_modopen(), ddi_modsym() &
ddi_modclose().

The above interfaces are a project private interface between 'sol_ucma'
and the 'sol_uverbs' driver.

1.1.4 sol_ofs_common.h

    a. Interfaces for generic linked list management APIs
    b. Interfaces for sol_ofs_dprintf_l*() debug routines
    c. Interfaces for user objects

All 'sol_ofs' clients can use these interfaces.

2.0 Loading of modules
======================

1. All drivers pre-load sol_ofs by linking with:

       ld -N misc/sol_ofs

   The drivers are: sol_ucma, sol_uverbs & sol_umad.

2. sol_ucma loads sol_uverbs (if sol_uverbs has not been loaded yet)
   using ddi_modopen() and friends.