Appendix E Interrupt Resource Management
----------------------------------------

This appendix discusses how a driver for a device that can generate many different interruptible conditions can use the Interrupt Resource Management feature to optimize its allocation of interrupt vectors.

The Interrupt Resource Management Feature
-----------------------------------------

The Interrupt Resource Management feature is not available on every Solaris platform, and it is available only to PCIe devices that use MSI-X interrupts. A driver that uses this feature must be able to adapt correctly when the feature is not available. When the feature is available, the driver can achieve higher performance by gaining access to more interrupt vectors. With more interrupt vectors on a multiprocessor system, more interruptible conditions can be serviced in parallel.

In general, the feature allows drivers to specify how many interrupt vectors they want to use. The feature attempts to grant as many interrupt vectors as requested, within the following constraints:

* Total number available - A finite number of interrupt vectors exists in the system.

* Total number requested - The total number available in the system is shared by many drivers. The feature determines a fair amount to give each driver.

The number of interrupt vectors made available to a device at any given time can vary as other devices are dynamically added to or removed from the system, or as drivers dynamically change how many interrupt vectors they want to use in response to load.

To use the feature, a driver must be enhanced to support it:

* Callback Support - Drivers must register a callback handler so that they can be notified when the system changes their number of available interrupts. They must be able to increase or decrease their interrupt usage.

* Interrupt Requests - Drivers must specify how many interrupts they want to use.

* Interrupt Usage - Drivers must make the right decisions about how many interrupts to request at any given time, based on what interruptible conditions their hardware can generate and how many processors can be used to process those conditions in parallel.

* Interrupt Flexibility - Drivers must be flexible enough to assign one or more interruptible conditions to each interrupt vector in a manner that best fits their current number of available interrupts. They might need to reconfigure these assignments when their number of available interrupts is increased or decreased at arbitrary times.

Callback Interfaces
-------------------

The interfaces listed in the following table can be used by a driver to register its callback support:

+---------------------+----------------------+--------------------------------+
| Interface           | Data Structures      | Description                    |
|---------------------+----------------------+--------------------------------|
| ddi_cb_register()   | ddi_cb_flags_t,      | Register a callback handler    |
|                     | ddi_cb_handle_t      | function to receive specific   |
|                     |                      | types of actions.              |
|---------------------+----------------------+--------------------------------|
| ddi_cb_unregister() | ddi_cb_handle_t      | Unregister a callback handler. |
|---------------------+----------------------+--------------------------------|
| (*ddi_cb_func_t)()  | ddi_cb_action_t      | Receives callback actions and  |
|                     |                      | specific arguments relevant to |
|                     |                      | each action to be processed.   |
+---------------------+----------------------+--------------------------------+

ddi_cb_register()
-----------------

The ddi_cb_register() function registers a callback handler function for the driver. The interface for the function is as follows:

    int
    ddi_cb_register(dev_info_t *dip, ddi_cb_flags_t cbflags,
        ddi_cb_func_t cbfunc, void *arg1, void *arg2,
        ddi_cb_handle_t *ret_hdlp);

The driver can register only one callback function, which is used to handle all individual callback actions.
The cbflags parameter determines which types of actions the driver receives when they occur. The cbfunc routine is called whenever a relevant action should be processed by the driver. The driver specifies two private arguments (arg1 and arg2) that are passed back to it on each execution of its cbfunc routine.

The cbflags parameter is an enumerated type that specifies which actions the driver supports. It is defined as follows:

    typedef enum {
            DDI_CB_FLAG_INTR
    } ddi_cb_flags_t;

To register support for Interrupt Resource Management actions, a driver must register a handler with the DDI_CB_FLAG_INTR flag. Upon successful registration, an opaque handle is returned through the ret_hdlp parameter for later unregistration of the callback handler. Registration of a callback handler should occur as an independent step in the driver's attach(9F) routine, and the opaque handle should be saved in the driver's soft state for later unregistration during detach(9F).

ddi_cb_unregister()
-------------------

The ddi_cb_unregister() function unregisters a callback handler for the driver. The interface for the function is as follows:

    int
    ddi_cb_unregister(ddi_cb_handle_t hdl);

The driver no longer receives callback actions once it unregisters its handler. It also loses any additional support from the system that it previously gained by virtue of having a registered callback handling function. In the case of Interrupt Resource Management, this means that some interrupt vectors previously made available to the driver are taken back immediately when the driver unregisters its callback handling function. The function does not return successfully until it has notified the driver of any final actions that result from the loss of this support.

The callback handling function must be able to perform correctly for the entire duration in which it is registered. It cannot depend upon any data structures that are destroyed before it is successfully unregistered.

(*ddi_cb_func_t)()
------------------

The callback handling function that a driver registers is as follows:

    typedef int (*ddi_cb_func_t)(dev_info_t *dip, ddi_cb_action_t cbaction,
        void *cbarg, void *arg1, void *arg2);

The cbaction parameter specifies which action the driver is being called back to process. It is defined as follows:

    typedef enum {
            DDI_CB_INTR_ADD,
            DDI_CB_INTR_REMOVE
    } ddi_cb_action_t;

A DDI_CB_INTR_ADD action means that the driver now has more interrupts available, and a DDI_CB_INTR_REMOVE action means that the driver now has fewer interrupts available. For both actions, the cbarg parameter should be cast to an int to determine the number of interrupts added or removed. The cbarg value is a delta: it represents how many more or how many fewer interrupts are now available, not the new total.

A callback handling function should return DDI_SUCCESS if it correctly handled the action, DDI_FAILURE if it encountered an internal error, or DDI_ENOTSUP if it received an unrecognized action.

Interrupt Request Interfaces
----------------------------

A driver has two interfaces for requesting interrupt vectors from the system. They are as follows:

+---------------------+----------------------+--------------------------------+
| Interface           | Data Structures      | Description                    |
|---------------------+----------------------+--------------------------------|
| ddi_intr_alloc()    | ddi_intr_handle_t    | Allocate interrupts.           |
|---------------------+----------------------+--------------------------------|
| ddi_intr_set_nreq() |                      | Set interrupt request number.  |
+---------------------+----------------------+--------------------------------+

ddi_intr_alloc()
----------------

The ddi_intr_alloc(9F) function is as follows:

    int
    ddi_intr_alloc(dev_info_t *dip, ddi_intr_handle_t *h_array, int type,
        int inum, int count, int *actualp, int behavior);

Before calling this function, a driver allocates an empty handle array large enough to contain the number of interrupts requested. The function tries to allocate up to count interrupt handles and initializes the array with the assigned interrupt vectors, beginning at the offset specified by the inum parameter. The actual number of interrupt vectors allocated is returned to the driver through the actualp parameter.

A driver can call ddi_intr_alloc(9F) multiple times to allocate interrupt vectors to individual members of the interrupt handle array in separate steps, or it can make a single call to try to allocate all of the interrupt vectors for the device at once.

When the Interrupt Resource Management feature is being used, the driver should make one initial call to ddi_intr_alloc(9F) to try to allocate the full number of desired interrupt vectors at once. The count parameter of that call becomes the total number of interrupt vectors requested by the driver. The system might not be able to fulfill the request completely, in which case the actual number allocated is less than the requested count. Subsequent calls to ddi_intr_alloc(9F) do not change the total number requested by the driver's first call.

ddi_intr_set_nreq()
-------------------

The ddi_intr_set_nreq(9F) function is as follows:

    int
    ddi_intr_set_nreq(dev_info_t *dip, int nreq);

When the Interrupt Resource Management feature is available, a driver can dynamically adjust its total number of requested interrupt vectors. It might choose to do so in response to the actual load it experiences once it is attached.
At any time after it has specified its initial request through ddi_intr_alloc(9F), the driver can call ddi_intr_set_nreq(9F) to change its request size. The specified nreq value takes effect as the driver's new total number of requested interrupt vectors. The Interrupt Resource Management feature might rebalance the number of interrupts made available to each driver in the system as a direct response to this new value. The driver might then receive more or fewer available interrupts as a side effect of this change.

A driver might dynamically adjust its total number of requested interrupt vectors if, for example, it uses interrupts in conjunction with specific transactions that it is processing. A storage driver might associate a DMA engine with each ongoing transaction and require an interrupt vector for each one. Calls to ddi_intr_set_nreq(9F) might therefore be reasonable within a driver's open(9F) and close(9F) routines, to scale its interrupt usage in response to actual use of the driver.

Interrupt Usage and Flexibility
-------------------------------

A driver for a device that supports many different interruptible conditions must be prepared to map those conditions to an arbitrary number of interrupt vectors, based on what is currently available from the system. The driver cannot assume that once interrupts are made available, they will always be available. Some currently available interrupts might later be taken back by the system to accommodate the needs of other drivers in the system.

A driver must be able to determine how many interrupts its hardware supports, determine how many it is appropriate for the driver to actually use (for example, in relation to the total number of processors in the system), and compare that with how many interrupts are actually available at any given time.
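
The three-way comparison described above can be sketched in plain C. This is only a user-space illustration under stated assumptions, not part of the DDI: the function names xx_vectors_wanted() and xx_vectors_usable() are hypothetical, and the policy of capping the request at one vector per processor is one possible choice among many.

```c
#include <assert.h>

/*
 * Hypothetical policy: want one vector per interruptible condition,
 * but never more vectors than there are processors to service them
 * in parallel, and never fewer than one.
 */
int
xx_vectors_wanted(int hw_conditions, int ncpus)
{
        int want = (hw_conditions < ncpus) ? hw_conditions : ncpus;

        return ((want < 1) ? 1 : want);
}

/*
 * The driver can use only what the system currently makes available,
 * so the usable count is the smaller of the two values.
 */
int
xx_vectors_usable(int wanted, int navail)
{
        assert(wanted >= 1 && navail >= 0);

        return ((navail < wanted) ? navail : wanted);
}
```

In a real driver, the first value would be the natural candidate for the count passed to ddi_intr_alloc(9F) or the nreq passed to ddi_intr_set_nreq(9F), and the second would be recomputed each time availability changes.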
Factoring all of this together, the driver must then be able to select an appropriate mixture of interrupt handling functions and program its hardware to generate interrupts accordingly. In some cases, multiple interrupts are targeted to the same vector, and the interrupt handler for that vector must be able to determine which of the many possible interrupts actually occurred. In other cases, a single interrupt is targeted by itself to a vector, and the interrupt handler for that vector can be simpler and more efficient. How efficiently the driver maps interrupts to interrupt vectors ultimately affects the performance of the device in the system.

Example
-------

Consider a network device driver as an example. The network device hardware supports multiple transmit and receive channels. Whenever the device receives a packet on one of its receive channels, and whenever the device transmits a packet on one of its transmit channels, it can generate a unique interrupt condition for that event. The hardware is programmable so that it can send a specific MSI-X interrupt for each event that can occur. A table in the hardware specifies which MSI-X interrupt to generate for each event.

To optimize performance, the driver requests enough interrupts from the system to give each separate interrupt condition its own interrupt vector. It makes this request when it first calls ddi_intr_alloc(9F) during its attach(9F) routine.

The driver then considers the actual number of interrupts it received, as returned by that initial call to ddi_intr_alloc(9F). It might receive all the interrupts it requested, or it might receive fewer. A separate function inside the driver takes the total number of available interrupts as a parameter and uses it to calculate which MSI-X interrupt to generate for each event. This function programs the table in the hardware accordingly.
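
As an illustration of such a calculation, the following user-space sketch fills an in-memory stand-in for the hardware table. The function name xx_map_events() and the round-robin policy are assumptions made for this example; a real driver would choose a policy suited to its hardware and then write the result to the device.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Fill table[] so that table[event] holds the MSI-X vector assigned
 * to that event.  Events are spread round-robin across the available
 * vectors.  Returns the number of vectors that serve more than one
 * event, that is, the vectors that need a demultiplexing handler.
 */
int
xx_map_events(int *table, int nevents, int navail)
{
        int event;
        int extra;

        assert(table != NULL && nevents >= 0 && navail >= 1);

        for (event = 0; event < nevents; event++)
                table[event] = event % navail;

        /* With enough vectors, every event has a vector to itself. */
        if (nevents <= navail)
                return (0);

        /*
         * Vector v is shared when a second event wraps around onto it,
         * which happens for v < (nevents - navail), capped at navail.
         */
        extra = nevents - navail;
        return ((extra < navail) ? extra : navail);
}
```

With 8 events and 8 vectors the table is a 1-to-1 mapping and no vector is shared; with 8 events and 3 vectors, every vector is shared.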
When the driver receives all of its requested interrupt vectors, each entry in the hardware table has its own unique MSI-X interrupt, and there is a direct 1-to-1 mapping from interrupt conditions to interrupt vectors. When the driver has fewer interrupt vectors available, some MSI-X interrupt numbers appear multiple times in the hardware table. That is, the hardware generates the same MSI-X interrupt for more than one type of event.

Furthermore, the driver has two different kinds of interrupt handler functions. One kind is very simple and performs a specific task in response to an interrupt. This simple function is best suited to interrupts that are generated by only one of the possible hardware events. The second kind is more complicated and is used when multiple interrupts are mapped to the same MSI-X interrupt vector.

The driver has a function named xx_setup_interrupts() that takes the number of available interrupt vectors, programs the hardware accordingly, and decides which interrupt handler is used (and with which arguments) for each of those interrupt vectors. The function looks as follows:

    int
    xx_setup_interrupts(xx_state_t *statep, int navail,
        xx_intrs_t *xx_intrs_p);

This function is called with an array of xx_intrs_t data structures, which look as follows:

    typedef struct {
            ddi_intr_handler_t      *inthandler;
            void                    *arg1;
            void                    *arg2;
    } xx_intrs_t;

The function makes all of its decisions and programs the hardware as appropriate. The driver then uses the results of that call to know which interrupt handler functions, with which arguments, should be used to set up each of its interrupt handles.

An existing driver that doesn't already use the Interrupt Resource Management feature would probably already have such a function, because it must already adapt to receiving fewer interrupts than requested when it attaches.
So this is existing functionality that is leveraged when modifying a driver to use the Interrupt Resource Management feature.

Other existing functionality includes the ability to quiesce and resume the hardware, as is done during certain events related to power management or hotplugging. Suppose this driver already has two existing functions for these purposes, as follows:

    int xx_quiesce(xx_state_t *statep);
    int xx_resume(xx_state_t *statep);

Putting all of this together, to enhance this device driver to use the Interrupt Resource Management feature, the driver needs to do the following:

* Register Callback Handler - The driver must register for the actions that indicate when it has fewer or more available interrupts.

* Callback Handling - The driver must quiesce its hardware, reprogram its interrupt handling, and resume its hardware in response to each such callback action.

Example Implementation
----------------------

    /*
     * attach(9F) routine.
     *
     * Creates soft state, registers callback handler, initializes
     * hardware, and sets up interrupt handling for the driver.
     * Most return values are left unchecked to keep the example short.
     */
    int
    xx_attach(dev_info_t *dip, ddi_attach_cmd_t cmd)
    {
            xx_state_t              *statep = NULL;
            xx_intrs_t              *intrs = NULL;
            ddi_intr_handle_t       *hdls;
            ddi_cb_handle_t         cb_hdl;
            int                     instance;
            int                     type;
            int                     types;
            int                     nintrs;
            int                     nactual;
            int                     inum;

            /* Get device instance */
            instance = ddi_get_instance(dip);

            switch (cmd) {
            case DDI_ATTACH:

                    /* Get soft state */
                    if (ddi_soft_state_zalloc(state_list, instance) != 0)
                            return (DDI_FAILURE);
                    statep = ddi_get_soft_state(state_list, instance);
                    ddi_set_driver_private(dip, (caddr_t)statep);
                    statep->dip = dip;

                    /* Initialize hardware */
                    xx_initialize(statep);

                    /* Register callback handler */
                    if (ddi_cb_register(dip, DDI_CB_FLAG_INTR, xx_cbfunc,
                        statep, NULL, &cb_hdl) != 0) {
                            ddi_soft_state_free(state_list, instance);
                            return (DDI_FAILURE);
                    }
                    statep->cb_hdl = cb_hdl;

                    /* Select interrupt type */
                    ddi_intr_get_supported_types(dip, &types);
                    if (types & DDI_INTR_TYPE_MSIX) {
                            type = DDI_INTR_TYPE_MSIX;
                    } else if (types & DDI_INTR_TYPE_MSI) {
                            type = DDI_INTR_TYPE_MSI;
                    } else {
                            type = DDI_INTR_TYPE_FIXED;
                    }
                    statep->type = type;

                    /* Get number of supported interrupts */
                    ddi_intr_get_nintrs(dip, type, &nintrs);

                    /* Allocate interrupt handle array */
                    hdls = kmem_zalloc(nintrs * sizeof (ddi_intr_handle_t),
                        KM_SLEEP);
                    statep->hdls = hdls;

                    /* Allocate interrupt setup array */
                    intrs = kmem_zalloc(nintrs * sizeof (xx_intrs_t),
                        KM_SLEEP);
                    statep->intrs = intrs;

                    /* Allocate interrupt vectors */
                    ddi_intr_alloc(dip, hdls, type, 0, nintrs, &nactual,
                        DDI_INTR_ALLOC_NORMAL);
                    statep->nactual = nactual;

                    /* Configure interrupt handling */
                    xx_setup_interrupts(statep, nactual, statep->intrs);

                    /* Install and enable interrupt handlers */
                    for (inum = 0; inum < nactual; inum++) {
                            ddi_intr_add_handler(hdls[inum],
                                intrs[inum].inthandler,
                                intrs[inum].arg1, intrs[inum].arg2);
                            ddi_intr_enable(hdls[inum]);
                    }

                    break;

            case DDI_RESUME:

                    /* Get soft state */
                    statep = ddi_get_soft_state(state_list, instance);
                    if (statep == NULL)
                            return (DDI_FAILURE);

                    /* Resume hardware */
                    xx_resume(statep);

                    break;

            default:
                    return (DDI_FAILURE);
            }

            return (DDI_SUCCESS);
    }

    /*
     * detach(9F) routine.
     *
     * Stops the hardware, disables interrupt handling, unregisters
     * a callback handler, and destroys the soft state for the driver.
     */
    int
    xx_detach(dev_info_t *dip, ddi_detach_cmd_t cmd)
    {
            xx_state_t      *statep = NULL;
            int             instance;
            int             inum;

            /* Get device instance */
            instance = ddi_get_instance(dip);

            switch (cmd) {
            case DDI_DETACH:

                    /* Get soft state */
                    statep = ddi_get_soft_state(state_list, instance);
                    if (statep == NULL)
                            return (DDI_FAILURE);

                    /* Stop device */
                    xx_uninitialize(statep);

                    /* Disable interrupts */
                    for (inum = 0; inum < statep->nactual; inum++) {
                            ddi_intr_disable(statep->hdls[inum]);
                            ddi_intr_remove_handler(statep->hdls[inum]);
                    }

                    /* Unregister callback handler */
                    ddi_cb_unregister(statep->cb_hdl);

                    /* Free soft state */
                    ddi_soft_state_free(state_list, instance);

                    break;

            case DDI_SUSPEND:

                    /* Get soft state */
                    statep = ddi_get_soft_state(state_list, instance);
                    if (statep == NULL)
                            return (DDI_FAILURE);

                    /* Suspend hardware */
                    xx_quiesce(statep);

                    break;

            default:
                    return (DDI_FAILURE);
            }

            return (DDI_SUCCESS);
    }

    /*
     * (*ddi_cb_func_t)() routine.
     *
     * Adapt interrupt usage when availability changes.
     */
    int
    xx_cbfunc(dev_info_t *dip, ddi_cb_action_t cbaction, void *cbarg,
        void *arg1, void *arg2)
    {
            xx_state_t      *statep = (xx_state_t *)arg1;
            int             count;
            int             inum;
            int             nactual;

            switch (cbaction) {
            case DDI_CB_INTR_ADD:
            case DDI_CB_INTR_REMOVE:

                    /* Get change in availability (a delta, not a total) */
                    count = (int)(uintptr_t)cbarg;

                    /* Suspend hardware */
                    xx_quiesce(statep);

                    /* Tear down previous interrupt handling */
                    for (inum = 0; inum < statep->nactual; inum++) {
                            ddi_intr_disable(statep->hdls[inum]);
                            ddi_intr_remove_handler(statep->hdls[inum]);
                    }

                    /* Adjust interrupt vector allocations */
                    if (cbaction == DDI_CB_INTR_ADD) {

                            /* Allocate additional interrupt vectors */
                            ddi_intr_alloc(dip, statep->hdls, statep->type,
                                statep->nactual, count, &nactual,
                                DDI_INTR_ALLOC_NORMAL);

                            /* Update actual count of available interrupts */
                            statep->nactual += nactual;

                    } else {

                            /* Free removed interrupt vectors */
                            for (inum = statep->nactual - count;
                                inum < statep->nactual; inum++) {
                                    ddi_intr_free(statep->hdls[inum]);
                            }

                            /* Update actual count of available interrupts */
                            statep->nactual -= count;
                    }

                    /* Configure interrupt handling */
                    xx_setup_interrupts(statep, statep->nactual,
                        statep->intrs);

                    /* Install and enable interrupt handlers */
                    for (inum = 0; inum < statep->nactual; inum++) {
                            ddi_intr_add_handler(statep->hdls[inum],
                                statep->intrs[inum].inthandler,
                                statep->intrs[inum].arg1,
                                statep->intrs[inum].arg2);
                            ddi_intr_enable(statep->hdls[inum]);
                    }

                    /* Resume hardware */
                    xx_resume(statep);

                    break;

            default:
                    return (DDI_ENOTSUP);
            }

            return (DDI_SUCCESS);
    }
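
The handle-array index arithmetic in the callback is easy to get wrong, so it can help to check it separately from the driver. The following user-space model is a sketch only: the names xx_action_t, xx_slots_t, and xx_affected_slots() are hypothetical, and the model assumes that an add request is granted in full up to the capacity of the handle array, whereas a real driver must use the count returned through actualp.

```c
#include <assert.h>

/* Hypothetical stand-ins for DDI_CB_INTR_ADD and DDI_CB_INTR_REMOVE */
typedef enum { XX_INTR_ADD, XX_INTR_REMOVE } xx_action_t;

typedef struct {
        int     first;          /* first handle slot affected */
        int     end;            /* one past the last slot affected */
        int     nactual;        /* resulting number of vectors in use */
} xx_slots_t;

/*
 * Given the current vector count and a delta from the callback,
 * compute which handle slots are allocated (add) or freed (remove)
 * and the resulting count.  The delta is clamped so that the result
 * stays within [0, capacity].
 */
xx_slots_t
xx_affected_slots(int nactual, int capacity, xx_action_t action, int count)
{
        xx_slots_t s;

        assert(nactual >= 0 && nactual <= capacity && count >= 0);

        if (action == XX_INTR_ADD) {
                /* New vectors are appended after the ones in use. */
                s.first = nactual;
                s.end = (nactual + count > capacity) ?
                    capacity : nactual + count;
                s.nactual = s.end;
        } else {
                /* Removed vectors come off the end of the array. */
                if (count > nactual)
                        count = nactual;
                s.first = nactual - count;
                s.end = nactual;
                s.nactual = s.first;
        }

        return (s);
}
```

For example, adding 2 vectors to a driver that holds 4 of a possible 8 affects slots 4 and 5, and removing 4 vectors from a driver that holds 6 frees slots 2 through 5.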