#ident "@(#)issues 1.2 08/10/15 SAC" Inception Review October 15, 2008: Roamer-1. When the material talks about current interface limitation, 4.1.2, why it's a problem to allow a driver to get more that *2* MSI-X? Those integrated device drivers should be prepared that it can not get any MSI-X interrupt vector, and it might try the legacy INTX instead. So it should not be a problem even all MSI-X vectors have been given to those attached drivers. Late-attached drivers will just use legacy INTX interrupts. The justification for current *hard-coded* limitation doesn't make sense. ANS: You're right. Drivers could revert to INTX (FIXED) interrupts if the system has run out of MSI-X vectors. The current limit of 2 vectors was derived based on the number of slots in current systems with PCIe (and thus MSI-X) support, and based upon restrictions of the interrupt vector space and priority assignments therein imposed by the current design for low-level interrupt management on x64 based systems. That was just the value that could safely be selected at that time due to those constraints. Roamer-2. How the IRM framework decide to decrease the number of interrupt vectors that have been given to a driver? 4.2.1 talk about how driver participate the IRM interfaces, but it's obscure how the framework can wisely move interrupt resources around drivers. ANS: The "Design & Implementation" specification provides more details, including pseudo code of the algorithms used. The implementation is purely mathematical, and employs the size of individual requests as a weighting factor when computing how many interrupts to give to each device. The goal is to take a set of requests and the total number of vectors in a pool, and compute the largest possible number of vectors that can be given to each device. Larger requests may be less fulfilled than smaller requests, and smaller requests may be totally fulfilled. 
The results of the computations are always consistent, whether the final
I/O configuration (and set of requests) was the result of a series of
hotplug operations or the initial boot-time configuration of the system.

Roamer-3. How does the IRM framework make a *wise* decision about which
driver can take more interrupt vectors than others? For example, suppose
you have a 10GbE NIC and a 1GbE NIC in the box, and both drivers ask for
16 vectors when not enough vectors are left. Giving the same number of
interrupt vectors to the two driver instances would be unreasonable. As
part of the Crossbow project, hardware resources are allocated according
to the real link speed and bandwidth needs. But as a low-level I/O
framework, IRM has no knowledge of that information. How do you show
that your "management" is reasonable?

ANS: One aspect of the design is that the algorithms implemented to do
these computations are modular. Additional algorithms can be added, and
then selected in the future to rebalance different pools based on
different policies. All of the devinfo nodes associated with all of the
requests in each pool are visible to these algorithms, so there is an
opportunity to expand our repertoire of algorithms in the future to give
different preferences to different types of devices, or to make more
elaborate policy decisions. The project delivers only two generic
algorithms to begin with, but there is room to evolve the underlying
implementation without changing the interfaces visible to drivers.

Roamer-4. What is the perimeter of IRM? In a virtualized environment,
interrupts might have been bound to CPUs in an exclusive zone or a guest
domain. When IRM asks for such interrupt vectors back from the driver,
who will take care of the interrupt re-targeting? It is outside the
driver's control, and I cannot find any relevant information in this
document.

Garrett-5. The interfaces are marked Committed. I have some concerns
with this, as I read the project details.
I'd feel a lot better if we had a more complete description of how this
would be used by some typical device drivers, along with some real
experience with them, before raising the commitment. If the project team
has some sample implementations that can use this, then I might change
my position. But in the absence of that, I'd feel better with an
Uncommitted binding while we get some experience with the APIs at hand.

ANS: This is a good suggestion, and in fact the project team has decided
that a better commitment level for these interfaces is CONSOLIDATION
PRIVATE. Our experience with the interfaces is limited. The Atlas/Neptune
team is actively converting the nxge driver to this project's interfaces,
and we have had technical consultations with HBA driver developers
(working on the QLogic and Emulex drivers) and with the Crossbow project
team. Input from these other teams has been taken into consideration in
our project, but certainly as more practical experience is gained we may
evolve our interfaces. The CONSOLIDATION PRIVATE commitment level will
allow us to manage changes to the interfaces by only having to affect
consumers in the ON consolidation as the interfaces evolve. A more
detailed example of how to use these interfaces will be written up for
future inclusion in the WDD, and to improve the example that is already
in the project's manpages. These additional examples are in the case
materials. The example used is derived from the real-world example of
how the Atlas/Neptune driver will utilize the interfaces, but slightly
generalized to represent an idealized, non-specific hypothetical driver.

Garrett-6. How is the default number of interrupts that will be
allocated to a non-IRM driver determined?

ANS: As previously answered regarding how the current default value of 2
was derived, the default number is based on what is appropriate for the
platform. On existing platforms with existing PCIe nexus drivers, the
current value is what is appropriate.
In this project, the default value becomes a property of the interrupt
pool when the pool is created by a nexus driver. That value will always
be whatever is appropriate for the platform on which the nexus driver
creates its interrupt pools, and it will increase on future platforms
where it makes sense to do so.

Durrant-7. I can't find any mention in the specification of an interface
for a device driver to discover or control interrupt CPU binding. In
general, for a driver with high data throughput, multiple interrupts on
the same CPU are pointless at best and in many cases harmful to
performance; in fact, multiple interrupts using the same CPU core, cache,
or even chip/package can be harmful, since they may cause needless CPU
contention. So, if a device driver is being given an interface to request
more interrupts, then that interface really should allow some control
over which CPUs those interrupts are bound to, whether this is specific
or by specification of a policy (e.g. one-per-cpu, one-per-cache,
one-per-chip, etc.). Also, for interrupts allocated using the existing
ddi_intr_alloc() interface there really should be a means to discover
the CPU binding, unless that call can be superseded by one that also
gives control over CPU binding for the initial allocation.

ANS: The scope of this project is just to make the number of available
interrupts given to each driver instance a dynamically managed value. A
driver may take the current number of available interrupts and other
factors into consideration during its attach(9F) routine before it
decides how many interrupt resources it should actually allocate and how
it will set up its handlers. What this project really boils down to is
notifying the driver through a callback mechanism when the number of
available interrupts has changed, at which time it can revisit those
original decisions.
We still depend on drivers to request whatever number of interrupt
resources is appropriate, as they already do today without this project.
Binding interrupt vectors to specific processors is a low-level function
beyond the scope of this project, best handled by platform-specific
nexus drivers or managed in the long term by something like intrd.
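The callback mechanism described in the answer to Durrant-7 can be
modeled in userland roughly as follows. Every name here (irm_pool_t,
irm_register, irm_set_avail, driver_cb) is a hypothetical illustration,
not one of the project's actual DDI interfaces:

```c
#include <assert.h>

/*
 * Illustrative userland model of the callback mechanism: a pool
 * tracks the number of available vectors and notifies registered
 * drivers when that number changes.  All names are hypothetical;
 * they are not the project's DDI interfaces.
 */
#define MAX_DRIVERS 8

typedef void (*irm_callback_t)(void *arg, int navail);

typedef struct {
    int navail;                       /* vectors currently available */
    int ncb;                          /* registered callbacks */
    irm_callback_t cb[MAX_DRIVERS];
    void *cbarg[MAX_DRIVERS];
} irm_pool_t;

/* A driver opts in to dynamic management by registering a callback. */
static void
irm_register(irm_pool_t *pool, irm_callback_t cb, void *arg)
{
    pool->cb[pool->ncb] = cb;
    pool->cbarg[pool->ncb] = arg;
    pool->ncb++;
}

/* The framework changes availability and notifies every driver. */
static void
irm_set_avail(irm_pool_t *pool, int navail)
{
    int i;

    pool->navail = navail;
    for (i = 0; i < pool->ncb; i++)
        pool->cb[i](pool->cbarg[i], navail);
}

/* A driver's state: what it wants versus what it has set up. */
typedef struct {
    int want;
    int inuse;
} driver_t;

/* On notification, the driver revisits its attach-time decision. */
static void
driver_cb(void *arg, int navail)
{
    driver_t *drv = arg;

    drv->inuse = (drv->want < navail) ? drv->want : navail;
}
```

In this model, when the framework shrinks the pool from 16 to 4
available vectors, the driver's callback runs and it scales itself back
from 8 vectors in use to 4 -- the "revisit those original decisions"
step described in the answer above.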