On Thu, 2019-11-07 at 03:06 +0900, Keith Busch wrote:
> On Wed, Nov 06, 2019 at 04:40:07AM -0700, Jon Derrick wrote:
> > In order to provide better affinity alignment along the entire storage
> > stack, VMD IRQ lists can be assigned in a manner where the underlying
> > IRQ can be affinitized the same as the child (NVMe) device.
> >
> > This patch changes the assignment of child device vectors in IRQ lists
> > from a round-robin strategy to a matching-entry strategy. NVMe
> > affinities are deterministic in a VMD domain when these devices have the
> > same vector count as limited by the VMD MSI domain or cpu count. When
> > one or more child devices are attached on a VMD domain, this patch
> > aligns the NVMe submission-side affinity with the VMD completion-side
> > affinity as it completes through the VMD IRQ list.
>
> This really only works if the child devices have the same irq count as
> the vmd device. If the vmd device has more interrupts than the child
> devices, this will end up overloading the lower vmd interrupts without
> even using the higher ones.

Correct. The child NVMe device would need to offer at least the 32 IO
vectors that VMD provides. We could add a dynamic check to decide when
to use matching affinities versus round-robin, but since this is a
hotpluggable domain, reassigning interrupts that way seems fragile.

I haven't actually seen an NVMe device with fewer than 32 vectors, and
overloading VMD vectors seems to be the least of the performance
concerns with such a device. That configuration would essentially
reproduce the same issue we face today with poorly affined VMD IRQ
lists.

For the future VMD implementation offering 63 IO vectors, yes, this
will be a concern, and all I can really suggest is to use drives with
more vectors until I can determine a good way to handle it.
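
Purely to illustrate the overload scenario above, here is a small
stand-alone user-space sketch (not the vmd.c code; all the names below
are made up for the example) of how the two strategies distribute the
vectors of two 16-vector child devices across 32 VMD IRQ lists:

/*
 * Illustrative sketch of the two IRQ list selection strategies under
 * discussion. Not the vmd.c implementation.
 */
#include <stdio.h>

#define VMD_IO_VECTORS 32

/* Round-robin: pick the least-loaded VMD IRQ list. */
static unsigned int pick_round_robin(unsigned int *load)
{
	unsigned int i, best = 0;

	for (i = 1; i < VMD_IO_VECTORS; i++)
		if (load[i] < load[best])
			best = i;
	load[best]++;
	return best;
}

/* Matching-entry: child MSI-X entry N completes on VMD IRQ list N. */
static unsigned int pick_matching_entry(unsigned int *load,
					unsigned int child_entry)
{
	unsigned int idx = child_entry % VMD_IO_VECTORS;

	load[idx]++;
	return idx;
}

int main(void)
{
	unsigned int rr_load[VMD_IO_VECTORS] = { 0 };
	unsigned int match_load[VMD_IO_VECTORS] = { 0 };
	unsigned int dev, entry, i;

	/*
	 * Two child devices with only 16 vectors each: round-robin
	 * spreads them over all 32 VMD lists, while matching-entry
	 * doubles up lists 0..15 and leaves 16..31 idle, which is the
	 * overload Keith points out.
	 */
	for (dev = 0; dev < 2; dev++)
		for (entry = 0; entry < 16; entry++) {
			pick_round_robin(rr_load);
			pick_matching_entry(match_load, entry);
		}

	for (i = 0; i < VMD_IO_VECTORS; i++)
		printf("VMD list %2u: round-robin=%u matching=%u\n",
		       i, rr_load[i], match_load[i]);
	return 0;
}

When the children expose 32 or more vectors, both strategies end up
touching every VMD IRQ list, which is why the mismatch only shows up
with low-vector drives or the future 63-vector VMD part.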