On Fri, 31 Aug 2018, Kashyap Desai wrote:
> > Ok. I misunderstood the whole thing a bit. So your real issue is that you
> > want to have reply queues which are instantaneous, the per cpu ones, and
> > then the extra 16 which do batching and are shared over a set of CPUs,
> > right?
>
> Yes, that is correct. The extra 16 (or whatever the number is) should be
> shared over a set of CPUs on the *local* NUMA node of the PCI device.

Why restrict it to the local NUMA node of the device? That doesn't really
make sense if you queue lots of requests from CPUs on a different node.

Why don't you spread these extra interrupts across all nodes and keep the
locality for the request/reply?

That would also allow making them properly managed interrupts, as you could
shut down the per-node batching interrupts when all CPUs of that node are
offlined, and you'd avoid the whole affinity hint irq balancer hackery.

Thanks,

	tglx