On 10/12/2020 1:07 PM, Keith Busch wrote:
On Mon, Oct 12, 2020 at 12:58:41PM -0600, Chris Friesen wrote:
On 10/12/2020 11:50 AM, Thomas Gleixner wrote:
On Mon, Oct 12 2020 at 11:58, Bjorn Helgaas wrote:
On Mon, Oct 12, 2020 at 09:49:37AM -0600, Chris Friesen wrote:
I've got a Linux system running the RT kernel with threaded irqs. On
startup we affine the various irq threads to the housekeeping CPUs, but I
recently hit a scenario where after some days of uptime we ended up with a
number of NVME irq threads affined to application cores instead (not good
when we're trying to run low-latency applications).
These threads and the associated interrupt vectors are completely
harmless and fully idle as long as there is nothing on those isolated
CPUs which does disk I/O.
Some of the irq threads are affined (by the kernel presumably) to multiple
CPUs (nvme1q2 and nvme0q2 were both affined 0x38000038, a couple of other
queues were affined 0x1c00001c0).
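
For readers decoding the two masks quoted above: each set bit N in an affinity mask stands for CPU N. A minimal Python sketch of that decoding follows; the helper name cpus_in_mask is made up for illustration and is not a kernel or tool interface.

```python
def cpus_in_mask(mask):
    """Return the CPU numbers whose bits are set in an affinity bitmask."""
    # Bit N set in the mask means CPU N is part of the affinity set.
    return [cpu for cpu in range(mask.bit_length()) if (mask >> cpu) & 1]


# The two masks reported in the message above.
print(cpus_in_mask(0x38000038))   # [3, 4, 5, 27, 28, 29]
print(cpus_in_mask(0x1c00001c0))  # [6, 7, 8, 30, 31, 32]
```

So each of these queues is spread over two runs of three CPUs, with the runs 24 CPU numbers apart.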
That means you have more CPUs than your controller has queues. When that
happens, some sharing of the queue resources among CPUs is required.
Is it required that every CPU is part of the mask for at least one queue?
If we can preferentially route interrupts to the housekeeping CPUs (for
queues with multiple CPUs in the mask), how is that different than just
affining all the queues to the housekeeping CPUs and leaving the
isolated CPUs out of the mask entirely?
Thanks,
Chris
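
As a rough illustration of the check behind the question above, the sketch below flags which bits of a queue's affinity mask fall outside a housekeeping set. The housekeeping mask (CPUs 0-2 and 24-26) is purely an assumption for the example and is not taken from the thread; the function name is likewise hypothetical.

```python
# Hypothetical housekeeping set: CPUs 0-2 and 24-26 (an assumption, not from the thread).
HOUSEKEEPING_MASK = 0x07000007


def stray_cpus(irq_mask, housekeeping_mask=HOUSEKEEPING_MASK):
    """Return the part of irq_mask that lies outside the housekeeping CPUs."""
    return irq_mask & ~housekeeping_mask


# Queue masks as reported in the thread.
queues = {"nvme0q2": 0x38000038, "nvme1q2": 0x38000038}

for name, mask in queues.items():
    stray = stray_cpus(mask)
    if stray:
        print(f"{name}: mask {mask:#x} includes non-housekeeping CPUs ({stray:#x})")
```

A result of zero would mean the queue's interrupt can only be delivered to housekeeping CPUs under that assumed split.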