On 12 Oct 2020, at 9:58, Bjorn Helgaas wrote:
[+cc Christoph, Thomas, Nitesh]
On Mon, Oct 12, 2020 at 09:49:37AM -0600, Chris Friesen wrote:
I've got a Linux system running the RT kernel with threaded IRQs. On
startup we affine the various IRQ threads to the housekeeping CPUs, but I
recently hit a scenario where, after some days of uptime, we ended up with
a number of NVMe IRQ threads affined to application cores instead (not
good when we're trying to run low-latency applications).
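(For anyone wanting to check for the same thing: the per-vector affinity
is visible in /proc/irq/<N>/smp_affinity_list. A minimal userspace sketch,
equivalent to cat'ing that file and not specific to NVMe:

/* Sketch: print where one IRQ's vector is allowed to run, e.g. to
 * spot device vectors that landed on isolated/application cores. */
#include <stdio.h>

int main(int argc, char **argv)
{
	char path[64], buf[256];
	FILE *f;

	if (argc != 2) {
		fprintf(stderr, "usage: %s <irq-number>\n", argv[0]);
		return 1;
	}
	snprintf(path, sizeof(path), "/proc/irq/%s/smp_affinity_list",
		 argv[1]);
	f = fopen(path, "r");
	if (!f || !fgets(buf, sizeof(buf), f)) {
		perror(path);
		return 1;
	}
	printf("IRQ %s -> CPUs %s", argv[1], buf);
	fclose(f);
	return 0;
}
)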
pci_alloc_irq_vectors_affinity() basically just passes affinity
information through to kernel/irq/affinity.c, and the PCI core doesn't
change affinity after that.
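For context, the call shape is roughly the following. This is a
simplified sketch modeled on how NVMe-style drivers use the API; the
function name, pre_vectors value, and vector counts are illustrative,
not taken from any specific driver:

#include <linux/pci.h>
#include <linux/interrupt.h>

/* Sketch: a driver requesting managed-affinity vectors.  The actual
 * spreading of vectors across CPUs happens later, in
 * irq_create_affinity_masks() in kernel/irq/affinity.c, and the
 * spread is computed over the possible-CPU mask. */
static int example_setup_irqs(struct pci_dev *pdev, unsigned int nr_queues)
{
	struct irq_affinity affd = {
		.pre_vectors = 1,	/* e.g. one non-spread admin vector */
	};

	return pci_alloc_irq_vectors_affinity(pdev, 1, nr_queues + 1,
					      PCI_IRQ_ALL_TYPES |
					      PCI_IRQ_AFFINITY, &affd);
}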
Looking at the code, it appears that the NVMe driver can in some
scenarios end up calling pci_alloc_irq_vectors_affinity() after initial
system startup, which seems to determine CPU affinity without any regard
for things like "isolcpus" or "cset shield".
There seem to be other reports of similar issues:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1831566
It looks like some SCSI drivers and virtio_pci_common.c will also call
pci_alloc_irq_vectors_affinity(), though I'm not sure if they would ever
do it after system startup.
How does it make sense for the PCI subsystem to affine interrupts to CPUs
which have explicitly been designated as "isolated"?
This recent thread may be useful:
https://lore.kernel.org/linux-pci/20200928183529.471328-1-nitesh@xxxxxxxxxx/
It contains a patch to "Limit pci_alloc_irq_vectors() to housekeeping
CPUs". I'm not sure that patch summary is 100% accurate because IIUC
that particular patch only reduces the *number* of vectors allocated
and does not actually *limit* them to housekeeping CPUs.
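As I read it, the proposal roughly amounts to capping max_vecs at the
number of housekeeping CPUs, along these lines (a paraphrased sketch of
the idea, not the literal patch; the helper name here is made up):

#include <linux/pci.h>
#include <linux/cpumask.h>
#include <linux/sched/isolation.h>

/* Paraphrased sketch: cap the *count* of vectors at the number of
 * housekeeping CPUs.  The per-vector affinity masks are still spread
 * by kernel/irq/affinity.c, so nothing here actually pins the vectors
 * to the housekeeping CPUs themselves. */
static inline int
example_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs,
			  unsigned int max_vecs, unsigned int flags)
{
	unsigned int hk_cpus =
		cpumask_weight(housekeeping_cpumask(HK_FLAG_MANAGED_IRQ));

	if (hk_cpus < num_online_cpus())
		max_vecs = clamp(hk_cpus, min_vecs, max_vecs);

	return pci_alloc_irq_vectors_affinity(dev, min_vecs, max_vecs,
					      flags, NULL);
}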
Bjorn
Chris,
Are you attempting a tickless run? I've seen the NO_HZ_FULL (full
dynticks) feature behave somewhat inconsistently when PREEMPT_RT is
enabled; the timer-tick suppression can at times appear not to function.
I'm curious how you are attempting to isolate the cores.
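(If it helps for checking: the CPU lists the kernel actually derived from
isolcpus= and nohz_full= are exported in sysfs. A trivial sketch to dump
them:

/* Sketch: print the isolation-related CPU lists the kernel parsed
 * from the boot command line. */
#include <stdio.h>

static void show(const char *path)
{
	char buf[256];
	FILE *f = fopen(path, "r");

	if (f && fgets(buf, sizeof(buf), f))
		printf("%s: %s", path, buf);
	if (f)
		fclose(f);
}

int main(void)
{
	show("/sys/devices/system/cpu/isolated");
	show("/sys/devices/system/cpu/nohz_full");
	return 0;
}
)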
Thanks,
Sean