Re: PCI, isolcpus, and irq affinity

On 10/12/20 12:58 PM, Bjorn Helgaas wrote:
> [+cc Christoph, Thomas, Nitesh]
>
> On Mon, Oct 12, 2020 at 09:49:37AM -0600, Chris Friesen wrote:
>> I've got a linux system running the RT kernel with threaded irqs.  On
>> startup we affine the various irq threads to the housekeeping CPUs, but I
>> recently hit a scenario where after some days of uptime we ended up with a
>> number of NVME irq threads affined to application cores instead (not good
>> when we're trying to run low-latency applications).
> pci_alloc_irq_vectors_affinity() basically just passes affinity
> information through to kernel/irq/affinity.c, and the PCI core doesn't
> change affinity after that.
>
>> Looking at the code, it appears that the NVME driver can in some scenarios
>> end up calling pci_alloc_irq_vectors_affinity() after initial system
>> startup, which seems to determine CPU affinity without any regard for things
>> like "isolcpus" or "cset shield".
>>
>> There seem to be other reports of similar issues:
>>
>> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1831566
>>
>> It looks like some SCSI drivers and virtio_pci_common.c will also call
>> pci_alloc_irq_vectors_affinity(), though I'm not sure if they would ever do
>> it after system startup.
>>
>> How does it make sense for the PCI subsystem to affine interrupts to CPUs
>> which have explicitly been designated as "isolated"?
> This recent thread may be useful:
>
>   https://lore.kernel.org/linux-pci/20200928183529.471328-1-nitesh@xxxxxxxxxx/
>
> It contains a patch to "Limit pci_alloc_irq_vectors() to housekeeping
> CPUs".  I'm not sure that patch summary is 100% accurate because IIUC
> that particular patch only reduces the *number* of vectors allocated
> and does not actually *limit* them to housekeeping CPUs.

That is correct: the above-mentioned patch only reduces the number of
vectors; it does not pin them to the housekeeping CPUs.

Based on the problem described here, I think the issue could be the use of
cpu_online_mask/cpu_possible_mask while creating the affinity mask or while
distributing the jobs. What we should do in these cases is use the
housekeeping cpumask (housekeeping_cpumask()) instead.
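
For illustration, a minimal sketch of that pattern (pick_target_cpu() is a
hypothetical helper, not code from any of the series mentioned above;
housekeeping_cpumask() and HK_FLAG_MANAGED_IRQ are the existing interfaces
from include/linux/sched/isolation.h):

    #include <linux/cpumask.h>
    #include <linux/sched/isolation.h>
    #include <linux/topology.h>	/* cpumask_of_node() */

    /* Hypothetical helper: pick a target CPU for a vector, but only
     * from the housekeeping set rather than from cpu_online_mask or
     * cpu_possible_mask. */
    static unsigned int pick_target_cpu(int node)
    {
            const struct cpumask *hk_mask;
            unsigned int cpu;

            /* CPUs that are not isolated for managed-IRQ purposes */
            hk_mask = housekeeping_cpumask(HK_FLAG_MANAGED_IRQ);

            /* prefer a housekeeping CPU on the requested NUMA node */
            for_each_cpu_and(cpu, cpumask_of_node(node), hk_mask)
                    return cpu;

            /* otherwise fall back to any housekeeping CPU */
            return cpumask_first(hk_mask);
    }

With isolcpus=managed_irq,... on the command line, housekeeping_cpumask()
excludes the isolated CPUs, so a vector picked this way can never land on
an application core.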

A few months back a similar issue was fixed for cpumask_local_spread and
some other subsystems [1].

[1] https://lore.kernel.org/lkml/20200625223443.2684-1-nitesh@xxxxxxxxxx/

-- 
Nitesh

