On 9/24/20 6:59 PM, Bjorn Helgaas wrote: > On Thu, Sep 24, 2020 at 05:39:07PM -0400, Nitesh Narayan Lal wrote: >> On 9/24/20 4:45 PM, Bjorn Helgaas wrote: >>> Possible subject: >>> >>> PCI: Limit pci_alloc_irq_vectors() to housekeeping CPUs >> Will switch to this. >> >>> On Wed, Sep 23, 2020 at 02:11:26PM -0400, Nitesh Narayan Lal wrote: >>>> This patch limits the pci_alloc_irq_vectors, max_vecs argument that is >>>> passed on by the caller based on the housekeeping online CPUs (that are >>>> meant to perform managed IRQ jobs). >>>> >>>> A minimum of the max_vecs passed and housekeeping online CPUs is derived >>>> to ensure that we don't create excess vectors as that can be problematic >>>> specifically in an RT environment. In cases where the min_vecs exceeds the >>>> housekeeping online CPUs, max vecs is restricted based on the min_vecs >>>> instead. The proposed change is required because for an RT environment >>>> unwanted IRQs are moved to the housekeeping CPUs from isolated CPUs to >>>> keep the latency overhead to a minimum. If the number of housekeeping CPUs >>>> is significantly lower than that of the isolated CPUs we can run into >>>> failures while moving these IRQs to housekeeping CPUs due to per CPU >>>> vector limit. >>> Does this capture enough of the log? >>> >>> If we have isolated CPUs dedicated for use by real-time tasks, we >>> try to move IRQs to housekeeping CPUs to reduce overhead on the >>> isolated CPUs. >> How about: >> " >> If we have isolated CPUs or CPUs running in nohz_full mode for the purpose >> of real-time, we try to move IRQs to housekeeping CPUs to reduce latency >> overhead on these real-time CPUs. >> " >> >> What do you think? > It's OK, but from the PCI core perspective, "nohz_full mode" doesn't > really mean anything. I think it's a detail that should be inside the > "housekeeping CPU" abstraction. I get your point, in that case I will probably stick to your original suggestion just replacing the term "overhead" with "latency overhead". > >>> If we allocate too many IRQ vectors, moving them all to housekeeping >>> CPUs may exceed per-CPU vector limits. >>> >>> When we have isolated CPUs, limit the number of vectors allocated by >>> pci_alloc_irq_vectors() to the minimum number required by the >>> driver, or to one per housekeeping CPU if that is larger >> I think this is good, I can adopt this. >> >>> . >>> >>>> Signed-off-by: Nitesh Narayan Lal <nitesh@xxxxxxxxxx> >>>> --- >>>> include/linux/pci.h | 15 +++++++++++++++ >>>> 1 file changed, 15 insertions(+) >>>> >>>> diff --git a/include/linux/pci.h b/include/linux/pci.h >>>> index 835530605c0d..cf9ca9410213 100644 >>>> --- a/include/linux/pci.h >>>> +++ b/include/linux/pci.h >>>> @@ -38,6 +38,7 @@ >>>> #include <linux/interrupt.h> >>>> #include <linux/io.h> >>>> #include <linux/resource_ext.h> >>>> +#include <linux/sched/isolation.h> >>>> #include <uapi/linux/pci.h> >>>> >>>> #include <linux/pci_ids.h> >>>> @@ -1797,6 +1798,20 @@ static inline int >>>> pci_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs, >>>> unsigned int max_vecs, unsigned int flags) >>>> { >>>> + unsigned int hk_cpus = hk_num_online_cpus(); >>>> + >>>> + /* >>>> + * For a real-time environment, try to be conservative and at max only >>>> + * ask for the same number of vectors as there are housekeeping online >>>> + * CPUs. In case, the min_vecs requested exceeds the housekeeping >>>> + * online CPUs, restrict the max_vecs based on the min_vecs instead. >>>> + */ >>>> + if (hk_cpus != num_online_cpus()) { >>>> + if (min_vecs > hk_cpus) >>>> + max_vecs = min_vecs; >>>> + else >>>> + max_vecs = min_t(int, max_vecs, hk_cpus); >>>> + } >>> Is the below basically the same? >>> >>> /* >>> * If we have isolated CPUs for use by real-time tasks, >>> * minimize overhead on those CPUs by moving IRQs to the >>> * remaining "housekeeping" CPUs. Limit vector usage to keep >>> * housekeeping CPUs from running out of IRQ vectors. >>> */ >> How about the following as a comment: >> >> " >> If we have isolated CPUs or CPUs running in nohz_full mode for real-time, >> latency overhead is minimized on those CPUs by moving the IRQ vectors to >> the housekeeping CPUs. Limit the number of vectors to keep housekeeping >> CPUs from running out of IRQ vectors. >> >> " >> >>> if (housekeeping_cpus < num_online_cpus()) { >>> if (housekeeping_cpus < min_vecs) >>> max_vecs = min_vecs; >>> else if (housekeeping_cpus < max_vecs) >>> max_vecs = housekeeping_cpus; >>> } >> The only reason I went with hk_cpus instead of housekeeping_cpus is because >> at other places in the kernel I found a similar naming convention (eg. >> hk_mask, hk_flags etc.). >> But if housekeeping_cpus makes things more clear, I can switch to that >> instead. >> >> Although after Frederic and Peter's suggestion the previous call will change >> to something like: >> >> " >> housekeeping_cpus = housekeeping_num_online_cpus(HK_FLAG_MANAGED_IRQ); >> " >> >> Which should still falls in the that 80 chars per line limit. > I don't really care whether it's "hk_cpus" or "housekeeping_cpus" as > long as "housekeeping" appears in the code somewhere. If we call > "housekeeping_num_online_cpus()" that should be enough. Got it, in that case we are good here. > >>> My comment isn't quite right because this patch only limits the number >>> of vectors; it doesn't actually *move* IRQs to the housekeeping CPUs. >> Yeap it doesn't move IRQs to the housekeeping CPUs. >> >>> I don't know where the move happens (or maybe you just avoid assigning >>> IRQs to isolated CPUs, and I don't know how that happens either). >> This can happen in the userspace, either manually or by some application >> such as tuned. > Some brief hint about this might be helpful. Sure, I will try to come up with something on the lines what you suggested i.e., it also includes the information that the IRQs are moved from the userspace. > >>>> return pci_alloc_irq_vectors_affinity(dev, min_vecs, max_vecs, flags, >>>> NULL); >>>> } >>>> -- >>>> 2.18.2 >>>> >> -- >> Thanks >> Nitesh >> > > -- Thanks Nitesh
Attachment:
signature.asc
Description: OpenPGP digital signature