On 9/22/20 5:54 AM, Frederic Weisbecker wrote:
> On Mon, Sep 21, 2020 at 11:08:20PM -0400, Nitesh Narayan Lal wrote:
>> On 9/21/20 6:58 PM, Frederic Weisbecker wrote:
>>> On Thu, Sep 17, 2020 at 11:23:59AM -0700, Jesse Brandeburg wrote:
>>>> Nitesh Narayan Lal wrote:
>>>>
>>>>> In a realtime environment, it is essential to isolate unwanted IRQs from
>>>>> isolated CPUs to prevent latency overheads. Creating MSIX vectors only
>>>>> based on the online CPUs could lead to a potential issue on an RT setup
>>>>> that has several isolated CPUs but a very few housekeeping CPUs. This is
>>>>> because in these kinds of setups an attempt to move the IRQs to the
>>>>> limited housekeeping CPUs from isolated CPUs might fail due to the per
>>>>> CPU vector limit. This could eventually result in latency spikes because
>>>>> of the IRQ threads that we fail to move from isolated CPUs.
>>>>>
>>>>> This patch prevents i40e to add vectors only based on available
>>>>> housekeeping CPUs by using num_housekeeping_cpus().
>>>>>
>>>>> Signed-off-by: Nitesh Narayan Lal <nitesh@xxxxxxxxxx>
>>>> The driver changes are straightforward, but this isn't the only driver
>>>> with this issue, right? I'm sure ixgbe and ice both have this problem
>>>> too, you should fix them as well, at a minimum, and probably other
>>>> vendors drivers:
>>>>
>>>> $ rg -c --stats num_online_cpus drivers/net/ethernet
>>>> ...
>>>> 50 files contained matches
>>> Ouch, I was indeed surprised that these MSI vector allocations were done
>>> at the driver level and not at some $SUBSYSTEM level.
>>>
>>> The logic is already there in the driver so I wouldn't oppose to this very patch
>>> but would a shared infrastructure make sense for this? Something that would
>>> also handle hotplug operations?
>>>
>>> Does it possibly go even beyond networking drivers?
>> From a generic solution perspective, I think it makes sense to come up with a
>> shared infrastructure.
>> Something that can be consumed by all the drivers and maybe hotplug operations
>> as well (I will have to further explore the hotplug part).
> That would be great. I'm completely clueless about those MSI things and the
> actual needs of those drivers. Now it seems to me that if several CPUs become
> offline, or as is planned in the future, CPU isolation gets enabled/disabled
> through cpuset, then the vectors may need some reorganization.

+1

>
> But I don't also want to push toward a complicated solution to handle CPU hotplug
> if there is no actual problem to solve there.

Sure, even I am not particularly sure about the hotplug scenarios.

> So I let you guys judge.
>
>> However, there are RT workloads that are getting affected because of this
>> issue, so does it make sense to go ahead with this per-driver basis approach
>> for now?
> Yep that sounds good.

Thank you for confirming.

>
>> Since a generic solution will require a fair amount of testing and
>> understanding of different drivers. Having said that, I can definitely start
>> looking in that direction.
> Thanks a lot!
>

--
Nitesh
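
For readers following the thread, the idea under discussion (sizing the vector request by housekeeping CPUs rather than by all online CPUs) can be illustrated with a small userspace sketch. This is not the i40e patch and not kernel code: the function names, the bool-array CPU sets, and the "at least one vector" floor are all assumptions made for illustration. Only num_housekeeping_cpus() is named in the thread, and it is not called here.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the role num_housekeeping_cpus() plays in the
 * thread: count CPUs that are online and not isolated. CPU sets are plain
 * bool arrays so the sketch runs in userspace.
 */
unsigned int count_housekeeping_cpus(const bool *online,
                                     const bool *isolated,
                                     size_t ncpus)
{
    unsigned int n = 0;

    for (size_t i = 0; i < ncpus; i++) {
        if (online[i] && !isolated[i])
            n++;
    }
    return n;
}

/*
 * Hypothetical policy: ask for one vector per housekeeping CPU, never more
 * than the device limit, and (an assumption of this sketch) at least one
 * when no housekeeping CPU is reported.
 */
unsigned int pick_vector_count(unsigned int hw_max_vectors,
                               unsigned int housekeeping_cpus)
{
    unsigned int want = housekeeping_cpus ? housekeeping_cpus : 1;

    return want < hw_max_vectors ? want : hw_max_vectors;
}
```

With 8 online CPUs of which 6 are isolated, the sketch requests 2 vectors instead of 8, so fewer IRQs need to be moved onto the two housekeeping CPUs later.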