Re: [EXT] Re: [Patch v4 1/3] lib: Restrict cpumask_local_spread to houskeeping CPUs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 3/4/21 4:13 PM, Alex Belits wrote:
> On 3/4/21 10:15, Nitesh Narayan Lal wrote:
>> External Email
>>
>> ----------------------------------------------------------------------
>>
>> On 2/11/21 10:55 AM, Nitesh Narayan Lal wrote:
>>> On 2/6/21 7:43 PM, Nitesh Narayan Lal wrote:
>>>> On 2/5/21 5:23 PM, Thomas Gleixner wrote:
>>>>> On Thu, Feb 04 2021 at 14:17, Nitesh Narayan Lal wrote:
>>>>>> On 2/4/21 2:06 PM, Marcelo Tosatti wrote:
>>>>>>>>> How about adding a new flag for isolcpus instead?
>>>>>>>>>
>>>>>>>> Do you mean a flag based on which we can switch the affinity mask to
>>>>>>>> housekeeping for all the devices at the time of IRQ distribution?
>>>>>>> Yes a new flag for isolcpus. HK_FLAG_IRQ_SPREAD or some better name.
>>>>>> Does sounds like a nice idea to explore, lets see what Thomas thinks
>>>>>> about it.
>>> <snip>
>>>
>>>>>> When the affinity mask of the interrupt at the time when it is actually
>>>>>> requested contains an isolated CPU then nothing prevents the kernel from
>>>>>> steering it at an isolated CPU. But that has absolutely nothing to do
>>>>>> with that spreading thingy.
>>>>>>
>>>>>> The only difference which this change makes is the fact that the
>>>>>> affinity hint changes. Nothing else.
>>>>>>
>>>> Thanks for the detailed explanation.
>>>>
>>>> Before I posted this patch, I was doing some debugging on a setup where I
>>>> was observing some latency issues due to the iavf IRQs that were pinned on
>>>> the isolated CPUs.
>>>>
>>>> Based on some initial traces I had this impression that the affinity hint
>>>> or cpumask_local_spread was somehow playing a role in deciding the affinity
>>>> mask of these IRQs. Although, that does look incorrect after going through
>>>> your explanation.
>>>> For some reason, with a kernel that had this patch when I tried creating
>>>> VFs iavf IRQs always ended up on the HK CPUs.
>>>>
>>>> The reasoning for the above is still not very clear to me. I will
>>>> investigate
>>>> this further to properly understand this behavior.
>>>>
>>>>
>>> After a little more digging, I found out why cpumask_local_spread change
>>> affects the general/initial smp_affinity for certain device IRQs.
>>>
>>> After the introduction of the commit:
>>>
>>>      e2e64a932 genirq: Set initial affinity in irq_set_affinity_hint()
>>>
>>
>> Continuing the conversation about the above commit and adding Jesse.
>> I was trying to understand the problem that the commit message explains
>> "The default behavior of the kernel is somewhat undesirable as all
>> requested interrupts end up on CPU0 after registration.", I have also been
>> trying to reproduce this behavior without the patch but I failed in doing
>> so, maybe because I am missing something here.
>>
>> @Jesse Can you please explain? FWIU IRQ affinity should be decided based on
>> the default affinity mask.
>>
>> The problem with the commit is that when we overwrite the affinity mask
>> based on the hinting mask we completely ignore the default SMP affinity
>> mask. If we do want to overwrite the affinity based on the hint mask we
>> should atleast consider the default SMP affinity.
>>
>
> cpumask_local_spread() is used by a small number of drivers, mostly for
> Ethernet and cryptographic devices, however it includes Cavium and Marvell
> devices that were included in every piece of hardware that I and Yury Norov
> worked on. Without my patch (or previous, later replaced, Yury's patch that
> was developed before there were housekeeping CPUs), driver would completely
> break any attempts to configure task isolation, because it would distribute
> processing over CPUs regardless of any masks related to isolation (and later
> housekeeping). This is why it was created, and it just happens that it also
> makes sense for CPU isolation in general. Of course, none of it would be
> experienced on hardware that does not include those devices, possibly
> creating some wrong impression about its effect and purpose.
>
> It may be that my patch can be criticized for not accommodating CPU hotplug
> and other runtime changes of masks. Or drivers can be criticized for their
> behavior that relies on calling cpumask_local_spread() once on
> initialization and then assuming that all CPUs are configured forever.
> However as far as I can tell, currently we have no other means of
> controlling the behavior of drivers that manage their own interrupt or
> thread to CPU mapping, and no way to communicate any of those changes to
> them while they are running. Drivers may have legitimate reasons for
> maintaining permanent or semi-permanent CPU core to interrupt mapping,
> especially on devices with very large numbers of CPU cores and built-in
> support for parallel processing of network packets.
>
> If we want it to be done in some manner that accommodates current demands,
> we should simply replace cpumask_local_spread() with something else, or,
> maybe, add some means that will allow dynamic changes. Thankfully, there are
> very few (IIRC, 19) places where cpumask_local_spread() is used, so it may
> be accommodated with relatively small amount of code to write and test. Then
> everything else will be able to switch to the same mechanism whenever
> necessary.
>

So there are two different issues, the first issue is how the mask
retrieved based on the cpumask_local_spread is used to set IRQ affinity.
Ideally when a device is initialized its IRQs are distributed based on the
default SMP affinity mask (considering irqbalance is disabled). However, it
is not the case right now as some drivers that set their hint affinity
using cpumask_local_spread overwrites the previously set affinity mask for
the IRQs. So even if you configure the default_smp_affinity from the
userspace it will not affect these device IRQs. This is precisely why your
fix for cpumask_local_spread helped in improving the isolation.

The second issue that you brought up is to balance the IRQ-specific load
between CPUs efficiently. FWIU if you have irqbalance enabled it should
already be doing that based on the policy that you define in the userspace.
Is that not the case or maybe I am missing something?

-- 
Thanks
Nitesh




[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux