Re: This is the fourth time I've tried to find what led to the regression of outgoing network speed and each time I find the merge commit 8c94ccc7cd691472461448f98e2372c75849406c

Thomas Gleixner <tglx@xxxxxxxxxxxxx> · Mon, 26 Feb 2024 19:09:22 +0100

On Mon, Feb 26 2024 at 12:54, Mathias Nyman wrote:
> On 26.2.2024 11.51, Linux regression tracking (Thorsten Leemhuis) wrote:
>>> I don't think reverting this series is a solution.
>>>
>>> This isn't really about those usb xhci patches.
>>> This is about which interrupt gets assigned to which CPU.
>> 
>> I know, but from my understanding of Linus expectations wrt to handling
>> regressions it does not matter much if a bug existed earlier or
>> somewhere else: what counts is the commit that exposed the problem.
>> 
>> But I might be wrong here. Anyway, not CCing Linus for this; but I'll
>> likely point him to this direction on Sunday in my next weekly report,
>> unless some fix comes into sight.
>> 
>>> Mikhail got unlucky when the network adapter interrupts on that system was
>>> assigned to CPU0, clearly a more "clogged" CPU, thus causing a drop in max
>>> bandwidth.
>> 
>> But maybe others will be just as "unlucky". Or is there anything to
>> believe otherwise? Maybe some aspect of the .config or local setup that
>> is most likely unique to Mikhail's setup?
>
> I believe this is a zero-sum case.
>
> Others got equally lucky due to this change.
> Their devices end up interrupting less clogged CPUs and see a similar
> performance increase.

Reverting this does not make any sense.

The kernel assigns the initial interrupt affinities to the CPUs so that
the number of interrupts is halfways balanced. That spreading algorithm
is completely agnostic of the actual usage of the interrupts. Where
e.g. the network interrupt ends up depends on the probe/enumeration
order of devices. Add another PCI-E card into the machine and it will
again look different.

There is nothing the kernel can do about it and earlier attempts to do
interrupt frequency based balancing in the kernel ended up nowhere
simply because the kernel does not have enough information about the
overall requirements. That's why the kernel leaves the affinity
configuration for user space, e.g. irqbalanced, except for true
multi-queue scenarios like NVME where the kernel binds queues and their
interrupts to specific CPUs or groups of CPUs.

Why ending up on CPU0 has this particular effect on Mikhails machine is
unclear as we don't have any information about the overall workload,
other interrupt sources on CPU0 and their frequency. That'd need to be
investigated with instrumentation and might unearth some completely
different underlying reason causing this behavior.

So I don't think this is a regression in the true sense of
regressions. It's an unfortunate coincidence and reverting the
identified commits would just paper over the real problem, if there is
actually one single source of trouble which causes the performance drop
only on CPU0.  The commits are definitely _not_ the root cause, they
happen to unearth some other issue, which might be as mundane as
e.g. that the NVME interrupt on CPU0 is competing with the network
interrupt. So don't shoot the messenger.

Thanks,

        tglx