Re: [RFC net-next 0/5] Suspend IRQs during preferred busy poll

On 2024-08-13 20:10, Jakub Kicinski wrote:
On Mon, 12 Aug 2024 17:46:42 -0400 Martin Karsten wrote:
Here's how it is intended to work:
    - An administrator sets the existing sysfs parameters for
      defer_hard_irqs and gro_flush_timeout to enable IRQ deferral.

    - An administrator sets the new sysfs parameter irq_suspend_timeout
      to a larger value than gro_flush_timeout to enable IRQ suspension.
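
As a rough illustration of the two configuration steps above, the following sketch builds the sysfs writes for one device. The napi_defer_hard_irqs and gro_flush_timeout files exist under /sys/class/net/<dev>/ today, and gro_flush_timeout is given in nanoseconds. The irq_suspend_timeout path and its unit are assumptions based on the description in this RFC, and the device name and all values are examples only. The helper returns the writes without touching the system.

```python
from pathlib import PurePosixPath


def irq_suspend_settings(dev, defer_hard_irqs, gro_flush_timeout_ns,
                         irq_suspend_timeout_ns):
    """Return {sysfs path: value} for IRQ deferral plus IRQ suspension."""
    if defer_hard_irqs <= 0 or gro_flush_timeout_ns <= 0:
        raise ValueError("IRQ deferral needs both values to be positive")
    # Models the statement that irq_suspend_timeout is set to a larger
    # value than gro_flush_timeout.
    if irq_suspend_timeout_ns <= gro_flush_timeout_ns:
        raise ValueError("irq_suspend_timeout must exceed gro_flush_timeout")
    base = PurePosixPath("/sys/class/net") / dev
    return {
        str(base / "napi_defer_hard_irqs"): str(defer_hard_irqs),
        str(base / "gro_flush_timeout"): str(gro_flush_timeout_ns),
        # Assumed location of the parameter proposed in this RFC.
        str(base / "irq_suspend_timeout"): str(irq_suspend_timeout_ns),
    }


if __name__ == "__main__":
    # Example values only: 20 us deferral, 20 ms suspension.
    settings = irq_suspend_settings("eth0", 100, 20_000, 20_000_000)
    for path, value in settings.items():
        print(f"echo {value} > {path}")
```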

Can you expand more on what's the problem with the existing gro_flush_timeout?
Is it defer_hard_irqs_count? Or do you want a separate timeout only for the
prefer_busy_poll case (why?)? Because looking at the first two patches,
you essentially replace all usages of gro_flush_timeout with a new variable
and I don't see how it helps.

gro-flush-timeout (in combination with defer-hard-irqs) is the default
irq deferral mechanism and as such, always active when configured. Its
static periodic softirq processing leads to a situation where:

- A long gro-flush-timeout causes high latencies when load is
sufficiently below capacity, or

- a short gro-flush-timeout causes overhead when softirq execution
asynchronously competes with application processing at high load.

The shortcomings of this are documented (to some extent) by our
experiments. See defer20 working well at low load but having problems
at high load, while defer200 has higher latency at low load.

irq-suspend-timeout is only active when an application uses
prefer-busy-polling and in that case, produces a nice alternating
pattern of application processing and networking processing (similar to
what we describe in the paper). This then works well with both low and
high load.

What about NIC interrupt coalescing? defer_hard_irqs_count was supposed
to be used with NICs which either don't have IRQ coalescing or have a
broken implementation. The timeout of 200usec should be perfectly within
range of what NICs can support.

If the NIC IRQ coalescing works, instead of adding a new timeout value
we could add a new deferral control (replacing defer_hard_irqs_count)
which would always kick in after seeing prefer_busy_poll() but also
not kick in if the busy poll harvested 0 packets.

Maybe I am missing something, but I believe this would have the same problem that we describe for gro-flush-timeout + defer-hard-irqs. When busy poll does not harvest packets and the application thread is idle and goes to sleep, it would then take up to 200 us to get the next interrupt. This considerably increases tail latencies under low load.

To get low latencies under low load, the NIC timeout would have to be something like 20 us, but under high load the application thread will be busy for longer than 20 us, so the interrupt (and softirq) will come too early and cause interference.

The fundamental problem is that one fixed timer cannot handle dynamic workloads, regardless of whether the timer is implemented in software or in the NIC. However, the current software implementation of the timer makes it easy to add our mechanism, which effectively switches between a short and a long timeout. I assume it would be more difficult, or incur more overhead, to update the NIC timer all the time.
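
To make the fixed-timer trade-off concrete, here is a toy model (not kernel code). The 20 us and 200 us timer values come from the discussion above; the application busy times of 5 us and 100 us are hypothetical stand-ins for low and high load.

```python
def fixed_timer_effects(timeout_us, app_busy_us):
    """Toy model of one fixed deferral timer.

    Returns (worst_case_idle_wait_us, fires_during_app_processing).
    """
    # After an empty poll the thread sleeps, and a packet arriving right
    # afterwards can wait for up to the whole timer period.
    worst_case_idle_wait_us = timeout_us
    # If the application is busy for longer than the timer period, the
    # interrupt and softirq arrive while it is still processing.
    fires_during_app_processing = app_busy_us > timeout_us
    return worst_case_idle_wait_us, fires_during_app_processing


if __name__ == "__main__":
    for timeout_us in (20, 200):
        for app_busy_us in (5, 100):  # hypothetical low and high load
            print(timeout_us, app_busy_us,
                  fixed_timer_effects(timeout_us, app_busy_us))
```

In this model no single timeout value gives both a short idle wait and no early interrupt once the busy time exceeds the short value.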

In other words, the complexity is always the same: A very long timeout is needed to suspend irqs during periods of successful busy polling and application processing. A short timeout is needed to receive the next packet(s) with low latency during idle periods.

It is tempting to think of the second timeout as 0 and in fact re-enable interrupts right away. We have tried it, but it leads to a lot of interrupts and corresponding inefficiencies, since a system below capacity frequently switches between busy and idle. Using a small timeout (20 us) for modest deferral and batching when idle is a lot more efficient.
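
The switching between the two timeouts can be sketched as follows. This is a simplified model of the idea and not the code from the patches: the function name is made up for illustration, the 20 us short timeout is the value mentioned above, and the 20000 us suspend timeout is a hypothetical example.

```python
def next_timeout_us(packets_harvested, short_timeout_us=20,
                    suspend_timeout_us=20_000):
    """Pick the timer to arm after a busy-poll pass (simplified model)."""
    if short_timeout_us >= suspend_timeout_us:
        raise ValueError("suspend timeout must exceed the short timeout")
    if packets_harvested > 0:
        # Successful busy poll: keep IRQs suspended while the application
        # processes the batch and polls again.
        return suspend_timeout_us
    # Empty poll: the application is about to go idle, so fall back to
    # the short timeout for modest deferral and batching.
    return short_timeout_us


if __name__ == "__main__":
    polls = [8, 3, 0, 0, 5]  # hypothetical packets harvested per poll
    print([next_timeout_us(n) for n in polls])
```

A short timeout of 0 would correspond to re-enabling interrupts right away, which is the variant described above as producing too many interrupts.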

Thanks,
Martin




