Re: [RFC net-next 0/5] Suspend IRQs during preferred busy poll

Martin Karsten <mkarsten@xxxxxxxxxxxx> · Sun, 18 Aug 2024 10:51:04 -0400

On 2024-08-18 08:55, Willem de Bruijn wrote:
The value may not be obvious, but guidance (in the form of
documentation) can be provided.

Okay. Could you share a stab at what that would look like?

The timeout needs to be large enough that an application can get a
meaningful number of incoming requests processed without softirq
interference. At the same time, the timeout value determines the
worst-case delivery delay that a concurrent application using the same
queue(s) might experience. Please also see my response to Samiullah
quoted above. The specific circumstances and trade-offs might vary,
that's why a simple constant likely won't do.

Thanks. I really do mean this as an exercise of what documentation in
Documentation/networking/napi.rst will look like. That helps make the
case that the interface is reasonably easy to use (even if only
targeting advanced users).

For instance, how does a user measure how much time a process will
spend processing a meaningful number of incoming requests?
In practice, probably just a hunch?

As an example, we measure around 1M QPS in our experiments, fully
utilizing 8 cores and knowing that memcached is quite scalable. Thus we
can conclude a single request takes about 8 us processing time on
average. That has led us to a 20 us small timeout (gro_flush_timeout),
enough to make sure that a single request is likely not interfered with,
but otherwise as small as possible. If multiple requests arrive, the
system will quickly switch back to polling mode.
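
As a rough illustration of the arithmetic above (a sketch only; the
function name is made up for this example and the inputs are the
figures quoted in this message):

```python
def per_request_us(cores: int, qps: float) -> float:
    """Average CPU time per request in microseconds.

    Assumes every core is fully busy, so total CPU time per second
    is cores * 1,000,000 us, spread evenly over qps requests.
    """
    return cores * 1_000_000 / qps


service_us = per_request_us(cores=8, qps=1_000_000)
print(service_us)       # 8.0 us per request
print(20 / service_us)  # 2.5: a 20 us timeout covers about 2.5 requests
```

The 2.5x headroom is what makes it likely that a single request is not
interrupted while keeping the small timeout as short as possible.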

At the other end, we have picked a very large irq_suspend_timeout of
20,000 us to demonstrate that it does not negatively impact latency.
This would cover 2,500 requests, which is likely excessive, but was
chosen for demonstration purposes. One can easily measure the
distribution of epoll_wait batch sizes and batch sizes as low as 64 are
already very efficient, even in high-load situations.
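
The 2,500 figure follows from the same 8 us estimate. A minimal
sketch, with a hypothetical helper name and the values from this
message as inputs:

```python
def requests_covered(timeout_us: float, per_request_us: float) -> float:
    """Number of average-cost requests that fit into one timeout period."""
    return timeout_us / per_request_us


# irq_suspend_timeout of 20,000 us at roughly 8 us per request
print(requests_covered(20_000, 8))  # 2500.0

# For comparison, time needed for a batch of 64 requests at 8 us each
print(64 * 8)  # 512 us
```

A batch of 64 therefore needs only about 512 us at this request cost,
far below the 20,000 us used for the demonstration.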

Overall Ack on both your and Joe's responses.

epoll_wait disables the suspend if no events are found and ep_poll
would go to sleep. As the paper also hints, the timeout is only there
for misbehaving applications that stop calling epoll_wait, correct?
If so, then picking a value is not that critical, as long as it is
not too low to do meaningful work.

Correct.

Also see next paragraph.

Playing devil's advocate some more: given that ethtool usecs have to
be chosen with a similar trade-off between latency and efficiency,
could a multiplicative factor of this (or gro_flush_timeout, same
thing) be sufficient and easier to choose? The documentation does
state that the value chosen must be >= gro_flush_timeout.

I believe this would take away flexibility without gaining much. You'd
still want some sort of admin-controlled 'enable' flag, so you'd still
need some kind of parameter.

When using our scheme, the factor between gro_flush_timeout and
irq_suspend_timeout should *roughly* correspond to the maximum batch
size that an application would process in one go (orders of magnitude,
see above). This determines both the target application's worst-case
latency as well as the worst-case latency of concurrent applications, if
any, as mentioned previously.
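
To make the rule of thumb concrete, here is a sketch that derives one
timeout from the other. The helper names are hypothetical, the values
are in microseconds as used in this thread, and the result is only an
order-of-magnitude guide, not a recommended setting:

```python
def timeout_ratio(irq_suspend_timeout_us: int, gro_flush_timeout_us: int) -> float:
    """Ratio between the two timeouts; roughly the maximum batch size."""
    if irq_suspend_timeout_us < gro_flush_timeout_us:
        # The documentation requires irq_suspend_timeout >= gro_flush_timeout.
        raise ValueError("irq_suspend_timeout must be >= gro_flush_timeout")
    return irq_suspend_timeout_us / gro_flush_timeout_us


def suspend_timeout_for_batch(gro_flush_timeout_us: int, max_batch: int) -> int:
    """Rough irq_suspend_timeout for a given maximum batch size."""
    return gro_flush_timeout_us * max_batch


print(timeout_ratio(20_000, 20))          # 1000.0, the values from our experiments
print(suspend_timeout_for_batch(20, 64))  # 1280 us for a batch size of 64
```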

Oh, are concurrent applications the argument against a very high
timeout?

Only in the error case. If irq_suspend_timeout is large enough as you
point out above, then as long as the target application behaves well,
its batching settings are the determining factor.

Thanks,
Martin