On 2024-08-19 22:36, Jakub Kicinski wrote:
> On Sun, 18 Aug 2024 10:51:04 -0400 Martin Karsten wrote:
>> I believe this would take away flexibility without gaining much. You'd
>> still want some sort of admin-controlled 'enable' flag, so you'd still
>> need some kind of parameter.
>>
>> When using our scheme, the factor between gro_flush_timeout and
>> irq_suspend_timeout should *roughly* correspond to the maximum batch
>> size that an application would process in one go (orders of magnitude,
>> see above). This determines both the target application's worst-case
>> latency as well as the worst-case latency of concurrent applications, if
>> any, as mentioned previously.
>
> Oh, are concurrent applications the argument against a very high
> timeout?

Only in the error case. If irq_suspend_timeout is large enough as you
point out above, then as long as the target application behaves well,
its batching settings are the determining factor.
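
To make the sizing rule concrete, here is a minimal sketch of the arithmetic. It is not part of the patch series: the helper name, the example values, and the assumption that both timeouts are expressed in nanoseconds are illustrative only, and the result is meant as an order-of-magnitude guide rather than an exact setting.

```python
def suggest_irq_suspend_timeout_ns(gro_flush_timeout_ns: int, max_batch: int) -> int:
    """Scale gro_flush_timeout by the largest batch the application handles in one go.

    Hypothetical helper: it encodes only the rule of thumb that
    irq_suspend_timeout / gro_flush_timeout ~= maximum batch size.
    """
    if gro_flush_timeout_ns <= 0 or max_batch <= 0:
        raise ValueError("timeout and batch size must be positive")
    return gro_flush_timeout_ns * max_batch


if __name__ == "__main__":
    gro_flush_timeout_ns = 20_000  # 20 us, illustrative value
    max_batch = 64                 # illustrative per-iteration batch size
    print(suggest_irq_suspend_timeout_ns(gro_flush_timeout_ns, max_batch))
```

With these example values the suggested timeout is 1,280,000 ns, roughly 1.3 ms. That bound only matters in the error case; a well-behaved application is governed by its own batching settings.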

> Since the discussion is still sort of going on let me ask something
> potentially stupid (I haven't read the paper, yet). Are the cores
> assumed to be fully isolated (ergo the application can only yield
> to the idle thread)? Do we not have to worry about the scheduler
> deciding to schedule the process out involuntarily?

That shouldn't be a problem. If the next thread(s) can make progress,
nothing is lost. If the next thread(s) cannot make progress, for example
because they are waiting for network I/O, they will block and the target
application thread will run again. If another thread is busy-looping on
network I/O, I would argue that having multiple busy-looping threads
competing for the same core is probably not a good idea anyway.
Thanks,
Martin