On 5/22/24 16:03, Alexei Starovoitov wrote:
> On Tue, May 21, 2024 at 2:59 PM Barret Rhoden <brho@xxxxxxxxxx> wrote:
>> hi -
>>
>> we've noticed some variability in bpf timer expiration that goes away if
>> we change the timers to run in hardirq context.
>
> What kind of variability are we talking about?
hmm - it's actually worse than just variability.  the issue is that
we're using the timer to implement scheduling policy, yet the timer
sometimes gets handled by ksoftirqd, and ksoftirqd relies on the
scheduling policy to run.  we end up with a circular dependency.

e.g. say we want to let a very high priority thread run for 50us.
ideally we'd just set a timer for 50us and force a context switch when
it goes off.

but if timers might require ksoftirqd to run, we'll have to treat that
ksoftirqd specially (always run ksoftirqd whenever it is runnable), and
then we won't be able to let the high prio thread run ahead of other,
less important softirqs.
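for reference, the 50us-slice idea above would look roughly like the
sketch below.  this is just illustrative - the map layout, attach point,
and callback body are all made up for the example; only the bpf_timer
helper calls (bpf_timer_init / bpf_timer_set_callback / bpf_timer_start)
are the real kernel API:

```c
/* hypothetical sketch: arm a 50us bpf_timer from a BPF program.
 * a struct bpf_timer must be embedded in a map value. */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct elem {
	struct bpf_timer t;
};

struct {
	__uint(type, BPF_MAP_TYPE_ARRAY);
	__uint(max_entries, 1);
	__type(key, int);
	__type(value, struct elem);
} timer_map SEC(".maps");

static int slice_expired(void *map, int *key, struct bpf_timer *timer)
{
	/* scheduling-policy decision (e.g. force a resched) would go here.
	 * today this callback runs from the hrtimer softirq, so it can be
	 * deferred to ksoftirqd - the circular dependency above. */
	return 0;
}

SEC("tp_btf/sched_switch")	/* hypothetical attach point */
int arm_slice(void *ctx)
{
	int key = 0;
	struct elem *e = bpf_map_lookup_elem(&timer_map, &key);

	if (!e)
		return 0;
	bpf_timer_init(&e->t, &timer_map, CLOCK_MONOTONIC);
	bpf_timer_set_callback(&e->t, slice_expired);
	bpf_timer_start(&e->t, 50000 /* 50us in ns */, 0);
	return 0;
}

char _license[] SEC("license") = "GPL";
```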
>> i imagine the use of softirqs was to keep the potentially long-running
>> timer callback out of hardirq, but is there anything particularly
>> dangerous about making them run in hardirq?
> exactly what you said. We don't have a good mechanism to
> keep bpf prog runtime tiny enough for hardirq.
i think stuff like the scheduler tick, and any bpf progs that run there,
already runs in hardirq - let alone tracing progs.  so if we've already
opened the gates to hardirq progs, maybe letting timers run there too
would be ok?  perhaps gated on CAP_BPF.
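fwiw, the underlying hrtimer API already distinguishes the two expiry
contexts, so the plumbing exists on the kernel side.  a purely
illustrative kernel fragment (the timer name and callback are made up;
the hrtimer calls and HRTIMER_MODE_REL_HARD / HRTIMER_MODE_REL_SOFT
modes are the real API):

```c
/* illustrative kernel-side fragment, not a patch:
 * a soft-mode hrtimer expires from the hrtimer softirq (ksoftirqd may
 * run it); a hard-mode hrtimer expires directly in hardirq context. */
#include <linux/hrtimer.h>
#include <linux/ktime.h>

static struct hrtimer slice_timer;

static enum hrtimer_restart slice_cb(struct hrtimer *t)
{
	/* runs in hardirq context when started in a _HARD mode,
	 * so it must be short and must not sleep */
	return HRTIMER_NORESTART;
}

static void arm_slice_timer(void)
{
	hrtimer_init(&slice_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL_HARD);
	slice_timer.function = slice_cb;
	hrtimer_start(&slice_timer, ns_to_ktime(50000),
		      HRTIMER_MODE_REL_HARD);
}
```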
barret