Re: [PATCH v3 bpf-next 1/8] bpf: Introduce bpf timers.

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> · Wed, 30 Jun 2021 13:08:08 +0300

On Tue, Jun 29, 2021 at 4:28 PM Alexei Starovoitov
<alexei.starovoitov@xxxxxxxxx> wrote:
>
> On Mon, Jun 28, 2021 at 11:34 PM Andrii Nakryiko
> <andrii.nakryiko@xxxxxxxxx> wrote:
> >
> > Have you considered alternatively to implement something like
> > bpf_ringbuf_query() for BPF ringbuf that will allow to query various
> > things about the timer (e.g., whether it is active or not, and, of
> > course, remaining expiry time). That will be more general, easier to
> > extend, and will cover this use case:
> >
> > long exp = bpf_timer_query(&t->timer, BPF_TIMER_EXPIRY);
> > bpf_timer_start(&t->timer, new_callback, exp);
>
> yes, but...
> hrtimer_get_remaining + timer_start to that value is racy
> and not accurate.

yes, but even though we specify expiration in nanosecond precision, no
one should expect that precision w.r.t. when callback is actually
fired. So fetching current expiration, adding new one, and re-setting
it shouldn't be a problem in practice, IMO.

I just think the most common case is to set a timer once, so ideally
usability is optimized for that (so taken to extreme it would be just
bpf_timer_start without any bpf_timer_init, but we've already
discussed this, no need to do that again here). Needing bpf_timer_init
+ bpf_timer_set_callbcack + bpf_timer_start for a common case feels
suboptimal usability-wise.

There is also a new race with bpf_timer_set_callback +
bpf_timer_start. Callback can fire inbetween those two operations, so
we could get new callback at old expiration or old callback with new
expiration. To do full update reliably, you'd need to explicitly
bpf_timer_cancel() first, at which point separate
bpf_timer_set_callback() doesn't help at all.

> hrtimer_get_expires_ns + timer_start(MODE_ABS)
> would be accurate, but that's an unnecessary complication.
> To live replace old bpf prog with new one
> bpf_for_each_map_elem() { bpf_timer_set_callback(new_prog); }
> is much faster, since timers don't need to be dequeue, enqueue.
> No need to worry about hrtimer machinery internal changes, etc.
> bpf prog being replaced shouldn't be affecting the rest of the system.

That's a good property, but if it was done as a
bpf_timer_set_callback() in addition to current
bpf_timer_start(callback_fn) it would still allow to have a simple
typical use.

Another usability consideration. With mandatory
bpf_timer_set_callback(), bpf_timer_start() will need to return some
error code if the callback wasn't set yet, right? I'm afraid that in
practice it will be the situation similar to bpf_trace_printk() where
people expect that it always succeeds and will never check the return
code. It's obviously debuggable, but a friction point nevertheless.

>
> > This will keep common timer scenarios to just two steps, init + start,
> > but won't prevent more complicated ones. Things like extending
> > expiration by one second relative that what was remaining will be
> > possible as well.
>
> Extending expiration would be more accurate with hrtimer_forward_now().
>
> All of the above points are minor compared to the verifier advantage.
> bpf_timer_set_callback() typically won't be called from the callback.
> So verifier's insn_procssed will be drastically lower.
> The combinatorial explosion of states even for this small
> selftests/bpf/progs/timer.c is significant.
> With bpf_timer_set_callback() is done outside of callback the verifier
> behavior will be predictable.
> To some degree patches 4-6 could have been delayed, but since the
> the algo is understood and it's working, I'm going to keep them.
> It's nice to have that flexibility, but the less pressure on the
> verifier the better.

I haven't had time to understand those new patches yet, sorry, so not
sure where the state explosion is coming from. I'll get to it for real
next week. But improving verifier internals can be done transparently,
while changing/fixing BPF UAPI is much harder and more disruptive.