On Mon, Jun 14, 2021 at 10:31 PM Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> wrote:
>
> On Mon, Jun 14, 2021 at 8:29 PM Alexei Starovoitov
> <alexei.starovoitov@xxxxxxxxx> wrote:
> >
> > On Mon, Jun 14, 2021 at 9:51 AM Yonghong Song <yhs@xxxxxx> wrote:
> > > > +     ret = BPF_CAST_CALL(t->callback_fn)((u64)(long)map,
> > > > +                                         (u64)(long)key,
> > > > +                                         (u64)(long)t->value, 0, 0);
> > > > +     WARN_ON(ret != 0); /* Next patch disallows 1 in the verifier */
> > >
> > > I didn't find that next patch disallows callback return value 1 in the
> > > verifier. If we indeed disallows return value 1 in the verifier. We
> > > don't need WARN_ON here. Did I miss anything?
> >
> > Ohh. I forgot to address this bit in the verifier. Will fix.
> >
> > > > +     if (!hrtimer_active(&t->timer) || hrtimer_callback_running(&t->timer))
> > > > +             /* If the timer wasn't active or callback already executing
> > > > +              * bump the prog refcnt to keep it alive until
> > > > +              * callback is invoked (again).
> > > > +              */
> > > > +             bpf_prog_inc(t->prog);
> > >
> > > I am not 100% sure. But could we have race condition here?
> > >    cpu 1: running bpf_timer_start() helper call
> > >    cpu 2: doing hrtimer work (calling callback etc.)
> > >
> > > Is it possible that
> > >    !hrtimer_active(&t->timer) || hrtimer_callback_running(&t->timer)
> > > may be true and then right before bpf_prog_inc(t->prog), it becomes
> > > true? If hrtimer_callback_running() is called, it is possible that
> > > callback function could have dropped the reference count for t->prog,
> > > so we could already go into the body of the function
> > > __bpf_prog_put()?
> >
> > you're correct. Indeed there is a race.
> > Circular dependency is a never ending headache.
> > That's the same design mistake as with tail_calls.
> > It felt that this case would be simpler than tail_calls and a bpf program
> > pinning itself with bpf_prog_inc can be made to work... nope.
> > I'll get rid of this and switch to something 'obviously correct'.
> > Probably a link list with a lock to keep a set of init-ed timers and
> > auto-cancel them on prog refcnt going to zero.
> > To do 'bpf daemon' the prog would need to be pinned.
>
> Hm.. wouldn't this eliminate that race:
>
>     switch (hrtimer_try_to_cancel(&t->timer))
>     {
>     case 0:
>         /* nothing was queued */
>         bpf_prog_inc(t->prog);
>         break;
>     case 1:
>         /* already have refcnt and it won't be bpf_prog_put by callback */
>         break;
>     case -1:
>         /* callback is running and will bpf_prog_put, so we need to take
>            another refcnt */
>         bpf_prog_inc(t->prog);
>         break;
>     }
>     hrtimer_start(&t->timer, ns_to_ktime(nsecs), HRTIMER_MODE_REL_SOFT);
>
> So instead of guessing (racily) whether there is a queued callback or
> not, try to cancel just in case there is. Then rely on the nice
> guarantees that hrtimer cancellation API provides.

I haven't thought it through yet, but the above approach could
indeed solve this particular race. Unfortunately there are other races.

There is an issue with bpf_timer_init. Since it doesn't take a refcnt,
another program might do a lookup and bpf_timer_start while the first
prog got to refcnt=0 and got freed.
Adding a refcnt to bpf_timer_init() makes the prog self-pinned, and the
callback might never be executed (if there was no bpf_timer_start),
so there is a high chance of the bpf prog getting stuck in the kernel.
There could be ref+uref schemes similar to tail_calls to address all that,
but it gets ugly quickly.

imo all these issues and races are a sign that such self-pinning
shouldn't be allowed.
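
For reference, a minimal userspace sketch of the refcount accounting in
the hrtimer_try_to_cancel() proposal quoted above. It is a model, not
kernel code: struct model_timer, model_try_to_cancel(),
model_timer_start(), model_callback_done() and model_pending() are
hypothetical names, and the only thing borrowed from the real API is the
0 / 1 / -1 return convention used in the switch. It covers a single
start from each state and says nothing about the bpf_timer_init lifetime
problem.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
struct model_timer {
	bool queued;   /* a callback is scheduled to run */
	bool running;  /* the callback is executing right now */
	int refs;      /* prog references held on behalf of the timer */
};

/* Same return convention as the switch in the quoted proposal:
 *   0: timer was not active
 *   1: timer was queued and has been removed
 *  -1: callback is running and cannot be stopped
 */
static int model_try_to_cancel(struct model_timer *t)
{
	if (t->running)
		return -1;
	if (t->queued) {
		t->queued = false;
		return 1;
	}
	return 0;
}

/* Models the proposed bpf_timer_start() body. */
static void model_timer_start(struct model_timer *t)
{
	switch (model_try_to_cancel(t)) {
	case 0:
		/* nothing was queued: the new callback needs its own ref */
		t->refs++;
		break;
	case 1:
		/* the cancelled callback's ref is reused by the new one */
		break;
	case -1:
		/* the running callback will drop its ref; take another */
		t->refs++;
		break;
	}
	t->queued = true; /* stands in for hrtimer_start() */
}

/* Models the callback finishing and dropping its reference. */
static void model_callback_done(struct model_timer *t)
{
	assert(t->running && t->refs > 0);
	t->running = false;
	t->refs--;
}

/* Number of callback executions that still owe a reference drop. */
static int model_pending(const struct model_timer *t)
{
	return (t->queued ? 1 : 0) + (t->running ? 1 : 0);
}
```

In this model, refs equals model_pending() after one model_timer_start()
from each of the three starting states (idle, queued, callback running),
which is the invariant the switch is trying to keep.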