Re: [PATCH RFC bpf-next v1 1/2] bpf: Fix deadlock for bpf_timer's spinlock


On Sun, Nov 6, 2022 at 1:44 PM Kumar Kartikeya Dwivedi <memxor@xxxxxxxxx> wrote:
>
> On Mon, Nov 07, 2022 at 02:50:08AM IST, Alexei Starovoitov wrote:
> > On Sat, Nov 5, 2022 at 6:52 PM Kumar Kartikeya Dwivedi <memxor@xxxxxxxxx> wrote:
> > >
> > > Currently, unlike other tracing program types, BPF_PROG_TYPE_TRACING is
> > > excluded from the is_tracing_prog_type checks. This means such programs
> > > can use maps containing bpf_spin_lock, bpf_timer, etc. without a
> > > verification failure.
> > >
> > > However, allowing fentry/fexit programs to use maps that have such
> > > bpf_timer in the map value can lead to deadlock.
> > >
> > > Suppose that an fentry program is attached to bpf_prog_put, and a TC
> > > program executes and does bpf_map_update_elem on an array map that both
> > > progs share. If the fentry program calls bpf_map_update_elem for the
> > > same key, it will try to acquire the same lock from within the
> > > critical section protecting the timer.
> > >
> > > The call chain is:
> > >
> > > bpf_prog_test_run_opts() // TC
> > >   bpf_prog_TC
> > >     bpf_map_update_elem(array_map, key=0)
> > >       bpf_obj_free_fields
> > >         bpf_timer_cancel_and_free
> > >           __bpf_spin_lock_irqsave
> > >             drop_prog_refcnt
> > >               bpf_prog_put
> > >                 bpf_prog_FENTRY // FENTRY
> > >                   bpf_map_update_elem(array_map, key=0)
> > >                     bpf_obj_free_fields
> > >                       bpf_timer_cancel_and_free
> > >                         __bpf_spin_lock_irqsave // DEADLOCK
> > >
> > > BPF_TRACE_ITER attach type can be excluded because it always executes in
> > > process context.
> > >
> > > Update selftests using bpf_timer in fentry to TC as they will be broken
> > > by this change.
> >
> > which is an obvious red flag and the reason why we cannot do
> > this change.
> > This specific issue could be addressed with addition of
> > notrace in drop_prog_refcnt, bpf_prog_put, __bpf_prog_put.
> > Other calls from __bpf_prog_put can stay as-is,
> > since they won't be called in this convoluted case.
> > I frankly don't get why you're spending time digging such
> > odd corner cases that no one can hit in real use.
>
> I was trying to figure out whether bpf_list_head_free would be safe to call all
> the time in map updates from bpf_obj_free_fields, since it takes the very same
> spin lock that BPF program can also take to update the list.
>
> Map update ops are not allowed in the critical section, so this particular kind
> of recursive map update call should not be possible. perf event progs are
> already prevented using is_tracing_prog_type, so an NMI prog cannot interrupt
> and update the same map.
>
> But then I went looking whether it was a problem elsewhere...
>
> FWIW I have updated my patch to do:
>
>   if (btf_record_has_field(map->record, BPF_LIST_HEAD)) {
>         if (is_tracing_prog_type(prog_type) ||
>             (prog_type == BPF_PROG_TYPE_TRACING &&
>              env->prog->expected_attach_type != BPF_TRACE_ITER)) {
>                 verbose(env, "tracing progs cannot use bpf_list_head yet\n");
>                 return -EINVAL;
>         }
>   }

That is a severe limitation.
Why can't you use the notrace approach?



