On Sun, Nov 6, 2022 at 5:48 PM Kumar Kartikeya Dwivedi <memxor@xxxxxxxxx> wrote:
>
> On Mon, Nov 07, 2022 at 06:01:44AM IST, Alexei Starovoitov wrote:
> > On Sun, Nov 6, 2022 at 1:44 PM Kumar Kartikeya Dwivedi <memxor@xxxxxxxxx> wrote:
> > >
> > > On Mon, Nov 07, 2022 at 02:50:08AM IST, Alexei Starovoitov wrote:
> > > > On Sat, Nov 5, 2022 at 6:52 PM Kumar Kartikeya Dwivedi <memxor@xxxxxxxxx> wrote:
> > > > >
> > > > > Currently, unlike other tracing program types, BPF_PROG_TYPE_TRACING is
> > > > > excluded from the is_tracing_prog_type checks. This means that such
> > > > > programs can use maps containing bpf_spin_lock, bpf_timer, etc. without
> > > > > verification failure.
> > > > >
> > > > > However, allowing fentry/fexit programs to use maps that have such a
> > > > > bpf_timer in the map value can lead to deadlock.
> > > > >
> > > > > Suppose that an fentry program is attached to bpf_prog_put, and a TC
> > > > > program executes and does bpf_map_update_elem on an array map that both
> > > > > progs share. If the fentry program calls bpf_map_update_elem for the
> > > > > same key, it will try to acquire the same lock from within the
> > > > > critical section protecting the timer.
> > > > >
> > > > > The call chain is:
> > > > >
> > > > > bpf_prog_test_run_opts() // TC
> > > > >   bpf_prog_TC
> > > > >     bpf_map_update_elem(array_map, key=0)
> > > > >       bpf_obj_free_fields
> > > > >         bpf_timer_cancel_and_free
> > > > >           __bpf_spin_lock_irqsave
> > > > >           drop_prog_refcnt
> > > > >             bpf_prog_put
> > > > >               bpf_prog_FENTRY // FENTRY
> > > > >                 bpf_map_update_elem(array_map, key=0)
> > > > >                   bpf_obj_free_fields
> > > > >                     bpf_timer_cancel_and_free
> > > > >                       __bpf_spin_lock_irqsave // DEADLOCK
> > > > >
> > > > > The BPF_TRACE_ITER attach type can be excluded because it always
> > > > > executes in process context.
> > > > >
> > > > > Update selftests using bpf_timer in fentry to TC, as they will be
> > > > > broken by this change.
> > > > which is an obvious red flag and the reason why we cannot do
> > > > this change.
> > > > This specific issue could be addressed with the addition of
> > > > notrace in drop_prog_refcnt, bpf_prog_put, __bpf_prog_put.
> > > > Other calls from __bpf_prog_put can stay as-is,
> > > > since they won't be called in this convoluted case.
> > > > I frankly don't get why you're spending time digging up such
> > > > odd corner cases that no one can hit in real use.
> > >
> > > I was trying to figure out whether bpf_list_head_free would be safe to
> > > call all the time in map updates from bpf_obj_free_fields, since it
> > > takes the very same spin lock that a BPF program can also take to
> > > update the list.
> > >
> > > Map update ops are not allowed in the critical section, so this
> > > particular kind of recursive map update call should not be possible.
> > > perf event is already prevented using is_tracing_prog_type, so an NMI
> > > prog cannot interrupt and update the same map.
> > >
> > > But then I went looking at whether it was a problem elsewhere...
> > >
> > > FWIW, I have updated my patch to do:
> > >
> > > 	if (btf_record_has_field(map->record, BPF_LIST_HEAD)) {
> > > 		if (is_tracing_prog_type(prog_type) ||
> > > 		    (prog_type == BPF_PROG_TYPE_TRACING &&
> > > 		     env->prog->expected_attach_type != BPF_TRACE_ITER)) {
> > > 			verbose(env, "tracing progs cannot use bpf_list_head yet\n");
> > > 			return -EINVAL;
> > > 		}
> > > 	}
> >
> > That is a severe limitation.
> > Why can't you use the notrace approach?
>
> Yes, notrace is indeed an option, but the problem is that everything within
> that critical section needs to be notrace. bpf_list_head_free also ends up
> calling bpf_obj_free_fields again (the level of recursion however won't
> exceed 3, since we clamp list_head -> list_head nesting at 2 levels).
> So the notrace needs to be applied to everything within it, which is not a
> problem now. It can be done.

Let's do it then.

> BPF_TIMER and BPF_KPTR_REF (the indirect call to the dtor) won't be
> triggered, so it probably just needs to be bpf_list_head_free and
> bpf_obj_free_fields.
>
> But it can break silently in the future, if e.g. kptr is allowed. Same for
> drop_prog_refcnt if something changes. Every change to anything they call
> (and everything called by the functions they call) needs to keep the
> restriction in mind.

Only the funcs that can realistically be called. Not every function that is
in there.

> I was wondering if, in both cases of bpf_timer and bpf_list_head, we can
> simply swap out the object locally while holding the lock, and then do
> everything outside the spin lock.

Maybe, but I wouldn't bother optimizing for convoluted cases.
Use notrace when you can.
Everything that bpf_mem_alloc is doing is notrace for the same reason.

> For bpf_timer, it would mean moving drop_prog_refcnt outside the spin lock
> critical section. hrtimer_cancel is already done after the unlock. For
> bpf_list_head, it would mean swapping out the list_head and then draining
> it outside the lock.

That also works.
drop_prog_refcnt() can be moved after the unlock. Don't see any race.

> Then we hopefully don't need to use notrace, and it wouldn't be possible
> for any tracing prog to execute while we hold the bpf_spin_lock (unless I
> missed something).

Yep. spin_lock, linked list, obj_new won't be allowed in
is_tracing_prog_type().
Only talking about prog_type_tracing, since those are a lot more than tracing.
Everything new falls into this category.
We might even create an alias like prog_type_generic to highlight that.