On Thu, Dec 29, 2022 at 6:13 PM Stanislav Fomichev <sdf@xxxxxxxxxx> wrote: > > On Tue, Dec 27, 2022 at 8:40 AM Paul Moore <paul@xxxxxxxxxxxxxx> wrote: > > > > On December 26, 2022 10:35:49 PM Stanislav Fomichev <stfomichev@xxxxxxxxx> > > wrote: > > >> On Fri, Dec 23, 2022 at 5:49 PM Stanislav Fomichev <sdf@xxxxxxxxxx> wrote: > > >> get_func_ip() */ > > >>>> - tstamp_type_access:1; /* Accessed > > >>>> __sk_buff->tstamp_type */ > > >>>> + tstamp_type_access:1, /* Accessed > > >>>> __sk_buff->tstamp_type */ > > >>>> + valid_id:1; /* Is bpf_prog::aux::__id valid? */ > > >>>> enum bpf_prog_type type; /* Type of BPF program */ > > >>>> enum bpf_attach_type expected_attach_type; /* For some prog types */ > > >>>> u32 len; /* Number of filter blocks */ > > >>>> @@ -1688,6 +1689,12 @@ void bpf_prog_inc(struct bpf_prog *prog); > > >>>> struct bpf_prog * __must_check bpf_prog_inc_not_zero(struct bpf_prog *prog); > > >>>> void bpf_prog_put(struct bpf_prog *prog); > > >>>> > > >>>> +static inline u32 bpf_prog_get_id(const struct bpf_prog *prog) > > >>>> +{ > > >>>> + if (WARN(!prog->valid_id, "Attempting to use an invalid eBPF program")) > > >>>> + return 0; > > >>>> + return prog->aux->__id; > > >>>> +} > > >>> > > >>> I'm still missing why we need to have this WARN and have a check at all. > > >>> IIUC, we're actually too eager in resetting the id to 0, and need to > > >>> keep that stale id around at least for perf/audit. > > >>> Why not have a flag only to protect against double-idr_remove > > >>> bpf_prog_free_id and keep the rest as is? > > >>> Which places are we concerned about that used to report id=0 but now > > >>> would report stale id? > > >> > > >> What double-idr_remove are you concerned about? > > >> bpf_prog_by_id() is doing bpf_prog_inc_not_zero > > >> while __bpf_prog_put just dropped it to zero. > > > > > > (traveling, sending from an untested setup, hope it reaches everyone) > > > > > > There is a call to bpf_prog_free_id from __bpf_prog_offload_destroy which > > > tries to make offloaded program disappear from the idr when the netdev > > > goes offline. So I'm assuming that '!prog->aux->id' check in bpf_prog_free_id > > > is to handle that case where we do bpf_prog_free_id much earlier than the > > > rest of the __bpf_prog_put stuff. > > > > > >> Maybe just move bpf_prog_free_id() into bpf_prog_put_deferred() > > >> after perf_event_bpf_event and bpf_audit_prog ? > > >> Probably can remove the obsolete do_idr_lock bool flag as > > >> separate patch? > > > > > > +1 on removing do_idr_lock separately. > > > > > >> Much simpler fix and no code churn. > > >> Both valid_id and saved_id approaches have flaws. > > > > > > Given the __bpf_prog_offload_destroy path above, we still probably need > > > some flag to indicate that the id has been already removed from the idr? > > > > So what do you guys want in a patch? Is there a consensus on what you > > would merge to fix this bug/regression? > > Can we try the following? > > 1. Remove calls to bpf_prog_free_id (and bpf_map_free_id?) from > kernel/bpf/offload.c; that should make it easier to reason about those > '!id' checks calls? you mean a single call, right? > 2. Move bpf_prog_free_id (and bpf_map_free_id?) to happen after > audit/perf in kernel/bpf/syscall.c (there are comments that say "must > be called first", but I don't see why; seems like GET_FD_BY_ID would > correctly return -ENOENT; maybe Martin can chime in, CC'ed him > explicitly) The comment says that it should be removed from idr before __bpf_prog_put_noref will proceed to clean up. > 3. (optionally) Remove do_idr_lock arguments (all callers are passing 'true') yes. please.