On Mon, Jul 10, 2023 at 2:14 PM Stanislav Fomichev <sdf@xxxxxxxxxx> wrote:
>
> On Mon, Jul 10, 2023 at 1:16 PM Alexei Starovoitov
> <alexei.starovoitov@xxxxxxxxx> wrote:
> >
> > On Mon, Jul 10, 2023 at 12:00 PM Stanislav Fomichev <sdf@xxxxxxxxxx> wrote:
> > >
> > > On Mon, Jul 10, 2023 at 11:27 AM Alexei Starovoitov
> > > <alexei.starovoitov@xxxxxxxxx> wrote:
> > > >
> > > > On Mon, Jul 10, 2023 at 11:18 AM Stanislav Fomichev <sdf@xxxxxxxxxx> wrote:
> > > > >
> > > > > On 07/10, Daniel Borkmann wrote:
> > > > > > On 7/7/23 11:27 PM, Stanislav Fomichev wrote:
> > > > > > > On 07/07, Daniel Borkmann wrote:
> > > > > > [...]
> > > > > > > > +static inline struct bpf_mprog_entry *
> > > > > > > > +bpf_mprog_create(const size_t size, const off_t off)
> > > > > > > > +{
> > > > > > > > +	struct bpf_mprog_bundle *bundle;
> > > > > > > > +	void *ptr;
> > > > > > > > +
> > > > > > > > +	BUILD_BUG_ON(size < sizeof(*bundle) + off);
> > > > > > > > +	BUILD_BUG_ON(sizeof(bundle->a.fp_items[0]) > sizeof(u64));
> > > > > > > > +	BUILD_BUG_ON(ARRAY_SIZE(bundle->a.fp_items) !=
> > > > > > > > +		     ARRAY_SIZE(bundle->cp_items));
> > > > > > > > +
> > > > > > > > +	ptr = kzalloc(size, GFP_KERNEL);
> > > > > > > > +	if (ptr) {
> > > > > > > > +		bundle = ptr + off;
> > > > > > > > +		atomic64_set(&bundle->revision, 1);
> > > > > > > > +		bundle->off = off;
> > > > > > > > +		bundle->a.parent = bundle;
> > > > > > > > +		bundle->b.parent = bundle;
> > > > > > > > +		return &bundle->a;
> > > > > > > > +	}
> > > > > > > > +	return NULL;
> > > > > > > > +}
> > > > > > > > +
> > > > > > > > +void bpf_mprog_free_rcu(struct rcu_head *rcu);
> > > > > > > > +
> > > > > > > > +static inline void bpf_mprog_free(struct bpf_mprog_entry *entry)
> > > > > > > > +{
> > > > > > > > +	struct bpf_mprog_bundle *bundle = entry->parent;
> > > > > > > > +
> > > > > > > > +	call_rcu(&bundle->rcu, bpf_mprog_free_rcu);
> > > > > > > > +}
> > > > > > >
> > > > > > > Any reason we're doing allocation here? Why not do
> > > > > > > bpf_mprog_init(struct bpf_mprog_bundle *) instead that simply initializes
> > > > > > > the fields? Then we can move allocation/free part to the caller (tcx) along
> > > > > > > with rcu_head.
> > > > > > > Feels like it would be a bit more conventional/readable? bpf_mprog_free{,_rcu}
> > > > > > > will also become tcx_free{,_rcu}..
> > > > > > >
> > > > > > > I guess current approach works, but it took me a while to figure it out..
> > > > > > > (maybe it's just me)
> > > > > >
> > > > > > I found this approach quite useful for tcx case since we only fetch the
> > > > > > bpf_mprog_entry for tcx_link_prog_attach et al, but I can take a look to
> > > > > > see if this looks better and if it does I'll include it.
> > > > > >
> > > > > > > > +static inline void bpf_mprog_mark_ref(struct bpf_mprog_entry *entry,
> > > > > > > > +				      struct bpf_tuple *tuple)
> > > > > > > > +{
> > > > > > > > +	WARN_ON_ONCE(entry->parent->ref);
> > > > > > > > +	if (!tuple->link)
> > > > > > > > +		entry->parent->ref = tuple->prog;
> > > > > > > > +}
> > > > > > > > +
> > > > > > > > +static inline void bpf_mprog_inc(struct bpf_mprog_entry *entry)
> > > > > > > > +{
> > > > > > > > +	entry->parent->count++;
> > > > > > > > +}
> > > > > > > > +
> > > > > > > > +static inline void bpf_mprog_dec(struct bpf_mprog_entry *entry)
> > > > > > > > +{
> > > > > > > > +	entry->parent->count--;
> > > > > > > > +}
> > > > > > > > +
> > > > > > > > +static inline int bpf_mprog_max(void)
> > > > > > > > +{
> > > > > > > > +	return ARRAY_SIZE(((struct bpf_mprog_entry *)NULL)->fp_items) - 1;
> > > > > > > > +}
> > > > > > > > +
> > > > > > > > +static inline int bpf_mprog_total(struct bpf_mprog_entry *entry)
> > > > > > > > +{
> > > > > > > > +	int total = entry->parent->count;
> > > > > > > > +
> > > > > > > > +	WARN_ON_ONCE(total > bpf_mprog_max());
> > > > > > > > +	return total;
> > > > > > > > +}
> > > > > > > > +
> > > > > > > > +static inline bool bpf_mprog_exists(struct bpf_mprog_entry *entry,
> > > > > > > > +				    struct bpf_prog *prog)
> > > > > > > > +{
> > > > > > > > +	const struct bpf_mprog_fp *fp;
> > > > > > > > +	const struct bpf_prog *tmp;
> > > > > > > > +
> > > > > > > > +	bpf_mprog_foreach_prog(entry, fp, tmp) {
> > > > > > > > +		if (tmp == prog)
> > > > > > > > +			return true;
> > > > > > > > +	}
> > > > > > > > +	return false;
> > > > > > > > +}
> > > > > > > > +
> > > > > > > > +static inline bool bpf_mprog_swap_entries(const int code)
> > > > > > > > +{
> > > > > > > > +	return code == BPF_MPROG_SWAP ||
> > > > > > > > +	       code == BPF_MPROG_FREE;
> > > > > > > > +}
> > > > > > > > +
> > > > > > > > +static inline void bpf_mprog_commit(struct bpf_mprog_entry *entry)
> > > > > > > > +{
> > > > > > > > +	atomic64_inc(&entry->parent->revision);
> > > > > > > > +	synchronize_rcu();
> > > > > > >
> > > > > > > Maybe add a comment on why we need to synchronize_rcu here? In general,
> > > > > > > I don't think I have a good grasp of that ->ref member.
> > > > > >
> > > > > > Yeap, will add a comment. For the case where we delete the prog, we mark
> > > > > > it in bpf_mprog_detach, but we can only drop the reference once the user
> > > > > > swapped the bpf_mprog_entry and ensured that there are no in-flight users
> > > > > > hence both in bpf_mprog_commit.
> > > > > >
> > > > > > [...]
> > > > > > > > +static int bpf_mprog_prog(struct bpf_tuple *tuple,
> > > > > > > > +			  u32 object, u32 flags,
> > > > > > > > +			  enum bpf_prog_type type)
> > > > > > > > +{
> > > > > > > > +	bool id = flags & BPF_F_ID;
> > > > > > > > +	struct bpf_prog *prog;
> > > > > > > > +
> > > > > > > > +	if (id)
> > > > > > > > +		prog = bpf_prog_by_id(object);
> > > > > > > > +	else
> > > > > > > > +		prog = bpf_prog_get(object);
> > > > > > > > +	if (IS_ERR(prog)) {
> > > > > > >
> > > > > > > [..]
> > > > > > >
> > > > > > > > +		if (!object && !id)
> > > > > > > > +			return 0;
> > > > > > >
> > > > > > > What's the reason behind this?
> > > > > >
> > > > > > If an fd was passed which is 0 and this was not a program fd, then we don't error
> > > > > > out and treat it as if no fd was passed.
> > > > >
> > > > > Is this new api an opportunity to fix that fd==0? And always treat it as
> > > > > valid. Or we have some other constraints elsewhere?
> > > >
> > > > No. There is nothing to fix.
> > >
> > > Care to elaborate? Do we want to preserve it for consistency? Or is
> > > there some concern with asking people to put relative_fd=-1 when doing
> > > the call?
> > > I'm fine either way; trying to understand where it's coming from. I
> > > remember it was discussed briefly at lsfmmbpf, but don't remember the
> > > details..
> >
> > 0 is invalid bpf object (prog, map, link). There is nothing to "fix".
>
> It's more like it's a conditionally invalid bpf object (fd in this case) :-)
>
> bpf_program__attach_tcx(..., { ..., relative_fd = 0, ... }); //
> returns ok and doesn't use relative_fd
> dup2(prog_fd, 0);
> bpf_program__attach_tcx(..., { ..., relative_fd = 0, ... }); // this
> will use prog_fd duped at 0

It shouldn't. I haven't checked the code, but if the patch does that it's a bug.

> It seems like it might be a bit cleaner to explicitly ask for -1:
> bpf_program__attach_tcx(..., { ..., relative_fd = -1, ... });
>
> But whatever, it works anyway, and that's how it's been done elsewhere
> it seems, so I'm not gonna waste our time on it.
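
For anyone following the relative_fd = 0 exchange, here is a minimal userspace sketch of the ambiguity Stanislav describes. It assumes the lookup fallback has the same shape as the quoted bpf_mprog_prog() hunk (a failed lookup with object == 0 and no BPF_F_ID is treated as "nothing passed"). Everything prefixed mock_ is hypothetical and exists only for illustration; none of it is kernel or libbpf API, and per the reply above the real code is not supposed to pick up a program sitting at fd 0.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a BPF program; not a kernel type. */
struct mock_prog {
	unsigned int id;
};

/* Hypothetical fd table: slot i is non-NULL if fd i refers to a program. */
#define MOCK_FD_MAX 8
static struct mock_prog *mock_fd_table[MOCK_FD_MAX];

/* Mock lookup by fd: NULL means "fd is not a program fd". */
static struct mock_prog *mock_prog_get(unsigned int fd)
{
	return fd < MOCK_FD_MAX ? mock_fd_table[fd] : NULL;
}

/* Mock lookup by program ID: linear scan over the fd table. */
static struct mock_prog *mock_prog_by_id(unsigned int id)
{
	for (size_t i = 0; i < MOCK_FD_MAX; i++) {
		if (mock_fd_table[i] && mock_fd_table[i]->id == id)
			return mock_fd_table[i];
	}
	return NULL;
}

/*
 * Assumed to mirror the shape of the quoted hunk:
 *   - lookup succeeds: *out is set, returns 0
 *   - lookup fails, object == 0 and not by ID: treated as unset,
 *     *out stays NULL, returns 0
 *   - any other failed lookup: returns -EBADF
 */
static int mock_resolve_relative(unsigned int object, bool by_id,
				 struct mock_prog **out)
{
	struct mock_prog *prog;

	*out = NULL;
	prog = by_id ? mock_prog_by_id(object) : mock_prog_get(object);
	if (!prog) {
		if (!object && !by_id)
			return 0;	/* fd 0, not a program: "no fd passed" */
		return -EBADF;
	}
	*out = prog;
	return 0;
}
```

With an empty table, mock_resolve_relative(0, false, &out) succeeds and leaves out NULL. Once a program is installed in slot 0 (the dup2(prog_fd, 0) case), the same call returns that program. That difference is the "conditionally invalid" behaviour being debated; an explicit -1 sentinel would avoid it, at the cost of diverging from how other bpf attach paths treat 0.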