On Tue, Apr 2, 2024 at 6:08 PM Yonghong Song <yonghong.song@xxxxxxxxx> wrote:
>
>
> On 4/2/24 10:45 AM, Andrii Nakryiko wrote:
> > On Mon, Mar 25, 2024 at 7:22 PM Yonghong Song <yonghong.song@xxxxxxxxx> wrote:
> >> Add bpf_link support for sk_msg and sk_skb programs. We have an
> >> internal request to support bpf_link for sk_msg programs so user
> >> space can have a uniform handling with bpf_link based libbpf
> >> APIs. Using bpf_link based libbpf API also has a benefit which
> >> makes system robust by decoupling prog life cycle and
> >> attachment life cycle.
> >>
> >> Signed-off-by: Yonghong Song <yonghong.song@xxxxxxxxx>
> >> ---
> >>  include/linux/bpf.h            |   6 +
> >>  include/linux/skmsg.h          |   4 +
> >>  include/uapi/linux/bpf.h       |   5 +
> >>  kernel/bpf/syscall.c           |   4 +
> >>  net/core/sock_map.c            | 263 ++++++++++++++++++++++++++++++++-
> >>  tools/include/uapi/linux/bpf.h |   5 +
> >>  6 files changed, 279 insertions(+), 8 deletions(-)
> >>
[...]
> >>         psock_set_prog(pprog, prog);
> >> -       return 0;
> >> +       if (link)
> >> +               *plink = link;
> >> +
> >> +out:
> >> +       mutex_unlock(&sockmap_prog_update_mutex);
> > why this mutex is not per-sockmap?
>
> My thinking is the system probably won't have lots of sockmaps and
> sockmap attach/detach/update_prog should not be that frequent. But
> I could be wrong.
>

That seems like even more of an argument to keep the mutex per sockmap.
It won't add a lot of memory, but it is conceptually cleaner, as each
sockmap instance (and its corresponding links) is completely
independent, even from a locking perspective.

But I can't say I feel very strongly about this.

> >
> >> +       return ret;
> >>  }
> >>
[...]
> >
> >> +
> >> +static void sock_map_link_release(struct bpf_link *link)
> >> +{
> >> +       struct sockmap_link *sockmap_link = get_sockmap_link(link);
> >> +
> >> +       mutex_lock(&sockmap_link_mutex);
> > similar to the above, why is this mutex not sockmap-specific? And I'd
> > just combine sockmap_link_mutex and sockmap_prog_update_mutex in this
> > case to keep it simple.
>
> This is to protect sockmap_link->map. They could share the same lock.
> Let me double check...

If you keep that global sockmap_prog_update_mutex then I'd probably
reuse that one here for simplicity (and name it a bit more
generically, "sockmap_mutex" or something like that, just like we have
the global "cgroup_mutex").

[...]

> >> +       if (old && link->prog != old) {
> > hm.. even if old matches link->prog, we should unset old and set new
> > link (link overrides prog attachment, basically), it shouldn't matter
> > if old == link->prog, unless I'm missing something?
>
> In xdp link (net/core/dev.c), we have
>
>         cur_prog = dev_xdp_prog(dev, mode);
>         /* can't replace attached prog with link */
>         if (link && cur_prog) {
>                 NL_SET_ERR_MSG(extack, "Can't replace active XDP program with BPF link");
>                 return -EBUSY;
>         }
>         if ((flags & XDP_FLAGS_REPLACE) && cur_prog != old_prog) {
>                 NL_SET_ERR_MSG(extack, "Active program does not match expected");
>                 return -EEXIST;
>         }
>
> if flags has XDP_FLAGS_REPLACE, link saved prog must be equal to old_prog
> in order to do prog update.
> for sockmap prog update, in link_update (syscall.c), the only way
> we can get a non-NULL old_prog is with the following:
>
>         if (flags & BPF_F_REPLACE) {
>                 old_prog = bpf_prog_get(attr->link_update.old_prog_fd);
>                 if (IS_ERR(old_prog)) {
>                         ret = PTR_ERR(old_prog);
>                         old_prog = NULL;
>                         goto out_put_progs;
>                 }
>         } else if (attr->link_update.old_prog_fd) {
>                 ret = -EINVAL;
>                 goto out_put_progs;
>         }
> Basically, we have BPF_F_REPLACE here.
> So similar to xdp link, I think we should check old_prog to
> be equal to link->prog in order to do link update_prog.

ah, ok, that's the BPF_F_REPLACE case. See, it's confusing that we have
this logic split between multiple places; in dev_xdp_attach() it's a
bit more centralized.

>
> >
> >> +               ret = -EINVAL;
> >> +               goto out;
> >> +       }

[...]
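For illustration only, a minimal userspace model in C of the two points above: a lock owned by each sockmap instance rather than one global mutex, and the BPF_F_REPLACE check on link update, where an expected old program is only accepted together with the replace flag and must match the program the link currently holds. All fake_* names and FAKE_F_REPLACE are hypothetical stand-ins, not kernel APIs; returning -EINVAL on a mismatch follows the quoted patch hunk (the quoted XDP code returns -EEXIST instead).

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for struct bpf_prog. */
struct fake_prog {
    int id;
};

/* Hypothetical sockmap: each instance carries its own lock. */
struct fake_sockmap {
    pthread_mutex_t lock;          /* per-map lock, not a global mutex */
    struct fake_prog *msg_parser;  /* one attachment slot */
};

/* Hypothetical sockmap link. */
struct fake_link {
    struct fake_sockmap *map;
    struct fake_prog *prog;
};

/* Hypothetical flag; only its presence or absence matters here. */
#define FAKE_F_REPLACE (1U << 2)

void fake_sockmap_init(struct fake_sockmap *m)
{
    pthread_mutex_init(&m->lock, NULL);
    m->msg_parser = NULL;
}

/*
 * Replace the program behind a link. old_prog may only be passed with
 * FAKE_F_REPLACE (mirroring the quoted link_update() logic), and if it is
 * passed it must equal the program the link currently holds.
 */
int fake_link_update(struct fake_link *link, struct fake_prog *new_prog,
                     struct fake_prog *old_prog, unsigned int flags)
{
    int ret = 0;

    assert(link != NULL && link->map != NULL);

    /* An expected old program without the replace flag is rejected. */
    if (old_prog && !(flags & FAKE_F_REPLACE))
        return -EINVAL;

    /* Only this map's lock is taken; other sockmaps are unaffected. */
    pthread_mutex_lock(&link->map->lock);
    if (old_prog && link->prog != old_prog) {
        ret = -EINVAL;  /* expected program does not match */
        goto out;
    }
    link->prog = new_prog;
    link->map->msg_parser = new_prog;
out:
    pthread_mutex_unlock(&link->map->lock);
    return ret;
}
```

With a per-map lock, updates on two different sockmaps never contend; a single global mutex would still be correct, just coarser.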
> >> +
> >> +       ret = sock_map_prog_update(map, prog, NULL, &sockmap_link->link, attach_type);
> >> +       if (ret) {
> >> +               bpf_link_cleanup(&link_primer);
> >> +               goto out;
> >> +       }
> >> +
> >> +       bpf_prog_inc(prog);
> > if link was created successfully, it "inherits" prog's refcnt, so you
> > shouldn't do another bpf_prog_inc()? generic link_create() logic puts
> > prog only if this function returns error
>
> The reason I did this is due to
>
> static inline void psock_set_prog(struct bpf_prog **pprog,
>                                   struct bpf_prog *prog)
> {
>         prog = xchg(pprog, prog);
>         if (prog)
>                 bpf_prog_put(prog);
> }
>
> You can see when the prog is swapped due to link_update or prog_attach,
> its reference count is decremented by 1. This is necessary for prog_attach,
> but as you mentioned, indeed, it is not necessary for link-based approach.
> Let me see whether I can refactor code to make it easy not to increase
> reference count of prog here.
>

ah, ok, it's another sockmap-specific convention, np

>
> >
> >> +
> >> +       return bpf_link_settle(&link_primer);
> >> +
> >> +out:
> >> +       bpf_map_put_with_uref(map);
> >> +       return ret;
> >> +}
> >> +
> >>  static int sock_map_iter_attach_target(struct bpf_prog *prog,
> >>                                         union bpf_iter_link_info *linfo,
> >>                                         struct bpf_iter_aux_info *aux)
> >> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> >> index 9585f5345353..31660c3ffc01 100644
> >> --- a/tools/include/uapi/linux/bpf.h
> >> +++ b/tools/include/uapi/linux/bpf.h
> >> @@ -1135,6 +1135,7 @@ enum bpf_link_type {
> >>         BPF_LINK_TYPE_TCX = 11,
> >>         BPF_LINK_TYPE_UPROBE_MULTI = 12,
> >>         BPF_LINK_TYPE_NETKIT = 13,
> >> +       BPF_LINK_TYPE_SOCKMAP = 14,
> >>         __MAX_BPF_LINK_TYPE,
> >>  };
> >>
> >> @@ -6720,6 +6721,10 @@ struct bpf_link_info {
> >>                         __u32 ifindex;
> >>                         __u32 attach_type;
> >>                 } netkit;
> >> +               struct {
> >> +                       __u32 map_id;
> >> +                       __u32 attach_type;
> >> +               } sockmap;
> >>         };
> >>  } __attribute__((aligned(8)));
> >>
> >> --
> >> 2.43.0
> >>
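A minimal userspace model of the refcount convention discussed above, assuming a plain int counter in place of the kernel's atomic refcount. The link keeps the reference it inherits from link creation, and because the psock_set_prog()-style setter drops a reference whenever a program is swapped out of the slot, the slot needs its own extra reference. All fake_* names are hypothetical; this is a sketch of the idea, not the patch's code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical refcounted program, standing in for struct bpf_prog. */
struct fake_prog {
    int refcnt;
};

void fake_prog_inc(struct fake_prog *p)
{
    p->refcnt++;
}

void fake_prog_put(struct fake_prog *p)
{
    assert(p->refcnt > 0);  /* a put without a matching reference is a bug */
    p->refcnt--;
}

/*
 * Mirrors the quoted psock_set_prog(): install the new program and drop the
 * reference held for whatever was in the slot. The kernel uses xchg() for
 * atomicity; a plain swap suffices in this single-threaded model.
 */
void fake_set_prog(struct fake_prog **slot, struct fake_prog *prog)
{
    struct fake_prog *old = *slot;

    *slot = prog;
    if (old)
        fake_prog_put(old);
}

/* Hypothetical link: owns the reference handed over at creation. */
struct fake_link {
    struct fake_prog *prog;
};

/*
 * Attach through a link. The caller's single reference now belongs to the
 * link; the slot gets a second reference because fake_set_prog() will drop
 * one when the program is later swapped out.
 */
void fake_link_attach(struct fake_link *link, struct fake_prog **slot,
                      struct fake_prog *prog)
{
    link->prog = prog;    /* link inherits the caller's reference */
    fake_prog_inc(prog);  /* extra reference owned by the slot */
    fake_set_prog(slot, prog);
}

/* Detach: the slot swap drops the slot's reference, then the link's own. */
void fake_link_release(struct fake_link *link, struct fake_prog **slot)
{
    fake_set_prog(slot, NULL);
    fake_prog_put(link->prog);
    link->prog = NULL;
}
```

Releasing the link drops both references, so the count returns to zero; with prog_attach, by contrast, the slot simply takes over the caller's single reference and no extra increment is needed.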