Re: [PATCH bpf-next v3 1/5] bpf: Add bpf_link support for sk_msg and sk_skb progs

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> · Wed, 3 Apr 2024 09:43:41 -0700

On Tue, Apr 2, 2024 at 6:08 PM Yonghong Song <yonghong.song@xxxxxxxxx> wrote:
>
>
> On 4/2/24 10:45 AM, Andrii Nakryiko wrote:
> > On Mon, Mar 25, 2024 at 7:22 PM Yonghong Song <yonghong.song@xxxxxxxxx> wrote:
> >> Add bpf_link support for sk_msg and sk_skb programs. We have an
> >> internal request to support bpf_link for sk_msg programs so user
> >> space can have a uniform handling with bpf_link based libbpf
> >> APIs. Using bpf_link based libbpf API also has a benefit which
> >> makes system robust by decoupling prog life cycle and
> >> attachment life cycle.
> >>
> >> Signed-off-by: Yonghong Song <yonghong.song@xxxxxxxxx>
> >> ---
> >>   include/linux/bpf.h            |   6 +
> >>   include/linux/skmsg.h          |   4 +
> >>   include/uapi/linux/bpf.h       |   5 +
> >>   kernel/bpf/syscall.c           |   4 +
> >>   net/core/sock_map.c            | 263 ++++++++++++++++++++++++++++++++-
> >>   tools/include/uapi/linux/bpf.h |   5 +
> >>   6 files changed, 279 insertions(+), 8 deletions(-)
> >>

[...]

> >>          psock_set_prog(pprog, prog);
> >> -       return 0;
> >> +       if (link)
> >> +               *plink = link;
> >> +
> >> +out:
> >> +       mutex_unlock(&sockmap_prog_update_mutex);
> > why this mutex is not per-sockmap?
>
> My thinking is the system probably won't have lots of sockmaps and
> sockmap attach/detach/update_prog should not be that frequent. But
> I could be wrong.
>

That seems like an even more of an argument to keep mutex per sockmap.
It won't add a lot of memory, but it is conceptually cleaner, as each
sockmap instance (and corresponding links) are completely independent,
even from a locking perspective.

But I can't say I feel very strongly about this.

> >
> >> +       return ret;
> >>   }
> >>

[...]

> >
> >> +
> >> +static void sock_map_link_release(struct bpf_link *link)
> >> +{
> >> +       struct sockmap_link *sockmap_link = get_sockmap_link(link);
> >> +
> >> +       mutex_lock(&sockmap_link_mutex);
> > similar to the above, why is this mutex not sockmap-specific? And I'd
> > just combine sockmap_link_mutex and sockmap_prog_update_mutex in this
> > case to keep it simple.
>
> This is to protect sockmap_link->map. They could share the same lock.
> Let me double check...

If you keep that global sockmap_prog_update_mutex then I'd probably
reuse that one here for simplicity (and named it a bit more
generically, "sockmap_mutex" or something like that, just like we have
global "cgroup_mutex").

[...]

> >> +       if (old && link->prog != old) {
> > hm.. even if old matches link->prog, we should unset old and set new
> > link (link overrides prog attachment, basically), it shouldn't matter
> > if old == link->prog, unless I'm missing something?
>
> In xdp link (net/core/dev.c), we have
>
>          cur_prog = dev_xdp_prog(dev, mode);
>          /* can't replace attached prog with link */
>          if (link && cur_prog) {
>                  NL_SET_ERR_MSG(extack, "Can't replace active XDP
> program with BPF link");
>                  return -EBUSY;
>          }
>          if ((flags & XDP_FLAGS_REPLACE) && cur_prog != old_prog) {
>                  NL_SET_ERR_MSG(extack, "Active program does not match
> expected");
>                  return -EEXIST;
>          }
>
> if flags has XDP_FLAGS_REPLACE, link saved prog must be equal to old_prog
> in order to do prog update.
> for sockmap prog update, in link_update (syscall.c), the only way
> we can get a non-NULL old_prog is with the following:
>
>          if (flags & BPF_F_REPLACE) {
>                  old_prog = bpf_prog_get(attr->link_update.old_prog_fd);
>                  if (IS_ERR(old_prog)) {
>                          ret = PTR_ERR(old_prog);
>                          old_prog = NULL;
>                          goto out_put_progs;
>                  }
>          } else if (attr->link_update.old_prog_fd) {
>                  ret = -EINVAL;
>                  goto out_put_progs;
>          }
> Basically, we have BPF_F_REPLACE here.
> So similar to xdp link, I think we should check old_prog to
> be equal to link->prog in order to do link update_prog.

ah, ok, that's BPF_F_REPLACE case. See, it's confusing that we have
this logic split between multiple places, in dev_xdp_attach() it's a
bit more centralized.

>
> >
> >> +               ret = -EINVAL;
> >> +               goto out;
> >> +       }

[...]

> >> +
> >> +       ret = sock_map_prog_update(map, prog, NULL, &sockmap_link->link, attach_type);
> >> +       if (ret) {
> >> +               bpf_link_cleanup(&link_primer);
> >> +               goto out;
> >> +       }
> >> +
> >> +       bpf_prog_inc(prog);
> > if link was created successfully, it "inherits" prog's refcnt, so you
> > shouldn't do another bpf_prog_inc()? generic link_create() logic puts
> > prog only if this function returns error
>
> The reason I did this is due to
>
> static inline void psock_set_prog(struct bpf_prog **pprog,
>                                    struct bpf_prog *prog)
> {
>          prog = xchg(pprog, prog);
>          if (prog)
>                  bpf_prog_put(prog);
> }
>
> You can see when the prog is swapped due to link_update or prog_attach,
> its reference count is decremented by 1. This is necessary for prog_attach,
> but as you mentioned, indeed, it is not necessary for link-based approach.
> Let me see whether I can refactor code to make it easy not to increase
> reference count of prog here.
>

ah, ok, its another sockmap-specific convention, np

>
> >
> >> +
> >> +       return bpf_link_settle(&link_primer);
> >> +
> >> +out:
> >> +       bpf_map_put_with_uref(map);
> >> +       return ret;
> >> +}
> >> +
> >>   static int sock_map_iter_attach_target(struct bpf_prog *prog,
> >>                                         union bpf_iter_link_info *linfo,
> >>                                         struct bpf_iter_aux_info *aux)
> >> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> >> index 9585f5345353..31660c3ffc01 100644
> >> --- a/tools/include/uapi/linux/bpf.h
> >> +++ b/tools/include/uapi/linux/bpf.h
> >> @@ -1135,6 +1135,7 @@ enum bpf_link_type {
> >>          BPF_LINK_TYPE_TCX = 11,
> >>          BPF_LINK_TYPE_UPROBE_MULTI = 12,
> >>          BPF_LINK_TYPE_NETKIT = 13,
> >> +       BPF_LINK_TYPE_SOCKMAP = 14,
> >>          __MAX_BPF_LINK_TYPE,
> >>   };
> >>
> >> @@ -6720,6 +6721,10 @@ struct bpf_link_info {
> >>                          __u32 ifindex;
> >>                          __u32 attach_type;
> >>                  } netkit;
> >> +               struct {
> >> +                       __u32 map_id;
> >> +                       __u32 attach_type;
> >> +               } sockmap;
> >>          };
> >>   } __attribute__((aligned(8)));
> >>
> >> --
> >> 2.43.0
> >>