On 4/3/24 10:47 AM, John Fastabend wrote:
Andrii Nakryiko wrote:
On Tue, Apr 2, 2024 at 6:08 PM Yonghong Song <yonghong.song@xxxxxxxxx> wrote:
On 4/2/24 10:45 AM, Andrii Nakryiko wrote:
On Mon, Mar 25, 2024 at 7:22 PM Yonghong Song <yonghong.song@xxxxxxxxx> wrote:
Add bpf_link support for sk_msg and sk_skb programs. We have an
internal request to support bpf_link for sk_msg programs so user
space can have uniform handling with the bpf_link based libbpf
APIs. Using the bpf_link based libbpf APIs also has the benefit
of making the system more robust by decoupling the prog life
cycle from the attachment life cycle.
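(For context, a minimal userspace sketch of what link-based attachment could
look like with the existing low-level libbpf wrappers bpf_link_create() and
bpf_prog_attach() from <bpf/bpf.h>; the variable names and fallback logic
below are only illustrative, not part of this series:)

        /* prog: the sk_msg verdict program from a loaded bpf object,
         * map_fd: fd of the sockmap/sockhash it should attach to.
         */
        int prog_fd = bpf_program__fd(prog);
        int link_fd = bpf_link_create(prog_fd, map_fd, BPF_SK_MSG_VERDICT, NULL);

        if (link_fd < 0)
                /* older kernel: fall back to prog-based attachment */
                err = bpf_prog_attach(prog_fd, map_fd, BPF_SK_MSG_VERDICT, 0);

        /* With a link, detaching is just close(link_fd); if the link is not
         * pinned, the attachment also goes away when the fd owner exits,
         * which is the robustness benefit mentioned above.
         */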
Thanks again for working on it.
Signed-off-by: Yonghong Song <yonghong.song@xxxxxxxxx>
---
include/linux/bpf.h | 6 +
include/linux/skmsg.h | 4 +
include/uapi/linux/bpf.h | 5 +
kernel/bpf/syscall.c | 4 +
net/core/sock_map.c | 263 ++++++++++++++++++++++++++++++++-
tools/include/uapi/linux/bpf.h | 5 +
6 files changed, 279 insertions(+), 8 deletions(-)
[...]
psock_set_prog(pprog, prog);
- return 0;
+ if (link)
+ *plink = link;
+
+out:
+ mutex_unlock(&sockmap_prog_update_mutex);
why is this mutex not per-sockmap?
My thinking is that the system probably won't have many sockmaps,
and sockmap attach/detach/update_prog should not be that frequent. But
I could be wrong.
For my use case at least we have a map per protocol we want to inspect,
so it's a rather small set, <10 I would say. Also, they are created once
when the agent starts and when the config changes from the operator (the user
decides to remove/add a parser). Config changes are rather rare. I don't
think it would be particularly painful in practice to have a global
lock.
That seems like even more of an argument to keep the mutex per sockmap.
It won't add a lot of memory, but it is conceptually cleaner, as each
sockmap instance (and its corresponding links) is completely independent,
even from a locking perspective.
But I can't say I feel very strongly about this.
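(A minimal sketch of the per-sockmap alternative, just to illustrate; the
field and function signature below are assumptions, not taken from the patch:)

        struct bpf_stab {
                struct bpf_map map;
                /* ... existing fields ... */
                struct mutex prog_update_mutex; /* serializes prog/link updates on this map */
        };

        static int sock_map_prog_update(struct bpf_map *map, struct bpf_prog *prog,
                                        struct bpf_prog *old, struct bpf_link *link,
                                        u32 which)
        {
                struct bpf_stab *stab = container_of(map, struct bpf_stab, map);
                int ret = 0;

                mutex_lock(&stab->prog_update_mutex);
                /* ... existing prog/link update logic, now per map ... */
                mutex_unlock(&stab->prog_update_mutex);
                return ret;
        }

The sockhash variant (struct bpf_shtab) would need the same field, which is
part of why a single global mutex ends up simpler.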
+ return ret;
}
[...]
+
+static void sock_map_link_release(struct bpf_link *link)
+{
+ struct sockmap_link *sockmap_link = get_sockmap_link(link);
+
+ mutex_lock(&sockmap_link_mutex);
similar to the above, why is this mutex not sockmap-specific? And I'd
just combine sockmap_link_mutex and sockmap_prog_update_mutex in this
case to keep it simple.
This is to protect sockmap_link->map. They could share the same lock.
Let me double check...
If you keep that global sockmap_prog_update_mutex then I'd probably
reuse that one here for simplicity (and name it a bit more
generically, "sockmap_mutex" or something like that, just like we have
global "cgroup_mutex").
I was leaning toward a per-map lock, but because a global lock simplifies this
part a bunch, I would agree to just use a single sockmap_mutex throughout.
If someone has a use case where they want to add/remove maps dynamically,
maybe they can let us know what that is. For us, on my todo list, I want
to just remove the map notion and bind progs to socks directly. The
original map idea was for an L7 load balancer, but other than quick hacks
I've never built such a thing nor run it in production. Maybe someday
I'll find the time.
I am using a single global lock.
https://lore.kernel.org/bpf/20240404025305.2210999-1-yonghong.song@xxxxxxxxx/
Let us know whether it makes sense or not with the code.
John, it would be great if you could review the patch set. I am afraid
that I might have missed something...
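(For reference, the rough shape of the single-global-lock approach; this is
only an illustrative sketch, the linked v2 patch is authoritative:)

        static DEFINE_MUTEX(sockmap_mutex); /* replaces sockmap_prog_update_mutex and sockmap_link_mutex */

        static void sock_map_link_release(struct bpf_link *link)
        {
                struct sockmap_link *sockmap_link = get_sockmap_link(link);

                mutex_lock(&sockmap_mutex);
                if (sockmap_link->map) {
                        /* ... detach link->prog from the map, drop the map reference ... */
                        sockmap_link->map = NULL;
                }
                mutex_unlock(&sockmap_mutex);
        }

        /* The same sockmap_mutex is taken in the attach/detach/update_prog
         * paths, so prog updates cannot race with link release/update.
         */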
[...]
+ if (old && link->prog != old) {
hm.. even if old matches link->prog, we should unset old and set the new
link (the link overrides the prog attachment, basically); it shouldn't matter
whether old == link->prog, unless I'm missing something?
[...]