Re: [PATCH bpf-next v4 1/8] bpf: Add generic attach/detach/query API for multi-progs

On 7/11/23 8:48 PM, Andrii Nakryiko wrote:
On Mon, Jul 10, 2023 at 1:12 PM Daniel Borkmann <daniel@xxxxxxxxxxxxx> wrote:
[...]
+static inline int bpf_mprog_max(void)
+{
+       return ARRAY_SIZE(((struct bpf_mprog_entry *)NULL)->fp_items) - 1;
+}

so we can only add BPF_MPROG_MAX - 1 progs, right? I presume the last
entry is always meant to be NULL, right?

Correct.
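
(Aside for readers following the thread: a minimal userspace fragment
modelling that invariant. struct entry, entry_max and entry_run are
made-up names, not the kernel structs; the point is only that the last
fp_items slot always stays NULL so the run-time loop can stop at the
sentinel, which is why the usable capacity is the array size minus one.)

#include <stdio.h>

#define MPROG_MAX 8     /* stand-in for ARRAY_SIZE(fp_items) */

struct entry {
        /* progs are packed at the front; the last slot always stays NULL */
        const char *fp_items[MPROG_MAX];
        int count;
};

static int entry_max(void)
{
        return MPROG_MAX - 1;   /* one slot is reserved as the NULL terminator */
}

static void entry_run(const struct entry *e)
{
        /* the fast path walks until the NULL sentinel, no count needed */
        for (int i = 0; e->fp_items[i]; i++)
                printf("run %s\n", e->fp_items[i]);
}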

+static inline int bpf_mprog_total(struct bpf_mprog_entry *entry)
+{
+       int total = entry->parent->count;
+
+       WARN_ON_ONCE(total > bpf_mprog_max());
+       return total;
+}
+

[...]

diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
index 1d3892168d32..1bea2eb912cd 100644
--- a/kernel/bpf/Makefile
+++ b/kernel/bpf/Makefile
@@ -12,7 +12,7 @@ obj-$(CONFIG_BPF_SYSCALL) += hashtab.o arraymap.o percpu_freelist.o bpf_lru_list
  obj-$(CONFIG_BPF_SYSCALL) += local_storage.o queue_stack_maps.o ringbuf.o
  obj-$(CONFIG_BPF_SYSCALL) += bpf_local_storage.o bpf_task_storage.o
  obj-${CONFIG_BPF_LSM}    += bpf_inode_storage.o
-obj-$(CONFIG_BPF_SYSCALL) += disasm.o
+obj-$(CONFIG_BPF_SYSCALL) += disasm.o mprog.o
  obj-$(CONFIG_BPF_JIT) += trampoline.o
  obj-$(CONFIG_BPF_SYSCALL) += btf.o memalloc.o
  obj-$(CONFIG_BPF_JIT) += dispatcher.o
diff --git a/kernel/bpf/mprog.c b/kernel/bpf/mprog.c
new file mode 100644
index 000000000000..1c4fcde74969
--- /dev/null
+++ b/kernel/bpf/mprog.c
@@ -0,0 +1,427 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2023 Isovalent */
+
+#include <linux/bpf.h>
+#include <linux/bpf_mprog.h>
+
+static int bpf_mprog_link(struct bpf_tuple *tuple,
+                         u32 object, u32 flags,

so I tried to get used to this "object" notation, but I think it's
still awkward and keeps me asking "what is this really" every single
time I read it. I wonder if something like "fd_or_id" as a name
would make it more obvious?

Ok, fixed it in the v5.

[...]
+       struct bpf_mprog_fp *fp, *fpp;
+       struct bpf_mprog_entry *peer;
+
+       peer = bpf_mprog_peer(entry);
+       bpf_mprog_entry_clear(peer);
+       if (idx < 0) {
+               bpf_mprog_read_fp(peer, j, &fpp);
+               bpf_mprog_write_fp(fpp, ntuple);
+               bpf_mprog_write_cp(&cpp[j], ntuple);
+               j++;
+       }
+       for (i = 0; i <= total; i++) {
+               bpf_mprog_read_fp(peer, j, &fpp);
+               if (idx == i && (flags & BPF_F_AFTER)) {
+                       bpf_mprog_write(fpp, &cpp[j], ntuple);
+                       j++;
+                       bpf_mprog_read_fp(peer, j, &fpp);
+               }
+               if (i < total) {
+                       bpf_mprog_read(entry, i, &fp, &cp);
+                       bpf_mprog_copy(fpp, &cpp[j], fp, cp);
+                       j++;
+               }
+               if (idx == i && (flags & BPF_F_BEFORE)) {
+                       bpf_mprog_read_fp(peer, j, &fpp);
+                       bpf_mprog_write(fpp, &cpp[j], ntuple);
+                       j++;
+               }
+       }

sorry if I'm missing some subtle point, but I wonder why this is so
complicated? I think this choice of idx == -1 meaning prepend is
leading to this complication. It's also not clear why there is this
BPF_F_AFTER vs BPF_F_BEFORE distinction when we have already determined
the position where the new program has to be inserted (so after or
before should be irrelevant).

Please let me know why the below doesn't work.

Let's define that idx is the position where the new prog/link tuple has
to be inserted. It can be in the range [0, N], where N is the number of
programs currently in the mprog_peer. Note that N is inclusive above.

The algorithm for insertion is simple: everything currently at
entry->fp_items[idx] and after gets shifted right by one. And we can do
it with a simple memmove:

memmove(peer->fp_items + idx + 1, peer->fp_items + idx,
        (bpf_mprog_total(entry) - idx) * sizeof(struct bpf_mprog_fp));
/* similar memmove for the cp_items/cpp array, of course */
/* now set the new prog at peer->fp_items[idx] */

The above should replace the entire for loop above and that extra if
before the loop. And it should work for the corner cases:

   - idx == 0 (prepend): will shift everything to the right and put the
new prog at position 0. Exactly what we wanted.
   - idx == N (append): will shift nothing (that memmove should be a
no-op because the size is zero, total == idx == N).

We just need to make sure that the above shift won't overwrite the
very last NULL. So bpf_mprog_total() should be <= BPF_MPROG_MAX - 2
before all this.

Seems as simple as that, is there any complication I skimmed over?
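
(Again as an aside, here is what the suggested shift-based insert looks
like on the simplified struct entry model from further up. entry_insert
and the bounds check are illustrative only, not the kernel API; it needs
<string.h> for memmove and <stdio.h> for the usage below.)

/* Insert prog at idx in [0, count]; everything at idx and after moves
 * one slot to the right, the NULL terminator at the end is preserved.
 */
static int entry_insert(struct entry *e, int idx, const char *prog)
{
        if (idx < 0 || idx > e->count)
                return -1;              /* out of range */
        if (e->count > MPROG_MAX - 2)
                return -1;              /* would overwrite the terminating NULL */
        memmove(&e->fp_items[idx + 1], &e->fp_items[idx],
                (e->count - idx) * sizeof(e->fp_items[0]));
        e->fp_items[idx] = prog;
        e->count++;
        return 0;
}

int main(void)
{
        struct entry e = { .fp_items = { "a", "c" }, .count = 2 };

        entry_insert(&e, 1, "b");       /* insert in the middle */
        entry_insert(&e, 0, "x");       /* prepend: idx == 0 */
        entry_insert(&e, e.count, "z"); /* append: idx == count, memmove is a no-op */
        entry_run(&e);                  /* runs x, a, b, c, z */
        return 0;
}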
[...]

+static int bpf_mprog_delete(struct bpf_mprog_entry *entry,
+                           struct bpf_tuple *dtuple, int idx)
+{
+       int i = 0, j, ret, total = bpf_mprog_total(entry);
+       struct bpf_mprog_cp *cp, cpp[BPF_MPROG_MAX] = {};
+       struct bpf_mprog_fp *fp, *fpp;
+       struct bpf_mprog_entry *peer;
+
+       ret = bpf_mprog_tuple_confirm(entry, dtuple, idx);
+       if (ret)
+               return ret;
+       peer = bpf_mprog_peer(entry);
+       bpf_mprog_entry_clear(peer);
+       if (idx < 0)
+               i++;
+       if (idx == total)
+               total--;
+       for (j = 0; i < total; i++) {
+               if (idx == i)
+                       continue;
+               bpf_mprog_read_fp(peer, j, &fpp);
+               bpf_mprog_read(entry, i, &fp, &cp);
+               bpf_mprog_copy(fpp, &cpp[j], fp, cp);
+               j++;
+       }
+       bpf_mprog_commit_cp(peer, cpp);
+       bpf_mprog_dec(peer);
+       bpf_mprog_mark_ref(peer, dtuple);
+       return bpf_mprog_total(peer) ?
+              BPF_MPROG_SWAP : BPF_MPROG_FREE;

for delete it's also a bit unclear to me. We are deleting some
specific spot, so idx should be a valid [0, N) value, no? Then why does
bpf_mprog_tuple_confirm() have this special idx <= first and idx >= last
handling?

Deletion should be similar to insertion, just with the shift in the
other direction, and then setting NULL at the N-1 position to ensure
proper NULL termination of the fp array.
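
(And purely as an illustration of that suggestion, the matching shift on
the same made-up struct entry model as in the insert sketch above, with
the freed slot re-NULLed; entry_delete is not a kernel helper.)

/* Delete the prog at idx in [0, count); everything after idx shifts
 * left and the old last slot is re-NULLed, so the fp array stays
 * NULL-terminated at position N - 1.
 */
static int entry_delete(struct entry *e, int idx)
{
        if (idx < 0 || idx >= e->count)
                return -1;              /* out of range */
        memmove(&e->fp_items[idx], &e->fp_items[idx + 1],
                (e->count - idx - 1) * sizeof(e->fp_items[0]));
        e->count--;
        e->fp_items[e->count] = NULL;   /* restore NULL termination */
        return 0;
}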

Agree, the naming was suboptimal; I adapted this slightly in v5.
It's picking the element for the case where no deletion fd was given,
but rather deletion from the front/back or relative to some other
element was requested, so it needs to fetch the prog.

[...]

and then here just have special casing for -ERANGE, and otherwise
treat anything else negative as an error:

tidx = bpf_mprog_pos_exact(entry, &rtuple);
/* and adjust +1 for BPF_F_AFTER */
if (tidx >= 0)
     tidx += 1;
if (idx != -ERANGE && tidx != idx) {
     ret = tidx < 0 ? tidx : -EDOM;
     goto out;
}
idx = tidx;

This looks much less intuitive to me given that the replace and delete
cases need the exact position, just not the relative insertion. I also
reworked this with the memmove in v5, but kept the more obvious
_exact/before/after helpers.
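
(For completeness, resolving a relative before/after anchor into an
absolute index could look roughly like this on the same toy model;
entry_pos is hypothetical and only sketches the idea, it is not one of
the kernel _exact/before/after helpers.)

/* Find the anchor prog and turn it into an insertion index:
 * "before" maps to its own slot i, "after" maps to i + 1.
 * The comparison stands in for matching the attached bpf_prog pointer.
 */
static int entry_pos(const struct entry *e, const char *rel_prog, int after)
{
        for (int i = 0; i < e->count; i++) {
                if (e->fp_items[i] == rel_prog)
                        return after ? i + 1 : i;
        }
        return -1;      /* anchor prog not attached (stand-in for -ENOENT) */
}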

Thanks a lot for the feedback!

+       }
+       if (idx < -1) {
+               if (rtuple.prog || flags) {
+                       ret = -EINVAL;
+                       goto out;
+               }
+               idx = bpf_mprog_total(entry);
+               flags = BPF_F_AFTER;
+       }
+       if (idx >= bpf_mprog_max()) {
+               ret = -EDOM;
+               goto out;
+       }
+       if (flags & BPF_F_REPLACE)
+               ret = bpf_mprog_replace(entry, &ntuple, idx);
+       else
+               ret = bpf_mprog_insert(entry, &ntuple, idx, flags);
+out:
+       bpf_mprog_tuple_put(&rtuple);
+       return ret;
+}
+

[...]





