Re: [PATCH bpf-next v2 2/3] bpf: struct_ops supports more than one page for trampolines.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 






On 2/23/24 09:36, Kui-Feng Lee wrote:


On 2/22/24 21:25, Martin KaFai Lau wrote:
On 2/22/24 7:01 PM, Kui-Feng Lee wrote:



On 2/22/24 18:16, Martin KaFai Lau wrote:
On 2/22/24 5:35 PM, Kui-Feng Lee wrote:


On 2/22/24 16:33, Martin KaFai Lau wrote:
On 2/21/24 2:59 PM, thinker.li@xxxxxxxxx wrote:
@@ -531,10 +567,10 @@ static long bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key,
      const struct btf_type *module_type;
      const struct btf_member *member;
      const struct btf_type *t = st_ops_desc->type;
+    void *image = NULL, *image_end = NULL;
      struct bpf_tramp_links *tlinks;
      void *udata, *kdata;
      int prog_fd, err;
-    void *image, *image_end;
      u32 i;
      if (flags)
@@ -573,15 +609,14 @@ static long bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key,
      udata = &uvalue->data;
      kdata = &kvalue->data;
-    image = st_map->image;
-    image_end = st_map->image + PAGE_SIZE;
      module_type = btf_type_by_id(btf_vmlinux, st_ops_ids[IDX_MODULE_ID]);
      for_each_member(i, t, member) {
          const struct btf_type *mtype, *ptype;
          struct bpf_prog *prog;
          struct bpf_tramp_link *link;
-        u32 moff;
+        u32 moff, tflags;
+        int tsize;
          moff = __btf_member_bit_offset(t, member) / 8;
          ptype = btf_type_resolve_ptr(st_map->btf, member->type, NULL); @@ -653,10 +688,38 @@ static long bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key,
                    &bpf_struct_ops_link_lops, prog);
          st_map->links[i] = &link->link;
-        err = bpf_struct_ops_prepare_trampoline(tlinks, link,
-                            &st_ops->func_models[i],
-                            *(void **)(st_ops->cfi_stubs + moff),
-                            image, image_end);
+        tflags = BPF_TRAMP_F_INDIRECT;
+        if (st_ops->func_models[i].ret_size > 0)
+            tflags |= BPF_TRAMP_F_RET_FENTRY_RET;
+
+        /* Compute the size of the trampoline */
+        tlinks[BPF_TRAMP_FENTRY].links[0] = link;
+        tlinks[BPF_TRAMP_FENTRY].nr_links = 1;
+        tsize = arch_bpf_trampoline_size(&st_ops->func_models[i],
+                         tflags, tlinks, NULL);
+        if (tsize < 0) {
+            err = tsize;
+            goto reset_unlock;
+        }
+
+        /* Allocate pages */
+        if (tsize > (unsigned long)image_end - (unsigned long)image) {
+            if (tsize > PAGE_SIZE) {
+                err = -E2BIG;
+                goto reset_unlock;
+            }
+            image = bpf_struct_ops_map_inc_image(st_map);
+            if (IS_ERR(image)) {
+                err = PTR_ERR(image);
+                goto reset_unlock;
+            }
+            image_end = image + PAGE_SIZE;
+        }
+
+        err = arch_prepare_bpf_trampoline(NULL, image, image_end,
+                          &st_ops->func_models[i],
+                          tflags, tlinks,
+                          *(void **)(st_ops->cfi_stubs + moff));

I don't prefer to copy the BPF_TRAMP_F_* setting on tflags, tlinks, and the arch_*_trampoline_*() logic from bpf_struct_ops_prepare_trampoline() which is used by the bpf_dummy_ops for testing also. Considering struct_ops supports kernel module now, in the future, it is better to move bpf_dummy_ops out to the bpf_testmod somehow and avoid its bpf_struct_ops_prepare_trampoline() usage. For now, it is still better to keep bpf_struct_ops_prepare_trampoline() to be reusable by both.

Have you thought about the earlier suggestion in v1 to do arch_alloc_bpf_trampoline() in bpf_struct_ops_prepare_trampoline() instead of copying codes from bpf_struct_ops_prepare_trampoline() to bpf_struct_ops_map_update_elem()?

Something like this (untested code):

void *bpf_struct_ops_prepare_trampoline(struct bpf_tramp_links *tlinks,
                     struct bpf_tramp_link *link,
                     const struct btf_func_model *model,
                     void *stub_func, void *image,
                     u32 *image_off,
                     bool allow_alloc)

To be a little more specific, the changes in bpf_struct_ops_map_update_elem()
could be mostly like this (untested):

         ret_image = bpf_struct_ops_prepare_trampoline(tlinks, link,
                                   &st_ops->func_models[i],
                                   *(void **)(st_ops->cfi_stubs + moff),
                                   image, &image_off,
                                   st_map->image_pages_cnt < MAX_TRAMP_IMAGE_PAGES);
         if (IS_ERR(ret_image))
             goto reset_unlock;

         if (image != ret_image) {
             image = ret_image;
             st_map->image_pages[st_map->image_pages_cnt++] = image;
         }


What I don't like is the memory management code was in two named
functions, bpf_struct_ops_map_free_image() and
bpf_struct_ops_map_inc_image().

bpf_struct_ops_map_inc_image() is not needed.

Now, it falls apart.  Allocate in one place, keep accounting in another
place, and free yet at the 3rd place.



How about pass a struct bpf_struct_ops_map to
bpf_struct_ops_prepare_trampoline(). If the pointer of struct
bpf_struct_ops_map is not NULL, try to allocate new pages for the map?

For example,

static int
_bpf_struct_ops_prepare_trampoline(struct bpf_tramp_links *tlinks,

                                    struct bpf_tramp_link *link,

                                    const struct btf_func_model *model,

                                    void *stub_func, void *image,
                                    void *image_end,
                                    struct bpf_struct_ops_map *st_map)
{

...

     if (!image || size > PAGE_SIZE - *image_off) {
         if (!st_map)

Why only limit to st_map != NULL?

arch_alloc_bpf_trampoline() is also called in bpf_dummy_ops.
If bpf_struct_ops_prepare_trampoline() can do the alloc, it may as well simplify bpf_dummy_ops and just use bpf_struct_ops_prepare_trampoline() to alloc.


Yes, it can save a few lines from bpf_dummy_ops. But, bpf_dummy_ops
still need to free the memory. And, it doesn't pair alloc and free in
the same function. Usually, paring alloc and free in the same function,
nearby, or the same module is easier to understand.

It is not only about saving a few lines. It just does not make sense to
add all this complexity and another "_"bpf_struct_ops_prepare_trampoline() variant to make it conform to the alloc/free pair idea. To be clear, I don't

To be clear, we are not talking computation or memory complexity here.
I consider the complexity in another way. When I look at the code of
bpf_dummy_ops, and see it free the memory at the very end of a function.
I have to guess who allocate the memory by looking around without a
clear sign or hint if we move the allocation to
bpf_struct_ops_prepare_trampoline(). It is a source of complexity.
Very often, a duplication is much more simple and easy to understand.
Especially, when the duplication is in a very well know/recognized
pattern. Here will create a unusual way to replace a well recognized one
to simplify the code.

My reason of duplicating the code from
bpf_struct_ops_prepare_trampoline() was we don't need
bpf_struct_ops_prepare_trampoline() in future if we were going to move
bpf_dummy_ops out. But, just like you said, we still have bpf_dummy_ops
now, so it is a good trade of to move memory allocation into
bpf_struct_ops_prepare_trampoline() to avoid the duplication the code
about flags and tlinks. But, the trade off we are talking here goes to
an opposite way.

By the way, I am not insisting on these tiny details. I am just trying
to explain what I don't like here.

One thing I forgot to mention is that bpf_dummy_ops has to call
bpf_jit_uncharge_modmem(PAGE_SIZE) as well. The other option is to move
bpf_jit_charge_modmem() out of bpf_struct_ops_prepare_trampoline(),
meaning bpf_struct_ops_map_update_elem() should handle the case that the
allocation in bpf_struct_ops_prepare_trampoline() successes, but
bpf_jit_charge_modmem() fails.



see alloc/free pair is a must have in all cases. There are many situations that
non-alloc named function calls multiple kmalloc() in different places and
one xyz_free() releases everything. Even alloc/free is really preferred,
there has to be a better way or else need to make a trade off.

I suggested the high level ideal on making
bpf_struct_ops_prepare_trampoline() to allocate page. It can sure add a
bpf_struct_ops_free_trampoline() if you see fit to make it match with
bpf_struct_ops_prepare_trampoline() as alloc/free pair, for example,

void bpf_struct_ops_free_trampoline(void *image)
{
     bpf_jit_uncharge_modmem(PAGE_SIZE);
     arch_free_bpf_trampoline(image, PAGE_SIZE);
}

and make bpf_struct_ops_map_free_image() to use
bpf_struct_ops_free_trampoline()

static void bpf_struct_ops_map_free_image(struct bpf_struct_ops_map *st_map)
{
     int i;

     for (i = 0; i < st_map->image_pages_cnt; i++) {
         bpf_struct_ops_free_trampoline(st_map->image_pages[i]);
         st_map->image_pages[i] = NULL;
     }
     st_map->image_pages_cnt = 0;
}

Then it should work like alloc/free pair.




[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux