Re: [PATCH bpf-next v5 3/8] bpf: Allow per unit prefill for non-fix-size percpu memory allocator

On 12/20/23 11:16 PM, Yonghong Song wrote:

On 12/20/23 10:26 PM, Hou Tao wrote:
Hi,

On 12/21/2023 1:00 PM, Yonghong Song wrote:
Commit 41a5db8d8161 ("Add support for non-fix-size percpu mem allocation")
added support for non-fix-size percpu memory allocation.
Such allocation will allocate percpu memory for all buckets on all
cpus, so the memory consumption is quadratic in the number of cpus.
For example, with 4 cpus and a unit size of 16 bytes, each
cpu has 16 * 4 = 64 bytes; with 4 cpus, the total is 64 * 4 = 256 bytes.
With 8 cpus and the same unit size, each cpu
has 16 * 8 = 128 bytes; with 8 cpus, the total is 128 * 8 = 1024 bytes.
So if the number of cpus doubles, the memory consumption
quadruples. For a system with a large number of cpus, the
memory consumption therefore grows quickly, with quadratic order.
For example, for a 4KB percpu allocation on 128 cpus, the total memory
consumption is 4KB * 128 * 128 = 64MB. Things become even
worse if the number of cpus is bigger (e.g., 512, 1024, etc.).

In Commit 41a5db8d8161, the non-fix-size percpu memory allocation is
done at boot time, so for a system with a large number of cpus, the initial
percpu memory consumption is very visible. For example, for a 128-cpu
system, the total percpu memory allocation will be at least
(16 + 32 + 64 + 96 + 128 + 192 + 256 + 512 + 1024 + 2048 + 4096)
   * 128 * 128 = ~138MB,
which is pretty big. It will be even bigger for a larger number of cpus.
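
For concreteness, here is the arithmetic above as a small stand-alone
sketch (user-space C; the bucket sizes mirror the sizes[] table in
kernel/bpf/memalloc.c, and nr_cpus = 128 is the example value):

#include <stdio.h>

/* A percpu object of unit_size bytes really occupies
 * unit_size * nr_cpus bytes (one copy per cpu). With at least one such
 * object cached per cpu per bucket, each bucket costs at least
 * unit_size * nr_cpus * nr_cpus bytes.
 */
static const unsigned int sizes[] = {96, 192, 16, 32, 64, 128, 256,
				     512, 1024, 2048, 4096};

int main(void)
{
	unsigned long long total = 0, nr_cpus = 128;
	unsigned int i;

	for (i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++)
		total += (unsigned long long)sizes[i] * nr_cpus * nr_cpus;
	printf("%llu bytes = ~%llu MB\n", total, total / 1000000);
	return 0;
}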
SNIP
+
  static void drain_mem_cache(struct bpf_mem_cache *c)
  {
      bool percpu = !!c->percpu_size;
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index f13008d27f35..08f9a49cc11c 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -12141,20 +12141,6 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
                  if (meta.func_id == special_kfunc_list[KF_bpf_obj_new_impl] && !bpf_global_ma_set)
                      return -ENOMEM;
  -                if (meta.func_id == special_kfunc_list[KF_bpf_percpu_obj_new_impl]) {
-                    if (!bpf_global_percpu_ma_set) {
-                        mutex_lock(&bpf_percpu_ma_lock);
-                        if (!bpf_global_percpu_ma_set) {
-                            err = bpf_mem_alloc_init(&bpf_global_percpu_ma, 0, true);
-                            if (!err)
-                                bpf_global_percpu_ma_set = true;
-                        }
-                        mutex_unlock(&bpf_percpu_ma_lock);
-                        if (err)
-                            return err;
-                    }
-                }
-
                  if (((u64)(u32)meta.arg_constant.value) != meta.arg_constant.value) {
                      verbose(env, "local type ID argument must be in range [0, U32_MAX]\n");
                      return -EINVAL;
@@ -12175,6 +12161,26 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
                      return -EINVAL;
                  }
  +                if (meta.func_id == special_kfunc_list[KF_bpf_percpu_obj_new_impl]) {
+                    if (!bpf_global_percpu_ma_set) {
+                        mutex_lock(&bpf_percpu_ma_lock);
+                        if (!bpf_global_percpu_ma_set) {
+                            err = bpf_mem_alloc_percpu_init(&bpf_global_percpu_ma);
Because ma->objcg is assigned from get_obj_cgroup_from_current(), I
think the memory accounting will be incorrect, right? Maybe we should pass
the objcg to bpf_mem_alloc_percpu_init() explicitly. For the root memcg, I
think the objcg is NULL.

You are correct. Calling bpf_mem_alloc_percpu_init() at the init stage
is exactly how we get the proper root-memcg objcg. Sorry I missed it.

I remember I traced it a few days ago and it was indeed NULL.
There are three ways to resolve this:
   1. Just do 'ma->objcg = NULL' unconditionally in bpf_mem_alloc_percpu_init().
   2. Remember objcg = get_obj_cgroup_from_current() at the init stage,
      e.g., in the bpf_global_ma_init() init function (core.c), and later use it
      in bpf_mem_alloc_percpu_init().
   3. Still do bpf_mem_alloc_percpu_init() at the init stage to initialize
      ma->objcg properly, but delay the __alloc_percpu_gfp() until the verifier
      finds a call to bpf_percpu_obj_new(). We could add a
      bpf_mem_alloc_percpu_init_caches() call to do the __alloc_percpu_gfp().

I prefer option 3; what do you think?
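
Roughly, option 3 could look like this. It is just a sketch (error
handling and the per-bucket setup omitted); bpf_mem_alloc_percpu_init_caches()
is the new helper named above, not existing code:

int bpf_mem_alloc_percpu_init(struct bpf_mem_alloc *ma)
{
	/* Called at the init stage, so get_obj_cgroup_from_current()
	 * returns the root memcg's objcg (NULL) and accounting stays
	 * correct.
	 */
	ma->percpu = true;
#ifdef CONFIG_MEMCG_KMEM
	ma->objcg = get_obj_cgroup_from_current();
#else
	ma->objcg = NULL;
#endif
	return 0;
}

/* Called from the verifier under bpf_percpu_ma_lock when the first
 * bpf_percpu_obj_new() is found: only then pay for the percpu caches.
 */
int bpf_mem_alloc_percpu_init_caches(struct bpf_mem_alloc *ma)
{
	struct bpf_mem_caches __percpu *pcc;

	pcc = __alloc_percpu_gfp(sizeof(struct bpf_mem_caches), 8, GFP_KERNEL);
	if (!pcc)
		return -ENOMEM;
	ma->caches = pcc;
	return 0;
}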

Or, option 4 below:

diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c
index 984c83ecace9..f90989cc9cbc 100644
--- a/kernel/bpf/memalloc.c
+++ b/kernel/bpf/memalloc.c
@@ -122,6 +122,7 @@ struct bpf_mem_caches {
 };
 
 static const u16 sizes[NUM_CACHES] = {96, 192, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096};
+static struct obj_cgroup *objcg_at_init __ro_after_init;
 
 static struct llist_node notrace *__llist_del_first(struct llist_head *head)
 {
@@ -590,7 +591,7 @@ int bpf_mem_alloc_percpu_init(struct bpf_mem_alloc *ma)
        ma->percpu = true;
 
 #ifdef CONFIG_MEMCG_KMEM
-       ma->objcg = get_obj_cgroup_from_current();
+       ma->objcg = objcg_at_init;
 #else
        ma->objcg = NULL;
 #endif
@@ -1015,3 +1016,10 @@ void notrace *bpf_mem_cache_alloc_flags(struct bpf_mem_alloc *ma, gfp_t flags)
        return !ret ? NULL : ret + LLIST_NODE_SZ;
 }
+
+static int __init find_objcg_at_init(void)
+{
+       objcg_at_init = get_obj_cgroup_from_current();
+       return 0;
+}
+late_initcall(find_objcg_at_init);

This seems better?


+                            if (!err)
+                                bpf_global_percpu_ma_set = true;
+                        }
+                        mutex_unlock(&bpf_percpu_ma_lock);
+                        if (err)
+                            return err;
+                    }
+
+                    mutex_lock(&bpf_percpu_ma_lock);
+                    err = bpf_mem_alloc_percpu_unit_init(&bpf_global_percpu_ma, ret_t->size);
+                    mutex_unlock(&bpf_percpu_ma_lock);
+                    if (err)
+                        return err;
+                }
+
                  struct_meta = btf_find_struct_meta(ret_btf, ret_btf_id);
                  if (meta.func_id == special_kfunc_list[KF_bpf_percpu_obj_new_impl]) {
                      if (!__btf_type_is_scalar_struct(env, ret_btf, ret_t, 0)) {
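
To make the per-unit prefill concrete, here is a rough sketch of the
idea behind bpf_mem_alloc_percpu_unit_init() (simplified, not the exact
body from patch 3 of this series; bpf_mem_cache_idx() and
prefill_mem_cache() are existing helpers in kernel/bpf/memalloc.c):

/* Initialize and prefill only the one bucket whose unit size covers
 * @size, instead of all NUM_CACHES buckets on every cpu, so percpu
 * memory is only paid for unit sizes a program actually uses.
 */
int bpf_mem_alloc_percpu_unit_init(struct bpf_mem_alloc *ma, int size)
{
	struct bpf_mem_cache *c;
	int cpu, i;

	i = bpf_mem_cache_idx(size);	/* map size to a sizes[] bucket */
	if (i < 0)
		return -EINVAL;

	for_each_possible_cpu(cpu) {
		c = &per_cpu_ptr(ma->caches, cpu)->cache[i];
		if (c->unit_size)	/* bucket already initialized */
			break;
		c->unit_size = sizes[i];
		c->objcg = ma->objcg;
		c->percpu_size = LLIST_NODE_SZ + sizeof(void *);
		prefill_mem_cache(c, cpu);
	}
	return 0;
}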