On Mon, Nov 4, 2024 at 11:38 AM Yonghong Song <yonghong.song@xxxxxxxxx> wrote:
>
> For any main prog or subprogs, allocate private stack space if requested
> by subprog info or main prog. The alignment for private stack is 16
> since maximum stack alignment is 16 for bpf-enabled archs.
>
> If jit failed, the allocated private stack will be freed in the same
> function where the allocation happens. If jit succeeded, e.g., for
> x86_64 arch, the allocated private stack is freed in arch specific
> implementation of bpf_jit_free().
>
> Signed-off-by: Yonghong Song <yonghong.song@xxxxxxxxx>
> ---
>  arch/x86/net/bpf_jit_comp.c |  1 +
>  include/linux/bpf.h         |  1 +
>  kernel/bpf/core.c           | 19 ++++++++++++++++---
>  kernel/bpf/verifier.c       | 13 +++++++++++++
>  4 files changed, 31 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
> index 06b080b61aa5..59d294b8dd67 100644
> --- a/arch/x86/net/bpf_jit_comp.c
> +++ b/arch/x86/net/bpf_jit_comp.c
> @@ -3544,6 +3544,7 @@ void bpf_jit_free(struct bpf_prog *prog)
>                 prog->bpf_func = (void *)prog->bpf_func - cfi_get_offset();
>                 hdr = bpf_jit_binary_pack_hdr(prog);
>                 bpf_jit_binary_pack_free(hdr, NULL);
> +               free_percpu(prog->aux->priv_stack_ptr);
>                 WARN_ON_ONCE(!bpf_prog_kallsyms_verify_off(prog));
>         }
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 8db3c5d7404b..8a3ea7440a4a 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -1507,6 +1507,7 @@ struct bpf_prog_aux {
>         u32 max_rdwr_access;
>         struct btf *attach_btf;
>         const struct bpf_ctx_arg_aux *ctx_arg_info;
> +       void __percpu *priv_stack_ptr;
>         struct mutex dst_mutex; /* protects dst_* pointers below, *after* prog becomes visible */
>         struct bpf_prog *dst_prog;
>         struct bpf_trampoline *dst_trampoline;
> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> index 14d9288441f2..f7a3e93c41e1 100644
> --- a/kernel/bpf/core.c
> +++ b/kernel/bpf/core.c
> @@ -2396,6 +2396,7 @@ static void bpf_prog_select_func(struct bpf_prog *fp)
>   */
>  struct bpf_prog *bpf_prog_select_runtime(struct bpf_prog *fp, int *err)
>  {
> +       void __percpu *priv_stack_ptr = NULL;
>         /* In case of BPF to BPF calls, verifier did all the prep
>          * work with regards to JITing, etc.
>          */
> @@ -2421,11 +2422,23 @@ struct bpf_prog *bpf_prog_select_runtime(struct bpf_prog *fp, int *err)
>                 if (*err)
>                         return fp;
>
> +               if (fp->aux->use_priv_stack && fp->aux->stack_depth) {
> +                       priv_stack_ptr = __alloc_percpu_gfp(fp->aux->stack_depth, 16, GFP_KERNEL);
> +                       if (!priv_stack_ptr) {
> +                               *err = -ENOMEM;
> +                               return fp;
> +                       }
> +                       fp->aux->priv_stack_ptr = priv_stack_ptr;
> +               }
> +
>                 fp = bpf_int_jit_compile(fp);
>                 bpf_prog_jit_attempt_done(fp);
> -               if (!fp->jited && jit_needed) {
> -                       *err = -ENOTSUPP;
> -                       return fp;
> +               if (!fp->jited) {
> +                       free_percpu(priv_stack_ptr);
> +                       if (jit_needed) {
> +                               *err = -ENOTSUPP;
> +                               return fp;
> +                       }
>                 }
>         } else {
>                 *err = bpf_prog_offload_compile(fp);
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index e01b3f0fd314..03ae76d57076 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -20073,6 +20073,7 @@ static int jit_subprogs(struct bpf_verifier_env *env)
>  {
>         struct bpf_prog *prog = env->prog, **func, *tmp;
>         int i, j, subprog_start, subprog_end = 0, len, subprog;
> +       void __percpu *priv_stack_ptr;
>         struct bpf_map *map_ptr;
>         struct bpf_insn *insn;
>         void *old_bpf_func;
> @@ -20169,6 +20170,17 @@ static int jit_subprogs(struct bpf_verifier_env *env)
>
>                 func[i]->aux->name[0] = 'F';
>                 func[i]->aux->stack_depth = env->subprog_info[i].stack_depth;
> +
> +               if (env->subprog_info[i].use_priv_stack && func[i]->aux->stack_depth) {
> +                       priv_stack_ptr = __alloc_percpu_gfp(func[i]->aux->stack_depth, 16,
> +                                                           GFP_KERNEL);
> +                       if (!priv_stack_ptr) {
> +                               err = -ENOMEM;
> +                               goto out_free;
> +                       }
> +                       func[i]->aux->priv_stack_ptr = priv_stack_ptr;
> +               }
> +
>                 func[i]->jit_requested = 1;
>                 func[i]->blinding_requested = prog->blinding_requested;
>                 func[i]->aux->kfunc_tab = prog->aux->kfunc_tab;
> @@ -20201,6 +20213,7 @@ static int jit_subprogs(struct bpf_verifier_env *env)
>                         func[i]->aux->exception_boundary = env->seen_exception;
>                 func[i] = bpf_int_jit_compile(func[i]);
>                 if (!func[i]->jited) {
> +                       free_percpu(func[i]->aux->priv_stack_ptr);
>                         err = -ENOTSUPP;
>                         goto out_free;
>                 }

Looks correct from a leaks pov, but it's hard to follow.
I still don't like the imbalanced alloc/free: either both need to be done
by the core or both by the JIT. And the JIT is probably the better place,
since in:

  __alloc_percpu_gfp(func[i]->aux->stack_depth, 16, GFP_KERNEL)

the 16-byte alignment is x86 specific.
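
Roughly, a balanced version could look like the sketch below (untested, only
to illustrate the point; the two helper names are made up, while
use_priv_stack, stack_depth and priv_stack_ptr are the fields from this
series):

/* arch/x86/net/bpf_jit_comp.c: called from bpf_int_jit_compile() before
 * code emission, so the allocation lives in arch code.
 */
static int bpf_jit_alloc_priv_stack(struct bpf_prog *prog)
{
        void __percpu *priv_stack_ptr;

        if (!prog->aux->use_priv_stack || !prog->aux->stack_depth)
                return 0;

        /* 16 is the x86-64 stack alignment; other JITs would pick their own */
        priv_stack_ptr = __alloc_percpu_gfp(prog->aux->stack_depth, 16,
                                            GFP_KERNEL);
        if (!priv_stack_ptr)
                return -ENOMEM;

        prog->aux->priv_stack_ptr = priv_stack_ptr;
        return 0;
}

/* Called on the failure path of bpf_int_jit_compile() and from
 * bpf_jit_free(), so alloc and free are paired inside the JIT.
 */
static void bpf_jit_free_priv_stack(struct bpf_prog *prog)
{
        free_percpu(prog->aux->priv_stack_ptr);
        prog->aux->priv_stack_ptr = NULL;
}

With something like that, bpf_prog_select_runtime() and jit_subprogs() don't
touch priv_stack_ptr at all, and each arch keeps its own alignment choice.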