From: Daniel Borkmann <daniel@xxxxxxxxxxxxx>
Date: Mon, 20 Mar 2023 15:37:25 +0100
> We've seen recent AWS EKS (Kubernetes) user reports like the following:
>
>   After upgrading EKS nodes from v20230203 to v20230217 on our 1.24 EKS
>   clusters after a few days a number of the nodes have containers stuck
>   in ContainerCreating state or liveness/readiness probes reporting the
>   following error:
>
>     Readiness probe errored: rpc error: code = Unknown desc = failed to
>     exec in container: failed to start exec "4a11039f730203ffc003b7[...]":
>     OCI runtime exec failed: exec failed: unable to start container process:
>     unable to init seccomp: error loading seccomp filter into kernel:
>     error loading seccomp filter: errno 524: unknown
>
>   However, we had not been seeing this issue on previous AMIs and it only
>   started to occur on v20230217 (following the upgrade from kernel 5.4 to
>   5.10) with no other changes to the underlying cluster or workloads.
>
>   We tried the suggestions from that issue (sysctl net.core.bpf_jit_limit=452534528)
>   which helped to immediately allow containers to be created and probes to
>   execute but after approximately a day the issue returned and the value
>   returned by cat /proc/vmallocinfo | grep bpf_jit | awk '{s+=$2} END {print s}'
>   was steadily increasing.
>
> I tested bpf tree to observe bpf_jit_charge_modmem, bpf_jit_uncharge_modmem
> their sizes passed in as well as bpf_jit_current under tcpdump BPF filter,
> seccomp BPF and native (e)BPF programs, and the behavior all looks sane
> and expected, that is nothing "leaking" from an upstream perspective.
>
> The bpf_jit_limit knob was originally added in order to avoid a situation
> where unprivileged applications loading BPF programs (e.g. seccomp BPF
> policies) consuming all the module memory space via BPF JIT such that loading
> of kernel modules would be prevented. The default limit was defined back in
> 2018 and while good enough back then, we are generally seeing far more BPF
> consumers today.
>
> Adjust the limit for the BPF JIT pool from originally 1/4 to now 1/2 of the
> module memory space to better reflect today's needs and avoid more users
> running into potentially hard to debug issues.
>
> Fixes: fdadd04931c2 ("bpf: fix bpf_jit_limit knob for PAGE_SIZE >= 64K")
> Reported-by: Stephen Haynes <sh@xxxxxxxx>
> Reported-by: Lefteris Alexakis <lefteris.alexakis@xxxxxxx>
> Signed-off-by: Daniel Borkmann <daniel@xxxxxxxxxxxxx>

Hi Daniel,

Thanks for the patch.

Reviewed-by: Kuniyuki Iwashima <kuniyu@xxxxxxxxxx>

> Link: https://github.com/awslabs/amazon-eks-ami/issues/1179
> Link: https://github.com/awslabs/amazon-eks-ami/issues/1219

I'm investigating these issues with the EKS folks. For issue 1179 the
customer was using our 5.4 kernel, and for 1219 our 5.10 kernel.

Then I found that my memleak fix, commit a1140cb215fa ("seccomp: Move
copy_seccomp() to no failure path."), was not backported to the upstream
5.10 stable trees. We'll test whether the issue can be reproduced with and
without that fix.

Anyway, I'll backport this patch to all of our trees.
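For anyone else chasing the vmallocinfo numbers, the accounting this knob
gates boils down to roughly the following pair of helpers (a simplified
sketch of current upstream kernel/bpf/core.c; the 5.4/5.10 stable trees
differ in details such as the unit of the charge and the capability check,
so treat it as illustrative only):

	static atomic_long_t bpf_jit_current;

	/* Charge a JIT allocation against the global pool. Privileged
	 * callers may exceed bpf_jit_limit; unprivileged ones (e.g.
	 * seccomp filters loaded from containers) get an error once
	 * bpf_jit_current would go past the limit.
	 */
	int bpf_jit_charge_modmem(u32 size)
	{
		if (atomic_long_add_return(size, &bpf_jit_current) >
		    READ_ONCE(bpf_jit_limit)) {
			if (!bpf_capable()) {
				atomic_long_sub(size, &bpf_jit_current);
				return -EPERM;
			}
		}
		return 0;
	}

	/* Every successful charge must be paired with an uncharge when
	 * the JIT image is freed; a leaked program means a charge that
	 * is never returned, which is what a steadily growing bpf_jit
	 * total in /proc/vmallocinfo would suggest.
	 */
	void bpf_jit_uncharge_modmem(u32 size)
	{
		atomic_long_sub(size, &bpf_jit_current);
	}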
Thanks,
Kuniyuki

> ---
>  kernel/bpf/core.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> index b297e9f60ca1..e2d256c82072 100644
> --- a/kernel/bpf/core.c
> +++ b/kernel/bpf/core.c
> @@ -972,7 +972,7 @@ static int __init bpf_jit_charge_init(void)
>  {
>  	/* Only used as heuristic here to derive limit. */
>  	bpf_jit_limit_max = bpf_jit_alloc_exec_limit();
> -	bpf_jit_limit = min_t(u64, round_up(bpf_jit_limit_max >> 2,
> +	bpf_jit_limit = min_t(u64, round_up(bpf_jit_limit_max >> 1,
>  				    PAGE_SIZE), LONG_MAX);
>  	return 0;
>  }
> --
> 2.27.0
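As a back-of-the-envelope illustration of what the one-liner changes
(hypothetical figure, not measured on the affected nodes): with
bpf_jit_alloc_exec_limit() returning 1 GiB of executable space,

	old default: round_up((1 GiB >> 2), PAGE_SIZE) = 256 MiB
	new default: round_up((1 GiB >> 1), PAGE_SIZE) = 512 MiB

and in both cases the value can still be raised at runtime via the
net.core.bpf_jit_limit sysctl, as the reporters did.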