The return value from `cgroup_bpf_enabled(CGROUP_GETSOCKOPT)` can change between the invocations of `BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN` and `BPF_CGROUP_RUN_PROG_GETSOCKOPT`. If `cgroup_bpf_enabled(CGROUP_GETSOCKOPT)` changes from "false" to "true" between the invocations of `BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN` and `BPF_CGROUP_RUN_PROG_GETSOCKOPT`, `BPF_CGROUP_RUN_PROG_GETSOCKOPT` will receive an -EFAULT from `__cgroup_bpf_run_filter_getsockopt(max_optlen=0)` due to `get_user()` had not reached in `BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN`. Scenario shown as below: `process A` `process B` ----------- ------------ BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN enable CGROUP_GETSOCKOPT BPF_CGROUP_RUN_PROG_GETSOCKOPT (-EFAULT) Prevent `cgroup_bpf_enabled(CGROUP_GETSOCKOPT)` change between `BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN` and `BPF_CGROUP_RUN_PROG_GETSOCKOPT` by acquiring cgroup_lock. Co-developed-by: Yanghui Li <yanghui.li@xxxxxxxxxxxx> Signed-off-by: Yanghui Li <yanghui.li@xxxxxxxxxxxx> Co-developed-by: Cheng-Jui Wang <cheng-jui.wang@xxxxxxxxxxxx> Signed-off-by: Cheng-Jui Wang <cheng-jui.wang@xxxxxxxxxxxx> Signed-off-by: Tze-nan Wu <Tze-nan.Wu@xxxxxxxxxxxx> --- We have encountered this issue by observing that process A could sometimes get an -EFAULT from getsockopt() during our device boot-up, while another process B triggers the race condition by enabling CGROUP_GETSOCKOPT through bpf syscall at the same time. The race condition is shown below: `process A` `process B` ----------- ------------ BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN bpf syscall (CGROUP_GETSOCKOPT enabled) BPF_CGROUP_RUN_PROG_GETSOCKOPT -> __cgroup_bpf_run_filter_getsockopt (-EFAULT) __cgroup_bpf_run_filter_getsockopt return -EFAULT at the line shown below: if (optval && (ctx.optlen > max_optlen || ctx.optlen < 0)) { if (orig_optlen > PAGE_SIZE && ctx.optlen >= 0) { pr_info_once("bpf getsockopt: ignoring program buffer with optlen=%d (max_optlen=%d)\n", ctx.optlen, max_optlen); ret = retval; goto out; } ret = -EFAULT; <== return EFAULT here goto out; } This patch should fix the race but not sure if it introduces any potential side effects or regression. And we wondering if this is a real issue in do_sock_getsockopt or if getsockopt() is designed to expect such race conditions. Should the userspace caller always anticipate an -EFAULT from getsockopt() if another process enables CGROUP_GETSOCKOPT at the same time? Any comment will be appreciated! BTW, I added Chengjui and Yanghui to Co-developed due to we have several discussions on this issue. And we both spend some time on this issue. --- net/socket.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/net/socket.c b/net/socket.c index fcbdd5bc47ac..e0b2b16fd238 100644 --- a/net/socket.c +++ b/net/socket.c @@ -2370,8 +2370,10 @@ int do_sock_getsockopt(struct socket *sock, bool compat, int level, if (err) return err; - if (!compat) + if (!compat) { + cgroup_lock(); max_optlen = BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN(optlen); + } ops = READ_ONCE(sock->ops); if (level == SOL_SOCKET) { @@ -2387,10 +2389,12 @@ int do_sock_getsockopt(struct socket *sock, bool compat, int level, optlen.user); } - if (!compat) + if (!compat) { err = BPF_CGROUP_RUN_PROG_GETSOCKOPT(sock->sk, level, optname, optval, optlen, max_optlen, err); + cgroup_unlock(); + } return err; } -- 2.45.2