On Mon, Aug 19, 2024 at 10:27 AM Tze-nan Wu <Tze-nan.Wu@xxxxxxxxxxxx> wrote: > > The return value from `cgroup_bpf_enabled(CGROUP_GETSOCKOPT)` can change > between the invocations of `BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN` and > `BPF_CGROUP_RUN_PROG_GETSOCKOPT`. > > If `cgroup_bpf_enabled(CGROUP_GETSOCKOPT)` changes from "false" to > "true" > between the invocations of `BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN` and > `BPF_CGROUP_RUN_PROG_GETSOCKOPT`, > `BPF_CGROUP_RUN_PROG_GETSOCKOPT` will receive an -EFAULT from > `__cgroup_bpf_run_filter_getsockopt(max_optlen=0)` due to `get_user()` > had not reached in `BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN`. > > Scenario shown as below: > > `process A` `process B` > ----------- ------------ > BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN > enable CGROUP_GETSOCKOPT > BPF_CGROUP_RUN_PROG_GETSOCKOPT (-EFAULT) > > Prevent `cgroup_bpf_enabled(CGROUP_GETSOCKOPT)` change between > `BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN` and `BPF_CGROUP_RUN_PROG_GETSOCKOPT` > by acquiring cgroup_lock. > > Co-developed-by: Yanghui Li <yanghui.li@xxxxxxxxxxxx> > Signed-off-by: Yanghui Li <yanghui.li@xxxxxxxxxxxx> > Co-developed-by: Cheng-Jui Wang <cheng-jui.wang@xxxxxxxxxxxx> > Signed-off-by: Cheng-Jui Wang <cheng-jui.wang@xxxxxxxxxxxx> > Signed-off-by: Tze-nan Wu <Tze-nan.Wu@xxxxxxxxxxxx> > > --- > > We have encountered this issue by observing that process A could sometimes > get an -EFAULT from getsockopt() during our device boot-up, while another > process B triggers the race condition by enabling CGROUP_GETSOCKOPT > through bpf syscall at the same time. > > The race condition is shown below: > > `process A` `process B` > ----------- ------------ > BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN > > bpf syscall > (CGROUP_GETSOCKOPT enabled) > > BPF_CGROUP_RUN_PROG_GETSOCKOPT > -> __cgroup_bpf_run_filter_getsockopt > (-EFAULT) > > __cgroup_bpf_run_filter_getsockopt return -EFAULT at the line shown below: > if (optval && (ctx.optlen > max_optlen || ctx.optlen < 0)) { > if (orig_optlen > PAGE_SIZE && ctx.optlen >= 0) { > pr_info_once("bpf getsockopt: ignoring program buffer with optlen=%d (max_optlen=%d)\n", > ctx.optlen, max_optlen); > ret = retval; > goto out; > } > ret = -EFAULT; <== return EFAULT here > goto out; > } > > This patch should fix the race but not sure if it introduces any potential > side effects or regression. > > And we wondering if this is a real issue in do_sock_getsockopt or if > getsockopt() is designed to expect such race conditions. > Should the userspace caller always anticipate an -EFAULT from getsockopt() > if another process enables CGROUP_GETSOCKOPT at the same time? > > Any comment will be appreciated! > > BTW, I added Chengjui and Yanghui to Co-developed due to we have several > discussions on this issue. And we both spend some time on this issue. > > --- > net/socket.c | 8 ++++++-- > 1 file changed, 6 insertions(+), 2 deletions(-) > > diff --git a/net/socket.c b/net/socket.c > index fcbdd5bc47ac..e0b2b16fd238 100644 > --- a/net/socket.c > +++ b/net/socket.c > @@ -2370,8 +2370,10 @@ int do_sock_getsockopt(struct socket *sock, bool compat, int level, > if (err) > return err; > > - if (!compat) > + if (!compat) { > + cgroup_lock(); > max_optlen = BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN(optlen); > + } > Acquiring cgroup_lock mutex in socket getsockopt() fast path ? There is no way we can accept such a patch, please come up with a reasonable patch. cgroup_bpf_enabled() should probably be used once.