From: Kui-Feng Lee <kuifeng@xxxxxxxx>

Make BPF programs attached to cgroup/{get,set}sockopt hooks sleepable
and able to call bpf_copy_from_user() and bpf_copy_to_user(), a new
kfunc.

The Issue with CGroup {get,set}sockopt Hooks
============================================

When {get,set}sockopt is called from user space, optval is a pointer to
a buffer. The format of the buffer depends on the level and optname,
and its size is specified by optlen. User space programs use the buffer
to pass values to setsockopt and to retrieve values from getsockopt.

The problem is that BPF programs running under the RCU read lock cannot
access buffers located in user space. These programs are non-sleepable,
and accessing user space memory with copy_from_user() or copy_to_user()
can trigger page faults, which may need to sleep.

Instead, the kernel makes a copy of the buffer specified by optval and
optlen in kernel space before passing it to the cgroup {get,set}sockopt
hooks. After the hooks have run, the content of the kernel buffer is
copied back to user space if necessary.

Programs may pass a significant amount of data to the kernel in the
buffer pointed to by optval. One example is iptables, which can send
several megabytes to the kernel. However, BPF programs on the hooks can
only see the first PAGE_SIZE bytes of the buffer. The optlen observed
by BPF programs may appear to be PAGE_SIZE, while the real length is
larger.

Conversely, for getsockopt() the value of optlen represents the amount
of data retrieved. Both the buffer content and optlen can be modified
by BPF programs, so the kernel may wrongly change the optlen returned
to user space to PAGE_SIZE: it cannot tell whether the value was set by
a BPF program or by the kernel itself.
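To make the PAGE_SIZE limit concrete, below is a simplified user-space
model of the copy step described above. It is a sketch, not kernel
code: MODEL_PAGE_SIZE (assumed to be 4096, a common x86-64 page size),
struct model_sockopt_ctx, model_copy_optval() and model_bpf_can_read()
are hypothetical names, and the real logic in kernel/bpf/cgroup.c
handles more cases than this model does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: 4 KiB pages. The kernel uses PAGE_SIZE, which is
 * architecture dependent. */
#define MODEL_PAGE_SIZE 4096

/* Hypothetical model of the kernel-side copy handed to a
 * non-sleepable cgroup {get,set}sockopt program. Not kernel code. */
struct model_sockopt_ctx {
	unsigned char buf[MODEL_PAGE_SIZE]; /* kernel copy of optval */
	size_t visible_len;                 /* bytes the program can read */
	size_t user_optlen;                 /* optlen passed by user space */
};

/* Copy at most one page of the user buffer, mirroring the behavior
 * where only the first PAGE_SIZE bytes are visible to the program. */
static void model_copy_optval(struct model_sockopt_ctx *ctx,
			      const unsigned char *optval, size_t optlen)
{
	assert(ctx != NULL);
	assert(optval != NULL || optlen == 0);

	ctx->user_optlen = optlen;
	ctx->visible_len = optlen < MODEL_PAGE_SIZE ? optlen : MODEL_PAGE_SIZE;
	if (ctx->visible_len > 0)
		memcpy(ctx->buf, optval, ctx->visible_len);
}

/* Returns 1 if the byte at `offset` of the user buffer is reachable
 * through the kernel copy, 0 otherwise. */
static int model_bpf_can_read(const struct model_sockopt_ctx *ctx,
			      size_t offset)
{
	return offset < ctx->visible_len;
}
```

In this model, a 3 MiB iptables-style buffer yields a visible_len of
4096, so a byte at offset 5000 is out of reach even though
user_optlen records the full size. Letting sleepable hooks read the
user buffer directly through bpf_copy_from_user() is what removes this
window.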
To work around these problems, several hacks were introduced, for
example commit d8fe449a9c51 ("bpf: Don't return EINVAL from
{get,set}sockopt when optlen > PAGE_SIZE") and commit 29ebbba7d461
("bpf: Don't EFAULT for {g,s}setsockopt with wrong optlen").

Make CGroup {get,set}sockopt Hooks Sleepable
============================================

The long-term solution is to make these hooks sleepable, enabling BPF
programs to call copy_from_user() and copy_to_user() through
bpf_copy_from_user() and bpf_copy_to_user(). This removes the need for
the kernel to manipulate the optval and optlen values, and allows BPF
programs to access the complete contents of the buffer referenced by
optval.

Kui-Feng Lee (5):
  bpf: enable sleepable BPF programs attached to
    cgroup/{get,set}sockopt.
  bpf: Provide bpf_copy_from_user() and bpf_copy_to_user().
  bpf: Add a new dynptr type for CGRUP_SOCKOPT.
  bpf: Prevent BPF programs from access the buffer pointed by
    user_optval.
  bpf: Add test cases for sleepable BPF programs of the CGROUP_SOCKOPT
    type.

 include/linux/bpf.h                          |   7 +-
 include/linux/filter.h                       |   3 +
 include/uapi/linux/bpf.h                     |  11 +
 kernel/bpf/btf.c                             |   3 +
 kernel/bpf/cgroup.c                          | 196 +++++++++---
 kernel/bpf/helpers.c                         | 104 ++++++
 kernel/bpf/verifier.c                        | 116 ++++++++-----
 tools/include/uapi/linux/bpf.h               |  11 +
 tools/lib/bpf/libbpf.c                       |   2 +
 .../testing/selftests/bpf/bpf_experimental.h |  27 ++
 tools/testing/selftests/bpf/bpf_kfuncs.h     |  30 ++
 .../selftests/bpf/prog_tests/sockopt_sk.c    |  34 +-
 .../testing/selftests/bpf/progs/sockopt_sk.c | 299 ++++++++++++++++++
 .../selftests/bpf/verifier/sleepable.c       |   2 +-
 14 files changed, 763 insertions(+), 82 deletions(-)

-- 
2.34.1