On Mon, Jul 31, 2023 at 3:02 PM Kui-Feng Lee <sinquersw@xxxxxxxxx> wrote: > > Sorry for the late reply! I just backed from a vacation. > > > On 7/24/23 11:36, Stanislav Fomichev wrote: > > On 07/21, kuifeng@xxxxxxxx wrote: > >> From: Kui-Feng Lee <kuifeng@xxxxxxxx> > >> > >> Enable sleepable cgroup/{get,set}sockopt hooks. > >> > >> The sleepable BPF programs attached to cgroup/{get,set}sockopt hooks may > >> received a pointer to the optval in user space instead of a kernel > >> copy. ctx->user_optval and ctx->user_optval_end are the pointers to the > >> begin and end of the user space buffer if receiving a user space > >> buffer. ctx->optval and ctx->optval_end will be a kernel copy if receiving > >> a kernel space buffer. > >> > >> A program receives a user space buffer if ctx->flags & > >> BPF_SOCKOPT_FLAG_OPTVAL_USER is true, otherwise it receives a kernel space > >> buffer. The BPF programs should not read/write from/to a user space buffer > >> dirrectly. It should access the buffer through bpf_copy_from_user() and > >> bpf_copy_to_user() provided in the following patches. > >> > >> Signed-off-by: Kui-Feng Lee <kuifeng@xxxxxxxx> > >> --- > >> include/linux/filter.h | 3 + > >> include/uapi/linux/bpf.h | 9 ++ > >> kernel/bpf/cgroup.c | 189 ++++++++++++++++++++++++++------- > >> kernel/bpf/verifier.c | 7 +- > >> tools/include/uapi/linux/bpf.h | 9 ++ > >> tools/lib/bpf/libbpf.c | 2 + > >> 6 files changed, 176 insertions(+), 43 deletions(-) > >> > >> diff --git a/include/linux/filter.h b/include/linux/filter.h > >> index f69114083ec7..301dd1ba0de1 100644 > >> --- a/include/linux/filter.h > >> +++ b/include/linux/filter.h > >> @@ -1345,6 +1345,9 @@ struct bpf_sockopt_kern { > >> s32 level; > >> s32 optname; > >> s32 optlen; > >> + u32 flags; > >> + u8 *user_optval; > >> + u8 *user_optval_end; > >> /* for retval in struct bpf_cg_run_ctx */ > >> struct task_struct *current_task; > >> /* Temporary "register" for indirect stores to ppos. */ > >> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h > >> index 739c15906a65..b2f81193f97b 100644 > >> --- a/include/uapi/linux/bpf.h > >> +++ b/include/uapi/linux/bpf.h > >> @@ -7135,6 +7135,15 @@ struct bpf_sockopt { > >> __s32 optname; > >> __s32 optlen; > >> __s32 retval; > >> + > >> + __bpf_md_ptr(void *, user_optval); > >> + __bpf_md_ptr(void *, user_optval_end); > > > > Can we re-purpose existing optval/optval_end pointers > > for the sleepable programs? IOW, when the prog is sleepable, > > pass user pointers via optval/optval_end and require the programs > > to do copy_to/from on this buffer (even if the backing pointer might be > > in kernel memory - we can handle that in the kfuncs?). > > > > The fact that the program now needs to look at the flag > > (BPF_SOCKOPT_FLAG_OPTVAL_USER) and decide which buffer to > > use makes the handling even more complicated; and we already have a > > bunch of hairy stuff in these hooks. (or I misreading the change?) > > > > Also, regarding sleepable and non-sleepable co-existence: do we really need > > that? Can we say that all the programs have to be sleepable > > or non-sleepable? Mixing them complicates the sharing of that buffer. > > I considered this approach as well. This is an open question for me. > If we go this way, it means we can not attach a BPF program of a type > if any program of the other type has been installed. If we pass two pointers (kernel copy buffer + real user mem) to the sleepable program, we'll make it even more complicated by inheriting all existing warts of the non-sleepable version :-( IOW, feels like we should try to see if we can have some copy_to/from_user kfuncs in the sleepable version that transparently support either kernel or user memory (and prohibit direct access to user_optval in the sleepable version). And then, if we have one non-sleepable program in the chain, we can fallback everything to the kernel buffer (maybe). This way seems like we can support both versions in the same chain and have a more sane api?