Re: [RFC bpf-next 1/5] bpf: enable sleepable BPF programs attached to cgroup/{get,set}sockopt.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 8/1/23 11:08 AM, Stanislav Fomichev wrote:
On Tue, Aug 1, 2023 at 10:31 AM Kui-Feng Lee <sinquersw@xxxxxxxxx> wrote:



On 7/31/23 16:35, Stanislav Fomichev wrote:
On Mon, Jul 31, 2023 at 3:02 PM Kui-Feng Lee <sinquersw@xxxxxxxxx> wrote:

Sorry for the late reply! I just backed from a vacation.


On 7/24/23 11:36, Stanislav Fomichev wrote:
On 07/21, kuifeng@xxxxxxxx wrote:
From: Kui-Feng Lee <kuifeng@xxxxxxxx>

Enable sleepable cgroup/{get,set}sockopt hooks.

The sleepable BPF programs attached to cgroup/{get,set}sockopt hooks may
received a pointer to the optval in user space instead of a kernel
copy. ctx->user_optval and ctx->user_optval_end are the pointers to the
begin and end of the user space buffer if receiving a user space
buffer. ctx->optval and ctx->optval_end will be a kernel copy if receiving
a kernel space buffer.

A program receives a user space buffer if ctx->flags &
BPF_SOCKOPT_FLAG_OPTVAL_USER is true, otherwise it receives a kernel space
buffer.  The BPF programs should not read/write from/to a user space buffer
dirrectly.  It should access the buffer through bpf_copy_from_user() and
bpf_copy_to_user() provided in the following patches.

Signed-off-by: Kui-Feng Lee <kuifeng@xxxxxxxx>
---
    include/linux/filter.h         |   3 +
    include/uapi/linux/bpf.h       |   9 ++
    kernel/bpf/cgroup.c            | 189 ++++++++++++++++++++++++++-------
    kernel/bpf/verifier.c          |   7 +-
    tools/include/uapi/linux/bpf.h |   9 ++
    tools/lib/bpf/libbpf.c         |   2 +
    6 files changed, 176 insertions(+), 43 deletions(-)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index f69114083ec7..301dd1ba0de1 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -1345,6 +1345,9 @@ struct bpf_sockopt_kern {
       s32             level;
       s32             optname;
       s32             optlen;
+    u32             flags;
+    u8              *user_optval;
+    u8              *user_optval_end;
       /* for retval in struct bpf_cg_run_ctx */
       struct task_struct *current_task;
       /* Temporary "register" for indirect stores to ppos. */
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 739c15906a65..b2f81193f97b 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -7135,6 +7135,15 @@ struct bpf_sockopt {
       __s32   optname;
       __s32   optlen;
       __s32   retval;
+
+    __bpf_md_ptr(void *, user_optval);
+    __bpf_md_ptr(void *, user_optval_end);

Can we re-purpose existing optval/optval_end pointers
for the sleepable programs? IOW, when the prog is sleepable,
pass user pointers via optval/optval_end and require the programs
to do copy_to/from on this buffer (even if the backing pointer might be
in kernel memory - we can handle that in the kfuncs?).

The fact that the program now needs to look at the flag
(BPF_SOCKOPT_FLAG_OPTVAL_USER) and decide which buffer to
use makes the handling even more complicated; and we already have a
bunch of hairy stuff in these hooks. (or I misreading the change?)

Also, regarding sleepable and non-sleepable co-existence: do we really need
that? Can we say that all the programs have to be sleepable
or non-sleepable? Mixing them complicates the sharing of that buffer.

I considered this approach as well. This is an open question for me.
If we go this way, it means we can not attach a BPF program of a type
if any program of the other type has been installed.

If we pass two pointers (kernel copy buffer + real user mem) to the
sleepable program, we'll make it even more complicated by inheriting
all existing warts of the non-sleepable version :-( >>> IOW, feels like we should try to see if we can have some
copy_to/from_user kfuncs in the sleepable version that transparently
support either kernel or user memory (and prohibit direct access to
user_optval in the sleepable version).

From looking at patch 5 selftest, I also think exposing user_optval_* and flags is not ideal. For example, correct me if I am wrong, in patch 3, dynptr is only used for setsockopt to alloc. Intuitively, when developing a bpf prog, I would expect using bpf_dynptr_write() to write a new sockopt and then done. However, it still needs to "install" (by calling bpf_sockopt_install_optval). I think the "install" part is leaking too much internal details.

Beside, adding both new 'ctx->user_optval + len > ctx->user_optval_end' and dynptr usage pattern together is counter productive considering dynptr is to avoid the length comparison. Saving an unnecessary "copy_from_user(ctx.optval, optval,...)" is more important than being able to directly read from ctx->user_optval. The bpf prog is usually only interested in a few optnames and directly returns without even looking at the optval for the uninterested optnames. The current __cgroup_bpf_run_filter_{get,set}sockopt always does a "copy_from_user(ctx.optval, optval,...)".

And then, if we have one non-sleepable program in the chain, we can
fallback everything to the kernel buffer (maybe).
This way seems like we can support both versions in the same chain and
have a more sane api?

Basically, you are saying to move cp_from_optval() and cp_to_optval() in
the testcase to kfuncs. This can cause unnecessary copy. We can add
an API to make a dynptr from the ctx to avoid unnecessary copies.

Yeah, handle this transparently in the kfunc or via dynptr, whatever works.
I'm not too worried about the extra copy tbh, this is a slow path; I'm
more concerned about improving the bpf program / user experience.

+1. It will be great if all can be done in two kfunc (/dynptr_{write,read}). I would disallow sleepable prog to use the optval if it can make things simpler. If it goes with dynptr, need to support bpf_dynptr_slice() as well which I think should be doable after a quick thought.

The test needs to include a cgrp->effective array that has interleaved sleepable and non-sleepable bpf progs.




[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux