Re: [RFC bpf-next 1/5] bpf: enable sleepable BPF programs attached to cgroup/{get,set}sockopt.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On 7/31/23 16:35, Stanislav Fomichev wrote:
On Mon, Jul 31, 2023 at 3:02 PM Kui-Feng Lee <sinquersw@xxxxxxxxx> wrote:

Sorry for the late reply! I just backed from a vacation.


On 7/24/23 11:36, Stanislav Fomichev wrote:
On 07/21, kuifeng@xxxxxxxx wrote:
From: Kui-Feng Lee <kuifeng@xxxxxxxx>

Enable sleepable cgroup/{get,set}sockopt hooks.

The sleepable BPF programs attached to cgroup/{get,set}sockopt hooks may
received a pointer to the optval in user space instead of a kernel
copy. ctx->user_optval and ctx->user_optval_end are the pointers to the
begin and end of the user space buffer if receiving a user space
buffer. ctx->optval and ctx->optval_end will be a kernel copy if receiving
a kernel space buffer.

A program receives a user space buffer if ctx->flags &
BPF_SOCKOPT_FLAG_OPTVAL_USER is true, otherwise it receives a kernel space
buffer.  The BPF programs should not read/write from/to a user space buffer
dirrectly.  It should access the buffer through bpf_copy_from_user() and
bpf_copy_to_user() provided in the following patches.

Signed-off-by: Kui-Feng Lee <kuifeng@xxxxxxxx>
---
   include/linux/filter.h         |   3 +
   include/uapi/linux/bpf.h       |   9 ++
   kernel/bpf/cgroup.c            | 189 ++++++++++++++++++++++++++-------
   kernel/bpf/verifier.c          |   7 +-
   tools/include/uapi/linux/bpf.h |   9 ++
   tools/lib/bpf/libbpf.c         |   2 +
   6 files changed, 176 insertions(+), 43 deletions(-)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index f69114083ec7..301dd1ba0de1 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -1345,6 +1345,9 @@ struct bpf_sockopt_kern {
      s32             level;
      s32             optname;
      s32             optlen;
+    u32             flags;
+    u8              *user_optval;
+    u8              *user_optval_end;
      /* for retval in struct bpf_cg_run_ctx */
      struct task_struct *current_task;
      /* Temporary "register" for indirect stores to ppos. */
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 739c15906a65..b2f81193f97b 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -7135,6 +7135,15 @@ struct bpf_sockopt {
      __s32   optname;
      __s32   optlen;
      __s32   retval;
+
+    __bpf_md_ptr(void *, user_optval);
+    __bpf_md_ptr(void *, user_optval_end);

Can we re-purpose existing optval/optval_end pointers
for the sleepable programs? IOW, when the prog is sleepable,
pass user pointers via optval/optval_end and require the programs
to do copy_to/from on this buffer (even if the backing pointer might be
in kernel memory - we can handle that in the kfuncs?).

The fact that the program now needs to look at the flag
(BPF_SOCKOPT_FLAG_OPTVAL_USER) and decide which buffer to
use makes the handling even more complicated; and we already have a
bunch of hairy stuff in these hooks. (or I misreading the change?)

Also, regarding sleepable and non-sleepable co-existence: do we really need
that? Can we say that all the programs have to be sleepable
or non-sleepable? Mixing them complicates the sharing of that buffer.

I considered this approach as well. This is an open question for me.
If we go this way, it means we can not attach a BPF program of a type
if any program of the other type has been installed.

If we pass two pointers (kernel copy buffer + real user mem) to the
sleepable program, we'll make it even more complicated by inheriting
all existing warts of the non-sleepable version :-(
IOW, feels like we should try to see if we can have some
copy_to/from_user kfuncs in the sleepable version that transparently
support either kernel or user memory (and prohibit direct access to
user_optval in the sleepable version).
And then, if we have one non-sleepable program in the chain, we can
fallback everything to the kernel buffer (maybe).
This way seems like we can support both versions in the same chain and
have a more sane api?

Basically, you are saying to move cp_from_optval() and cp_to_optval() in
the testcase to kfuncs. This can cause unnecessary copy. We can add
an API to make a dynptr from the ctx to avoid unnecessary copies.




[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux