On 7/31/23 16:35, Stanislav Fomichev wrote:
On Mon, Jul 31, 2023 at 3:02 PM Kui-Feng Lee <sinquersw@xxxxxxxxx> wrote:
Sorry for the late reply! I just backed from a vacation.
On 7/24/23 11:36, Stanislav Fomichev wrote:
On 07/21, kuifeng@xxxxxxxx wrote:
From: Kui-Feng Lee <kuifeng@xxxxxxxx>
Enable sleepable cgroup/{get,set}sockopt hooks.
The sleepable BPF programs attached to cgroup/{get,set}sockopt hooks may
received a pointer to the optval in user space instead of a kernel
copy. ctx->user_optval and ctx->user_optval_end are the pointers to the
begin and end of the user space buffer if receiving a user space
buffer. ctx->optval and ctx->optval_end will be a kernel copy if receiving
a kernel space buffer.
A program receives a user space buffer if ctx->flags &
BPF_SOCKOPT_FLAG_OPTVAL_USER is true, otherwise it receives a kernel space
buffer. The BPF programs should not read/write from/to a user space buffer
dirrectly. It should access the buffer through bpf_copy_from_user() and
bpf_copy_to_user() provided in the following patches.
Signed-off-by: Kui-Feng Lee <kuifeng@xxxxxxxx>
---
include/linux/filter.h | 3 +
include/uapi/linux/bpf.h | 9 ++
kernel/bpf/cgroup.c | 189 ++++++++++++++++++++++++++-------
kernel/bpf/verifier.c | 7 +-
tools/include/uapi/linux/bpf.h | 9 ++
tools/lib/bpf/libbpf.c | 2 +
6 files changed, 176 insertions(+), 43 deletions(-)
diff --git a/include/linux/filter.h b/include/linux/filter.h
index f69114083ec7..301dd1ba0de1 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -1345,6 +1345,9 @@ struct bpf_sockopt_kern {
s32 level;
s32 optname;
s32 optlen;
+ u32 flags;
+ u8 *user_optval;
+ u8 *user_optval_end;
/* for retval in struct bpf_cg_run_ctx */
struct task_struct *current_task;
/* Temporary "register" for indirect stores to ppos. */
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 739c15906a65..b2f81193f97b 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -7135,6 +7135,15 @@ struct bpf_sockopt {
__s32 optname;
__s32 optlen;
__s32 retval;
+
+ __bpf_md_ptr(void *, user_optval);
+ __bpf_md_ptr(void *, user_optval_end);
Can we re-purpose existing optval/optval_end pointers
for the sleepable programs? IOW, when the prog is sleepable,
pass user pointers via optval/optval_end and require the programs
to do copy_to/from on this buffer (even if the backing pointer might be
in kernel memory - we can handle that in the kfuncs?).
The fact that the program now needs to look at the flag
(BPF_SOCKOPT_FLAG_OPTVAL_USER) and decide which buffer to
use makes the handling even more complicated; and we already have a
bunch of hairy stuff in these hooks. (or I misreading the change?)
Also, regarding sleepable and non-sleepable co-existence: do we really need
that? Can we say that all the programs have to be sleepable
or non-sleepable? Mixing them complicates the sharing of that buffer.
I considered this approach as well. This is an open question for me.
If we go this way, it means we can not attach a BPF program of a type
if any program of the other type has been installed.
If we pass two pointers (kernel copy buffer + real user mem) to the
sleepable program, we'll make it even more complicated by inheriting
all existing warts of the non-sleepable version :-(
IOW, feels like we should try to see if we can have some
copy_to/from_user kfuncs in the sleepable version that transparently
support either kernel or user memory (and prohibit direct access to
user_optval in the sleepable version).
And then, if we have one non-sleepable program in the chain, we can
fallback everything to the kernel buffer (maybe).
This way seems like we can support both versions in the same chain and
have a more sane api?
Basically, you are saying to move cp_from_optval() and cp_to_optval() in
the testcase to kfuncs. This can cause unnecessary copy. We can add
an API to make a dynptr from the ctx to avoid unnecessary copies.