Re: [RFC bpf-next 1/5] bpf: enable sleepable BPF programs attached to cgroup/{get,set}sockopt.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Aug 1, 2023 at 10:31 AM Kui-Feng Lee <sinquersw@xxxxxxxxx> wrote:
>
>
>
> On 7/31/23 16:35, Stanislav Fomichev wrote:
> > On Mon, Jul 31, 2023 at 3:02 PM Kui-Feng Lee <sinquersw@xxxxxxxxx> wrote:
> >>
> >> Sorry for the late reply! I just backed from a vacation.
> >>
> >>
> >> On 7/24/23 11:36, Stanislav Fomichev wrote:
> >>> On 07/21, kuifeng@xxxxxxxx wrote:
> >>>> From: Kui-Feng Lee <kuifeng@xxxxxxxx>
> >>>>
> >>>> Enable sleepable cgroup/{get,set}sockopt hooks.
> >>>>
> >>>> The sleepable BPF programs attached to cgroup/{get,set}sockopt hooks may
> >>>> received a pointer to the optval in user space instead of a kernel
> >>>> copy. ctx->user_optval and ctx->user_optval_end are the pointers to the
> >>>> begin and end of the user space buffer if receiving a user space
> >>>> buffer. ctx->optval and ctx->optval_end will be a kernel copy if receiving
> >>>> a kernel space buffer.
> >>>>
> >>>> A program receives a user space buffer if ctx->flags &
> >>>> BPF_SOCKOPT_FLAG_OPTVAL_USER is true, otherwise it receives a kernel space
> >>>> buffer.  The BPF programs should not read/write from/to a user space buffer
> >>>> dirrectly.  It should access the buffer through bpf_copy_from_user() and
> >>>> bpf_copy_to_user() provided in the following patches.
> >>>>
> >>>> Signed-off-by: Kui-Feng Lee <kuifeng@xxxxxxxx>
> >>>> ---
> >>>>    include/linux/filter.h         |   3 +
> >>>>    include/uapi/linux/bpf.h       |   9 ++
> >>>>    kernel/bpf/cgroup.c            | 189 ++++++++++++++++++++++++++-------
> >>>>    kernel/bpf/verifier.c          |   7 +-
> >>>>    tools/include/uapi/linux/bpf.h |   9 ++
> >>>>    tools/lib/bpf/libbpf.c         |   2 +
> >>>>    6 files changed, 176 insertions(+), 43 deletions(-)
> >>>>
> >>>> diff --git a/include/linux/filter.h b/include/linux/filter.h
> >>>> index f69114083ec7..301dd1ba0de1 100644
> >>>> --- a/include/linux/filter.h
> >>>> +++ b/include/linux/filter.h
> >>>> @@ -1345,6 +1345,9 @@ struct bpf_sockopt_kern {
> >>>>       s32             level;
> >>>>       s32             optname;
> >>>>       s32             optlen;
> >>>> +    u32             flags;
> >>>> +    u8              *user_optval;
> >>>> +    u8              *user_optval_end;
> >>>>       /* for retval in struct bpf_cg_run_ctx */
> >>>>       struct task_struct *current_task;
> >>>>       /* Temporary "register" for indirect stores to ppos. */
> >>>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> >>>> index 739c15906a65..b2f81193f97b 100644
> >>>> --- a/include/uapi/linux/bpf.h
> >>>> +++ b/include/uapi/linux/bpf.h
> >>>> @@ -7135,6 +7135,15 @@ struct bpf_sockopt {
> >>>>       __s32   optname;
> >>>>       __s32   optlen;
> >>>>       __s32   retval;
> >>>> +
> >>>> +    __bpf_md_ptr(void *, user_optval);
> >>>> +    __bpf_md_ptr(void *, user_optval_end);
> >>>
> >>> Can we re-purpose existing optval/optval_end pointers
> >>> for the sleepable programs? IOW, when the prog is sleepable,
> >>> pass user pointers via optval/optval_end and require the programs
> >>> to do copy_to/from on this buffer (even if the backing pointer might be
> >>> in kernel memory - we can handle that in the kfuncs?).
> >>>
> >>> The fact that the program now needs to look at the flag
> >>> (BPF_SOCKOPT_FLAG_OPTVAL_USER) and decide which buffer to
> >>> use makes the handling even more complicated; and we already have a
> >>> bunch of hairy stuff in these hooks. (or I misreading the change?)
> >>>
> >>> Also, regarding sleepable and non-sleepable co-existence: do we really need
> >>> that? Can we say that all the programs have to be sleepable
> >>> or non-sleepable? Mixing them complicates the sharing of that buffer.
> >>
> >> I considered this approach as well. This is an open question for me.
> >> If we go this way, it means we can not attach a BPF program of a type
> >> if any program of the other type has been installed.
> >
> > If we pass two pointers (kernel copy buffer + real user mem) to the
> > sleepable program, we'll make it even more complicated by inheriting
> > all existing warts of the non-sleepable version :-(
> > IOW, feels like we should try to see if we can have some
> > copy_to/from_user kfuncs in the sleepable version that transparently
> > support either kernel or user memory (and prohibit direct access to
> > user_optval in the sleepable version).
> > And then, if we have one non-sleepable program in the chain, we can
> > fallback everything to the kernel buffer (maybe).
> > This way seems like we can support both versions in the same chain and
> > have a more sane api?
>
> Basically, you are saying to move cp_from_optval() and cp_to_optval() in
> the testcase to kfuncs. This can cause unnecessary copy. We can add
> an API to make a dynptr from the ctx to avoid unnecessary copies.

Yeah, handle this transparently in the kfunc or via dynptr, whatever works.
I'm not too worried about the extra copy tbh, this is a slow path; I'm
more concerned about improving the bpf program / user experience.





[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux