Re: [PATCH bpf-next 3/3] selftests/bpf: add tests for user- and non-CO-RE BPF_CORE_READ() variants

Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> · Tue, 5 Jan 2021 11:03:55 -0800

On Mon, Jan 04, 2021 at 09:08:21PM -0800, Andrii Nakryiko wrote:
> On Mon, Jan 4, 2021 at 7:46 PM Alexei Starovoitov
> <alexei.starovoitov@xxxxxxxxx> wrote:
> >
> > On Fri, Dec 18, 2020 at 03:56:14PM -0800, Andrii Nakryiko wrote:
> > > +
> > > +/* shuffled layout for relocatable (CO-RE) reads */
> > > +struct callback_head___shuffled {
> > > +     void (*func)(struct callback_head___shuffled *head);
> > > +     struct callback_head___shuffled *next;
> > > +};
> > > +
> > > +struct callback_head k_probe_in = {};
> > > +struct callback_head___shuffled k_core_in = {};
> > > +
> > > +struct callback_head *u_probe_in = 0;
> > > +struct callback_head___shuffled *u_core_in = 0;
> > > +
> > > +long k_probe_out = 0;
> > > +long u_probe_out = 0;
> > > +
> > > +long k_core_out = 0;
> > > +long u_core_out = 0;
> > > +
> > > +int my_pid = 0;
> > > +
> > > +SEC("raw_tracepoint/sys_enter")
> > > +int handler(void *ctx)
> > > +{
> > > +     int pid = bpf_get_current_pid_tgid() >> 32;
> > > +
> > > +     if (my_pid != pid)
> > > +             return 0;
> > > +
> > > +     /* next pointers for kernel address space have to be initialized from
> > > +      * BPF side, user-space mmaped addresses are stil user-space addresses
> > > +      */
> > > +     k_probe_in.next = &k_probe_in;
> > > +     __builtin_preserve_access_index(({k_core_in.next = &k_core_in;}));
> > > +
> > > +     k_probe_out = (long)BPF_PROBE_READ(&k_probe_in, next, next, func);
> > > +     k_core_out = (long)BPF_CORE_READ(&k_core_in, next, next, func);
> > > +     u_probe_out = (long)BPF_PROBE_READ_USER(u_probe_in, next, next, func);
> > > +     u_core_out = (long)BPF_CORE_READ_USER(u_core_in, next, next, func);
> >
> > I don't understand what the test suppose to demonstrate.
> > co-re relocs work for kernel btf only.
> > Are you saying that 'struct callback_head' happened to be used by user space
> > process that allocated it in user memory. And that is the same struct as
> > being used by the kernel? So co-re relocs that apply against the kernel
> > will sort-of work against the data of user space process because
> > the user space is using the same struct? That sounds convoluted.
> 
> The test itself just tests that bpf_probe_read_user() is executed, not
> bpf_probe_read_kernel(). But yes, the use case is to read kernel data
> structures from the user memory address space. See [0] for the last
> time this was requested and justifications. It's not the first time
> someone asked about the user-space variant of BPF_CORE_READ(), though
> I won't be able to find the reference at this time.
> 
>   [0] https://lore.kernel.org/bpf/CANaYP3GetBKUPDfo6PqWnm3xuGs2GZjLF8Ed51Q1=Emv2J-dKg@xxxxxxxxxxxxxx/

That's quite confusing thread.

> > I struggle to see the point of patch 1:
> > +#define bpf_core_read_user(dst, sz, src)                                   \
> > +       bpf_probe_read_user(dst, sz, (const void *)__builtin_preserve_access_index(src))
> >
> > co-re for user structs? Aren't they uapi? No reloc is needed.
> 
> The use case in [0] above is for reading UAPI structs, passed as input
> arguments to syscall. It's a pretty niche use case, but there are at
> least two more-or-less valid benefits to use CO-RE with "stable" UAPI
> structs:
> 
>   - handling 32-bit vs 64-bit UAPI structs uniformly;

what do you mean?
32-bit user space running on 64-bit kernel works through 'compat' syscalls.
If bpf progs are accessing 64-bit uapi structs in such case they're broken
and no amount of co-re can help.

>   - handling UAPI fields that were added in later kernels, but are
> missing on the earlier ones.
> 
> For the former, you'd need to compile two variants of the BPF program
> (or do convoluted and inconvenient 32-bit UAPI struct re-definition
> for 64-bit BPF target). 

No. 32-bit uapi structs should be used by bpf prog.
compat stuff is not only casting pointers from 64-bit to 32.

> For the latter... I guess you can do if/else
> dance based on the kernel version. Which sucks and is inconvenient
> (and kernel version checks are discouraged, it's more reliable to
> detect availability of specific types and fields).

Not really. ifdef based on kernel version is not needed.
bpf_core_field_exists() will work just fine.
No need to bpf_probe_read_user() macros.

> So all in all, while pretty rare and niche, seemed like a valid use
> case. And easy to support while reusing all the macro logic almost
> without any changes.

I think these new macros added with confusing and unclear goals
will do more harm than good.