Re: [PATCH bpf-next 3/3] selftests/bpf: add tests for user- and non-CO-RE BPF_CORE_READ() variants

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> · Tue, 5 Jan 2021 13:03:47 -0800

On Tue, Jan 5, 2021 at 11:04 AM Alexei Starovoitov
<alexei.starovoitov@xxxxxxxxx> wrote:
>
> On Mon, Jan 04, 2021 at 09:08:21PM -0800, Andrii Nakryiko wrote:
> > On Mon, Jan 4, 2021 at 7:46 PM Alexei Starovoitov
> > <alexei.starovoitov@xxxxxxxxx> wrote:
> > >
> > > On Fri, Dec 18, 2020 at 03:56:14PM -0800, Andrii Nakryiko wrote:
> > > > +
> > > > +/* shuffled layout for relocatable (CO-RE) reads */
> > > > +struct callback_head___shuffled {
> > > > +     void (*func)(struct callback_head___shuffled *head);
> > > > +     struct callback_head___shuffled *next;
> > > > +};
> > > > +
> > > > +struct callback_head k_probe_in = {};
> > > > +struct callback_head___shuffled k_core_in = {};
> > > > +
> > > > +struct callback_head *u_probe_in = 0;
> > > > +struct callback_head___shuffled *u_core_in = 0;
> > > > +
> > > > +long k_probe_out = 0;
> > > > +long u_probe_out = 0;
> > > > +
> > > > +long k_core_out = 0;
> > > > +long u_core_out = 0;
> > > > +
> > > > +int my_pid = 0;
> > > > +
> > > > +SEC("raw_tracepoint/sys_enter")
> > > > +int handler(void *ctx)
> > > > +{
> > > > +     int pid = bpf_get_current_pid_tgid() >> 32;
> > > > +
> > > > +     if (my_pid != pid)
> > > > +             return 0;
> > > > +
> > > > +     /* next pointers for kernel address space have to be initialized from
> > > > +      * BPF side, user-space mmaped addresses are stil user-space addresses
> > > > +      */
> > > > +     k_probe_in.next = &k_probe_in;
> > > > +     __builtin_preserve_access_index(({k_core_in.next = &k_core_in;}));
> > > > +
> > > > +     k_probe_out = (long)BPF_PROBE_READ(&k_probe_in, next, next, func);
> > > > +     k_core_out = (long)BPF_CORE_READ(&k_core_in, next, next, func);
> > > > +     u_probe_out = (long)BPF_PROBE_READ_USER(u_probe_in, next, next, func);
> > > > +     u_core_out = (long)BPF_CORE_READ_USER(u_core_in, next, next, func);
> > >
> > > I don't understand what the test suppose to demonstrate.
> > > co-re relocs work for kernel btf only.
> > > Are you saying that 'struct callback_head' happened to be used by user space
> > > process that allocated it in user memory. And that is the same struct as
> > > being used by the kernel? So co-re relocs that apply against the kernel
> > > will sort-of work against the data of user space process because
> > > the user space is using the same struct? That sounds convoluted.
> >
> > The test itself just tests that bpf_probe_read_user() is executed, not
> > bpf_probe_read_kernel(). But yes, the use case is to read kernel data
> > structures from the user memory address space. See [0] for the last
> > time this was requested and justifications. It's not the first time
> > someone asked about the user-space variant of BPF_CORE_READ(), though
> > I won't be able to find the reference at this time.
> >
> >   [0] https://lore.kernel.org/bpf/CANaYP3GetBKUPDfo6PqWnm3xuGs2GZjLF8Ed51Q1=Emv2J-dKg@xxxxxxxxxxxxxx/
>
> That's quite confusing thread.
>
> > > I struggle to see the point of patch 1:
> > > +#define bpf_core_read_user(dst, sz, src)                                   \
> > > +       bpf_probe_read_user(dst, sz, (const void *)__builtin_preserve_access_index(src))
> > >
> > > co-re for user structs? Aren't they uapi? No reloc is needed.
> >
> > The use case in [0] above is for reading UAPI structs, passed as input
> > arguments to syscall. It's a pretty niche use case, but there are at
> > least two more-or-less valid benefits to use CO-RE with "stable" UAPI
> > structs:
> >
> >   - handling 32-bit vs 64-bit UAPI structs uniformly;
>
> what do you mean?
> 32-bit user space running on 64-bit kernel works through 'compat' syscalls.
> If bpf progs are accessing 64-bit uapi structs in such case they're broken
> and no amount of co-re can help.

I know nothing about compat, so can't comment on that. But the way I
understood the situation was the BPF program compiled once (well, at
least from the unmodified source code), but runs on ARM32 and (on a
separate physical host) on ARM64. And it's task is, say, to read UAPI
kernel structures from syscall arguments.

>
> >   - handling UAPI fields that were added in later kernels, but are
> > missing on the earlier ones.
> >
> > For the former, you'd need to compile two variants of the BPF program
> > (or do convoluted and inconvenient 32-bit UAPI struct re-definition
> > for 64-bit BPF target).
>
> No. 32-bit uapi structs should be used by bpf prog.
> compat stuff is not only casting pointers from 64-bit to 32.
>

See above about compat, that's not what I was thinking about.

One simple example I found in UAPI definitions is struct timespec, it
seems it's defined with `long`:

struct timespec {
        __kernel_old_time_t     tv_sec;         /* seconds */
        long                    tv_nsec;        /* nanoseconds */
}

So if you were to trace clock_gettime(), you'd need to deal with
differently-sized reads of tv_nsec, depending on whether you are
running on the 32-bit or 64-bit host.

There are probably other examples where UAPI structs use long instead
of __u32 or __u64, but I didn't dig too deep.

> > For the latter... I guess you can do if/else
> > dance based on the kernel version. Which sucks and is inconvenient
> > (and kernel version checks are discouraged, it's more reliable to
> > detect availability of specific types and fields).
>
> Not really. ifdef based on kernel version is not needed.
> bpf_core_field_exists() will work just fine.
> No need to bpf_probe_read_user() macros.

Yes, you are right, detection of field/type existence doesn't depend
on kernel- vs user-space, disregard this one.

>
> > So all in all, while pretty rare and niche, seemed like a valid use
> > case. And easy to support while reusing all the macro logic almost
> > without any changes.
>
> I think these new macros added with confusing and unclear goals
> will do more harm than good.

Let's see if Gilad can provide his perspective. I have no strong
feelings about this and can send a patch removing CORE_READ_USER
variants (they didn't make it into libbpf v0.3, so no harm or API
stability concerns). BPF_PROBE_READ() and BPF_PROBE_READ_USER() are
still useful for reading non-relocatable, but nested data structures,
so I'd prefer to keep those.