Re: [PATCH ipsec-next v1 6/7] bpf: selftests: test_tunnel: Disable CO-RE relocations

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> · Tue, 28 Nov 2023 08:02:30 -0800

On Mon, Nov 27, 2023 at 8:06 PM Yonghong Song <yonghong.song@xxxxxxxxx> wrote:
>
>
> On 11/27/23 7:01 PM, Daniel Xu wrote:
> > On Mon, Nov 27, 2023 at 02:45:11PM -0600, Daniel Xu wrote:
> >> On Sun, Nov 26, 2023 at 09:53:04PM -0800, Yonghong Song wrote:
> >>> On 11/27/23 12:44 AM, Yonghong Song wrote:
> >>>> On 11/26/23 8:52 PM, Eduard Zingerman wrote:
> >>>>> On Sun, 2023-11-26 at 18:04 -0600, Daniel Xu wrote:
> >>>>> [...]
> >>>>>>> Tbh I'm not sure. This test passes with preserve_static_offset
> >>>>>>> because it suppresses preserve_access_index. In general clang
> >>>>>>> translates bitfield access to a set of IR statements like:
> >>>>>>>
> >>>>>>>     C:
> >>>>>>>       struct foo {
> >>>>>>>         unsigned _;
> >>>>>>>         unsigned a:1;
> >>>>>>>         ...
> >>>>>>>       };
> >>>>>>>       ... foo->a ...
> >>>>>>>
> >>>>>>>     IR:
> >>>>>>>       %a = getelementptr inbounds %struct.foo, ptr %0, i32 0, i32 1
> >>>>>>>       %bf.load = load i8, ptr %a, align 4
> >>>>>>>       %bf.clear = and i8 %bf.load, 1
> >>>>>>>       %bf.cast = zext i8 %bf.clear to i32
> >>>>>>>
> >>>>>>> With preserve_static_offset the getelementptr+load are replaced by a
> >>>>>>> single statement which is preserved as-is till code generation,
> >>>>>>> thus load with align 4 is preserved.
> >>>>>>>
> >>>>>>> On the other hand, I'm not sure that clang guarantees that load or
> >>>>>>> stores used for bitfield access would be always aligned according to
> >>>>>>> verifier expectations.
> >>>>>>>
> >>>>>>> I think we should check if there are some clang knobs that prevent
> >>>>>>> generation of unaligned memory access. I'll take a look.
> >>>>>> Is there a reason to prefer fixing in compiler? I'm not opposed to it,
> >>>>>> but the downside to compiler fix is it takes years to propagate and
> >>>>>> sprinkles ifdefs into the code.
> >>>>>>
> >>>>>> Would it be possible to have an analogue of BPF_CORE_READ_BITFIELD()?
> >>>>> Well, the contraption below passes verification, tunnel selftest
> >>>>> appears to work. I might have messed up some shifts in the macro,
> >>>>> though.
> >>>> I didn't test it. But from high level it should work.
> >>>>
> >>>>> Still, if clang picks an unlucky BYTE_{OFFSET,SIZE}, a particular
> >>>>> field access might be unaligned.
> >>>> clang should pick a sensible BYTE_SIZE/BYTE_OFFSET to meet
> >>>> alignment requirement. This is also required for BPF_CORE_READ_BITFIELD.
> >>>>
> >>>>> ---
> >>>>>
> >>>>> diff --git a/tools/testing/selftests/bpf/progs/test_tunnel_kern.c
> >>>>> b/tools/testing/selftests/bpf/progs/test_tunnel_kern.c
> >>>>> index 3065a716544d..41cd913ac7ff 100644
> >>>>> --- a/tools/testing/selftests/bpf/progs/test_tunnel_kern.c
> >>>>> +++ b/tools/testing/selftests/bpf/progs/test_tunnel_kern.c
> >>>>> @@ -9,6 +9,7 @@
> >>>>>    #include "vmlinux.h"
> >>>>>    #include <bpf/bpf_helpers.h>
> >>>>>    #include <bpf/bpf_endian.h>
> >>>>> +#include <bpf/bpf_core_read.h>
> >>>>>    #include "bpf_kfuncs.h"
> >>>>>    #include "bpf_tracing_net.h"
> >>>>>
> >>>>> @@ -144,6 +145,38 @@ int ip6gretap_get_tunnel(struct __sk_buff *skb)
> >>>>>        return TC_ACT_OK;
> >>>>>    }
> >>>>>
> >>>>> +#define BPF_CORE_WRITE_BITFIELD(s, field, new_val) ({            \
> >>>>> +    void *p = (void *)s + __CORE_RELO(s, field, BYTE_OFFSET);    \
> >>>>> +    unsigned byte_size = __CORE_RELO(s, field, BYTE_SIZE);        \
> >>>>> +    unsigned lshift = __CORE_RELO(s, field, LSHIFT_U64); \
> >>>>> +    unsigned rshift = __CORE_RELO(s, field, RSHIFT_U64); \
> >>>>> +    unsigned bit_size = (rshift - lshift);                \
> >>>>> +    unsigned long long nval, val, hi, lo;                \
> >>>>> +                                    \
> >>>>> +    asm volatile("" : "=r"(p) : "0"(p));                \
> >>>> Use asm volatile("" : "+r"(p)) ?
> >>>>
> >>>>> +                                    \
> >>>>> +    switch (byte_size) {                        \
> >>>>> +    case 1: val = *(unsigned char *)p; break;            \
> >>>>> +    case 2: val = *(unsigned short *)p; break;            \
> >>>>> +    case 4: val = *(unsigned int *)p; break;            \
> >>>>> +    case 8: val = *(unsigned long long *)p; break;            \
> >>>>> +    }                                \
> >>>>> +    hi = val >> (bit_size + rshift);                \
> >>>>> +    hi <<= bit_size + rshift;                    \
> >>>>> +    lo = val << (bit_size + lshift);                \
> >>>>> +    lo >>= bit_size + lshift;                    \
> >>>>> +    nval = new_val;                            \
> >>>>> +    nval <<= lshift;                        \
> >>>>> +    nval >>= rshift;                        \
> >>>>> +    val = hi | nval | lo;                        \
> >>>>> +    switch (byte_size) {                        \
> >>>>> +    case 1: *(unsigned char *)p      = val; break;            \
> >>>>> +    case 2: *(unsigned short *)p     = val; break;            \
> >>>>> +    case 4: *(unsigned int *)p       = val; break;            \
> >>>>> +    case 8: *(unsigned long long *)p = val; break;            \
> >>>>> +    }                                \
> >>>>> +})
> >>>> I think this should be put in libbpf public header files but not sure
> >>>> where to put it. bpf_core_read.h although it is core write?
> >>>>
> >>>> But on the other hand, this is a uapi struct bitfield write,
> >>>> strictly speaking, CORE write is really unnecessary here. It
> >>>> would be great if we can relieve users from dealing with
> >>>> such unnecessary CORE writes. In that sense, for this particular
> >>>> case, I would prefer rewriting the code by using byte-level
> >>>> stores...
> >>> or preserve_static_offset to clearly mean to undo bitfield CORE ...
> >> Ok, I will do byte-level rewrite for next revision.
> > [...]
> >
> > This patch seems to work: https://pastes.dxuuu.xyz/0glrf9 .
> >
> > But I don't think it's very pretty. Also I'm seeing on the internet that
> > people are saying the exact layout of bitfields is compiler dependent.
>
> Any reference for this (exact layout of bitfields is compiler dependent)?
>
> > So I am wondering if these byte sized writes are correct. For that
> > matter, I am wondering how the GCC generated bitfield accesses line up
> > with clang generated BPF bytecode. Or why uapi contains a bitfield.
>
> One thing for sure is memory layout of bitfields should be the same
> for both clang and gcc as it is determined by C standard. Register
> representation and how to manipulate could be different for different
> compilers.
>
> >
> > WDYT, should I send up v2 with this or should I do one of the other
> > approaches in this thread?
>
> Daniel, looking at your patch: since we need to do CORE_READ for
> those bitfields anyway, I think Eduard's patch with
> BPF_CORE_WRITE_BITFIELD does make sense, and it also makes the code
> easy to understand. Could you take Eduard's patch for now?
> Whether and where to put the BPF_CORE_WRITE_BITFIELD macro
> can be decided later.

bpf_core_read.h name is... let's say "historical" and was never meant
to limit stuff there to read-only or anything like that. Think about
it as just bpf_core.h where all the CO-RE-related stuff goes. So
please put BPF_CORE_WRITE_BITFIELD there.
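
For reference, a minimal user-space sketch of the read-modify-write that
such a macro has to perform, written with a single mask instead of separate
hi/lo pieces. This is not the libbpf implementation. The names bitfield_set
and bitfield_get are hypothetical, and lshift/rshift are passed by hand
where a BPF program would get them from the LSHIFT_U64/RSHIFT_U64
relocations. It assumes the little-endian convention in which
rshift = 64 - width and lshift = 64 - (bit position + width) within the
loaded unit. Under that assumption, rshift - lshift is the number of bits
below the field rather than the field width.

```c
#include <assert.h>
#include <stdint.h>

/*
 * val is the storage unit already loaded and zero-extended to 64 bits.
 * Assumed convention (see lead-in): rshift = 64 - width,
 * lshift = 64 - (bit position + width).
 */
static uint64_t bitfield_set(uint64_t val, unsigned lshift, unsigned rshift,
                             uint64_t new_val)
{
	uint64_t mask;
	unsigned rpad;

	assert(lshift <= rshift && rshift < 64);

	rpad = rshift - lshift;              /* bits below the field */
	mask = (~0ULL << rshift) >> lshift;  /* ones exactly over the field */

	/* Keep everything outside the field, drop in the truncated new value. */
	return (val & ~mask) | ((new_val << rpad) & mask);
}

/* Unsigned read using the same shift pair: left-align, then right-align. */
static uint64_t bitfield_get(uint64_t val, unsigned lshift, unsigned rshift)
{
	assert(lshift <= rshift && rshift < 64);
	return (val << lshift) >> rshift;
}
```

With a 3-bit field at bit position 4 (lshift = 57, rshift = 61), writing 2
into a unit holding 0xFF leaves the other five bits untouched, and reading
the field back with the same shifts returns 2. A value wider than the field
is truncated by the mask rather than spilling into neighbouring bits.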

>
> >
> > I am ok with any of the approaches.
> >
> > Thanks,
> > Daniel
> >