On Sat, May 02, 2020 at 01:48:51AM +0200, Daniel Borkmann wrote: > On 4/27/20 11:45 AM, Lorenz Bauer wrote: > > On Sun, 26 Apr 2020 at 18:33, Alexei Starovoitov > > <alexei.starovoitov@xxxxxxxxx> wrote: > [...] > > > > +/* Linux packet pointers are either aligned to NET_IP_ALIGN (aka 2 bytes), > > > > + * or not aligned if the arch supports efficient unaligned access. > > > > + * > > > > + * Since the verifier ensures that eBPF packet accesses follow these rules, > > > > + * we can tell LLVM to emit code as if we always had a larger alignment. > > > > + * It will yell at us if we end up on a platform where this is not valid. > > > > + */ > > > > +typedef uint8_t *net_ptr __attribute__((align_value(8))); > > > > > > Wow. I didn't know about this attribute. > > > I wonder whether it can help Daniel's memcpy hack. > > > > Yes, I think so. > > Just for some more context [0]. I think the problem is a bit more complex in > general. Generally, _any_ kind of pointer to some data (except for the stack) > is currently treated as byte-by-byte copy from __builtin_memcpy() and other > similarly available __builtin_*() helpers on BPF backend since the backend > cannot make any assumptions about the data's alignment and whether unaligned > access from the underlying arch is ok & efficient (the latter the verifier > does judge for us however). So it's definitely not just limited to xdp->data. > There is also the issue that while access to any non-stack data can be > unaligned, access to the stack however cannot. I've discussed a while back > with Yonghong about potential solutions. One would be to add a small patch > to the BPF backend to enable __builtin_*() helpers to allow for unaligned > access which could then be opt-ed in e.g. via -mattr from llc for the case > when we know that the compiled program only runs on archs with efficient > unaligned access anyway. However, this still potentially breaks with the BPF > stack for the case when objects are, for example, larger than size 8 but with > a natural alignment smaller than 8 where __builtin_memcpy() would then decide > to emit dw-typed load/stores. But for these cases could then be annotated via > __aligned(8) on stack. So this is basically what we do right now as a generic > workaround in Cilium [0], meaning, our own memcpy/memset with optimal number > of instructions and __aligned(8) where needed; most of the time this __aligned(8) > is not needed, so it's really just a few places, and we also have a cocci > scripts to catch these during development if needed. Anyway, real thing would > be to allow the BPF stack for unaligned access as well and then BPF backend > could nicely solve this in a native way w/o any workarounds, but that is tbd. > > Thanks, > Daniel > > [0] https://github.com/cilium/cilium/blob/master/bpf/include/bpf/builtins.h Daniel, do you mind adding such memcpy to libbpf ?