Re: vmlinux.h overlap/conflict with network protocol definitions

Stephen Hemminger <stephen@xxxxxxxxxxxxxxxxxx> · Fri, 5 Apr 2024 12:37:59 -0700

On Fri, 5 Apr 2024 10:31:46 -0700
Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> wrote:

> On Thu, Apr 4, 2024 at 5:53 PM Stephen Hemminger
> <stephen@xxxxxxxxxxxxxxxxxx> wrote:
> >
> > On Thu, 4 Apr 2024 14:45:04 -0700
> > Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> wrote:
> >  
> > > > > This is a known issue as currently vmlinux.h does not support macros.
> > > > > There are some efforts by Edward Zingerman to support this but this has
> > > > > not done yet. At the same time, you could have a trivial header file
> > > > > like
> > > > > https://github.com/torvalds/linux/blob/master/tools/testing/selftests/bpf/progs/bpf_tracing_net.h
> > > > > to be used for bpf program and then your bpf program with vmlinux.h can
> > > > > have much easier CORE support.  
> > > >
> > > >
> > > > That is an example of header surgery which I would rather avoid having to carry
> > > > as long term technical debt baggage.
> > > >  
> > >
> > > What's your ultimate goal? As Yonghong said, vmlinux.h is not
> > > compatible with other headers. So you have to pick either using
> > > vmlinux.h as a base + adding missing #define's (because those are not
> > > recorded in types, so can't be put into vmlinux.h), or not use
> > > vmlinux.h, use linux UAPI/internal headers and then use explicit CO-RE
> > > helpers/attributes to make your application CO-RE-relocatable.
> > >
> > > It's not clear from your original email why exactly you wanted to
> > > switch to vmlinux.h in the first place.  
> >
> > Some backstory. There is not an existing TC filter for this, so the
> > original developer had the idea of using BPF to do it.
> >
> > The program is a small BPF program to implement a TC filter that looks at
> > SKB and does mapping to queue based on L3 (or L3/L4) header. So not heavily dependent
> > on kernel data structure, but sk_buff is not necessarily stable; actual layout
> > depends on kernel config.
> >  
> 
> If it's only a few fields from sk_buff that you need, you can define
> your own minimal sk_buff definition with
> __attribute__((preserve_access_index)) added to it. You don't have to
> declare fields in the right order, just make sure that field types
> match. E.g., something like:
> 
> struct sk_buff {
>   unsigned char *head;
>   unsigned char *data;
>   struct sock *sock;
> } __attribute__((preserve_access_index));
> 
> You can call it `struct sk_buff___mine` if you already include
> sk_buff, to avoid the conflict. Libbpf will still understand that it
> should match it to struct sk_buff in the kernel.
> 
> Alternatively, you can just use BPF_CORE_READ() macro on types you get
> from kernel headers, even if they don't have that
> preserve_access_index attribute.
> 
> Or, you can just use __builtin_preserve_access_index(&skb->len) to
> access sk_buff's fields in CO-RE-relocatable way.
> 
> Or, you can have entire block within which all fields accesses will be
> CO-RE-relocatable:
> 
> int len;
> 
> __builtin_preserve_access_index(({
>     len = skb->len;
>     /* other skb accesses here as well */
> }));
> 
> Many ways to have bolted on CO-RE even without vmlinux.h. vmlinux.h is
> convenient (apart from lack of #defines, but that's an orthogonal
> problem), but by no means required for BPF CO-RE.
> 

The skbuff pointer is passed to the TC program. So BPF_CORE_READ doesn't
make sense here.  The biggest concern is that kernel config of the build
machine may not match the kernel config of the eventual target, and some
of the fields of the sk_buff are hidden based on #ifdef and may change.