Re: [PATCH bpf-next v5 03/25] bpf: Support bpf_list_head in map values

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> · Wed, 9 Nov 2022 15:11:14 -0800

On Tue, Nov 8, 2022 at 5:03 PM Alexei Starovoitov
<alexei.starovoitov@xxxxxxxxx> wrote:
>
> On Tue, Nov 8, 2022 at 4:23 PM Andrii Nakryiko
> <andrii.nakryiko@xxxxxxxxx> wrote:
> > > > >  struct bpf_offload_dev;
> > > > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > > > > index 94659f6b3395..dd381086bad9 100644
> > > > > --- a/include/uapi/linux/bpf.h
> > > > > +++ b/include/uapi/linux/bpf.h
> > > > > @@ -6887,6 +6887,16 @@ struct bpf_dynptr {
> > > > >         __u64 :64;
> > > > >  } __attribute__((aligned(8)));
> > > > >
> > > > > +struct bpf_list_head {
> > > > > +       __u64 :64;
> > > > > +       __u64 :64;
> > > > > +} __attribute__((aligned(8)));
> > > > > +
> > > > > +struct bpf_list_node {
> > > > > +       __u64 :64;
> > > > > +       __u64 :64;
> > > > > +} __attribute__((aligned(8)));
> > > >
> > > > Dave mentioned that this `__u64 :64` trick makes vmlinux.h lose the
> > > > alignment information, as the struct itself is empty, and so there is
> > > > nothing indicating that it has to be 8-byte aligned.
>
> Since it's not a new issue let's fix it for all.
> Whether it's a combination of pahole + bpftool or just pahole.

So yeah, I was expecting if we do this, we'd do this for all opaque
BPF UAPI structs like this (bpf_spin_lock and others), of course.

>
> > > >
> > > > So what if we have
> > > >
> > > > struct bpf_list_node {
> > > >     __u64 __opaque[2];
> > > > } __attribute__((aligned(8)));
> > > >
> > > > ?
> > > >
> > >
> > > Yep, can do that. Note that it's also potentially an issue for existing cases,
> > > like bpf_spin_lock, bpf_timer, bpf_dynptr, etc. Not completely sure if changing
> > > things now is possible, but if it is, we should probably make it for all of
> > > them?
> >
> > Why not? We are not removing anything or changing sizes, so it's
> > backwards compatible.
> > But I have a suspicion Alexei might not like
> > this __opaque field, so let's see what he says.
>
> I prefer to fix them all at once without adding a name.
>

This is not an issue with BTF per se.

struct blah {
  u64: 64
};

is just an empty struct blah with 8-byte size. Both BTF and DWARF will
record it as just

struct blah {
}

and record it's size as 8 bytes.

With that, there is nothing to suggest that this struct has to have
8-byte alignment.

If we mark explicitly __attribute__((aligned(8))) then DWARF will
additionally record alignment=8 for such struct. BTF doesn't record
alignment, though.

adding u64 fields internally will make libbpf recognize that struct
needs at least 8-byte alignment, which is what I propose as a simple
solution.

Alternatives are:
 - extend BTF to record struct/union alignments in BTF_KIND_{STRUCT,UNION}
 - record __attribute__((aligned(8))) as a new KIND (BTF_KIND_ATTRIBUTE)

Both seem like a bit of an overkill, given the work around is to have
u64 __opaque[] fields, which we won't have to remove or rename ever
(because those fields are not used).

> >
> > > > >                 off = vsi->offset;
> > > > > +               if (i && !off)
> > > > > +                       return -EFAULT;
> > > >
> > > > similarly, I'd say that either we'd need to calculate the exact
> > > > expected offset, or just not do anything here?
> > > >
> > >
> > > This thread is actually what prompted this check:
> > > https://lore.kernel.org/bpf/CAEf4Bza7ga2hEQ4J7EtgRHz49p1vZtaT4d2RDiyGOKGK41Nt=Q@xxxxxxxxxxxxxx
> > >
> > > Unless loaded using libbpf all offsets are zero. So I think we need to reject it
> > > here, but I think the same zero sized field might be an issue for this as well,
> > > so maybe we remember the last field size and check whether it was zero or not?
> > >
> > > I'll also include some more tests for these cases.
> >
> > The question is whether this affects correctness from the verifier
> > standpoint? If it does, there must be some other place where this will
> > cause problem and should be caught and reported.
>
> If it's an issue with BTF then we should probably check it
> during generic datasec verification.
> Here it's kinda late to warn.

+1 and do it more properly than forcing it to be non-zero

>
> > My main objection is that it's half a check, we check that it's
> > non-zero, but we don't check that it is correct in stricter sense. So
> > I'd rather drop it altogether, or go all the way to check that it is
> > correct offset (taking into account sizes of previous vars).