Re: [PATCH bpf-next v2 00/11] Enable BPF programs to declare arrays of kptr, bpf_rb_root, and bpf_list_head.

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> · Wed, 24 Apr 2024 17:48:08 -0700

On Wed, Apr 24, 2024 at 1:09 PM Alexei Starovoitov
<alexei.starovoitov@xxxxxxxxx> wrote:
>
> On Mon, Apr 22, 2024 at 7:54 PM Kui-Feng Lee <sinquersw@xxxxxxxxx> wrote:
> >
> >
> >
> > On 4/22/24 19:45, Kui-Feng Lee wrote:
> > >
> > >
> > > On 4/18/24 07:53, Alexei Starovoitov wrote:
> > >> On Wed, Apr 17, 2024 at 11:07 PM Kui-Feng Lee <sinquersw@xxxxxxxxx>
> > >> wrote:
> > >>>
> > >>>
> > >>>
> > >>> On 4/17/24 22:11, Alexei Starovoitov wrote:
> > >>>> On Wed, Apr 17, 2024 at 9:31 PM Kui-Feng Lee <sinquersw@xxxxxxxxx>
> > >>>> wrote:
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> On 4/17/24 20:30, Alexei Starovoitov wrote:
> > >>>>>> On Fri, Apr 12, 2024 at 2:08 PM Kui-Feng Lee
> > >>>>>> <thinker.li@xxxxxxxxx> wrote:
> > >>>>>>>
> > >>>>>>> The arrays of kptr, bpf_rb_root, and bpf_list_head didn't work as
> > >>>>>>> global variables. This was due to these types being initialized and
> > >>>>>>> verified in a special manner in the kernel. This patchset allows BPF
> > >>>>>>> programs to declare arrays of kptr, bpf_rb_root, and
> > >>>>>>> bpf_list_head in
> > >>>>>>> the global namespace.
> > >>>>>>>
> > >>>>>>> The main change is to add "nelems" to btf_fields. The value of
> > >>>>>>> "nelems" represents the number of elements in the array if a
> > >>>>>>> btf_field
> > >>>>>>> represents an array. Otherwise, "nelem" will be 1. The verifier
> > >>>>>>> verifies these types based on the information provided by the
> > >>>>>>> btf_field.
> > >>>>>>>
> > >>>>>>> The value of "size" will be the size of the entire array if a
> > >>>>>>> btf_field represents an array. Dividing "size" by "nelems" gives the
> > >>>>>>> size of an element. The value of "offset" will be the offset of the
> > >>>>>>> beginning for an array. By putting this together, we can
> > >>>>>>> determine the
> > >>>>>>> offset of each element in an array. For example,
> > >>>>>>>
> > >>>>>>>        struct bpf_cpumask __kptr * global_mask_array[2];
> > >>>>>>
> > >>>>>> Looks like this patch set enables arrays only.
> > >>>>>> Meaning the following is supported already:
> > >>>>>>
> > >>>>>> +private(C) struct bpf_spin_lock glock_c;
> > >>>>>> +private(C) struct bpf_list_head ghead_array1 __contains(foo, node2);
> > >>>>>> +private(C) struct bpf_list_head ghead_array2 __contains(foo, node2);
> > >>>>>>
> > >>>>>> while this support is added:
> > >>>>>>
> > >>>>>> +private(C) struct bpf_spin_lock glock_c;
> > >>>>>> +private(C) struct bpf_list_head ghead_array1[3] __contains(foo,
> > >>>>>> node2);
> > >>>>>> +private(C) struct bpf_list_head ghead_array2[2] __contains(foo,
> > >>>>>> node2);
> > >>>>>>
> > >>>>>> Am I right?
> > >>>>>>
> > >>>>>> What about the case when bpf_list_head is wrapped in a struct?
> > >>>>>> private(C) struct foo {
> > >>>>>>      struct bpf_list_head ghead;
> > >>>>>> } ghead;
> > >>>>>>
> > >>>>>> that's not enabled in this patch. I think.
> > >>>>>>
> > >>>>>> And the following:
> > >>>>>> private(C) struct foo {
> > >>>>>>      struct bpf_list_head ghead;
> > >>>>>> } ghead[2];
> > >>>>>>
> > >>>>>>
> > >>>>>> or
> > >>>>>>
> > >>>>>> private(C) struct foo {
> > >>>>>>      struct bpf_list_head ghead[2];
> > >>>>>> } ghead;
> > >>>>>>
> > >>>>>> Won't work either.
> > >>>>>
> > >>>>> No, they don't work.
> > >>>>> We had a discussion about this in the other day.
> > >>>>> I proposed to have another patch set to work on struct types.
> > >>>>> Do you prefer to handle it in this patch set?
> > >>>>>
> > >>>>>>
> > >>>>>> I think eventually we want to support all such combinations and
> > >>>>>> the approach proposed in this patch with 'nelems'
> > >>>>>> won't work for wrapper structs.
> > >>>>>>
> > >>>>>> I think it's better to unroll/flatten all structs and arrays
> > >>>>>> and represent them as individual elements in the flattened
> > >>>>>> structure. Then there will be no need to special case array with
> > >>>>>> 'nelems'.
> > >>>>>> All special BTF types will be individual elements with unique offset.
> > >>>>>>
> > >>>>>> Does this make sense?
> > >>>>>
> > >>>>> That means it will creates 10 btf_field(s) for an array having 10
> > >>>>> elements. The purpose of adding "nelems" is to avoid the
> > >>>>> repetition. Do
> > >>>>> you prefer to expand them?
> > >>>>
> > >>>> It's not just expansion, but a common way to handle nested structs too.
> > >>>>
> > >>>> I suspect by delaying nested into another patchset this approach
> > >>>> will become useless.
> > >>>>
> > >>>> So try adding nested structs in all combinations as a follow up and
> > >>>> I suspect you're realize that "nelems" approach doesn't really help.
> > >>>> You'd need to flatten them all.
> > >>>> And once you do there is no need for "nelems".
> > >>>
> > >>> For me, "nelems" is more like a choice of avoiding repetition of
> > >>> information, not a necessary. Before adding "nelems", I had considered
> > >>> to expand them as well. But, eventually, I chose to add "nelems".
> > >>>
> > >>> Since you think this repetition is not a problem, I will expand array as
> > >>> individual elements.
> > >>
> > >> You don't sound convinced :)
> > >> Please add support for nested structs on top of your "nelems" approach
> > >> and prototype the same without "nelems" and let's compare the two.
> > >
> > >
> > > The following is the prototype that flatten arrays and struct types.
> > > This approach is definitely simpler than "nelems" one.  However,
> > > it will repeat same information as many times as the size of an array.
> > > For now, we have a limitation on the number of btf_fields (<= 10).
>
> I understand the concern and desire to minimize duplication,
> but I don't see how this BPF_REPEAT_FIELDS approach is going to work.
> From btf_parse_fields() pov it becomes one giant opaque field
> that sort_r() processes as a blob.
>
> How
> btf_record_find(reg->map_ptr->record,
>                 off + reg->var_off.value, BPF_KPTR);
>
> is going to find anything in there?
> Are you making a restriction that arrays and nested structs
> will only have kptrs in there ?
> So BPF_REPEAT_FIELDS can only wrap kptrs ?
> But even then these kptrs might have different btf_ids.
> So
> struct map_value {
>    struct {
>       struct task __kptr *p1;
>       struct thread __kptr *p2;
>    } arr[10];
> };
>
> won't be able to be represented as BPF_REPEAT_FIELDS?
>
> I think that simple flattening without repeat/nelems optimization
> is much easier to reason about.

+100 to this, BPF_REPEAT_FIELDS just will add an extra layer of
cognitive overload. Even if it can handle all conceivable situations,
let's just have a list of all "unique fields". We already do dynamic
memory allocation for struct btf_record, one more or less doesn't
matter all that much. We seem to be doing this once per map, not per
instruction or per state.

Let's keep it simple.

> BTF_FIELDS_MAX is just a constant.
> Just don't do struct btf_field_info info_arr[BTF_FIELDS_MAX]; on stack.