Re: [PATCH bpf-next v2 00/11] Enable BPF programs to declare arrays of kptr, bpf_rb_root, and bpf_list_head.

Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> · Wed, 24 Apr 2024 13:09:31 -0700

On Mon, Apr 22, 2024 at 7:54 PM Kui-Feng Lee <sinquersw@xxxxxxxxx> wrote:
>
>
>
> On 4/22/24 19:45, Kui-Feng Lee wrote:
> >
> >
> > On 4/18/24 07:53, Alexei Starovoitov wrote:
> >> On Wed, Apr 17, 2024 at 11:07 PM Kui-Feng Lee <sinquersw@xxxxxxxxx>
> >> wrote:
> >>>
> >>>
> >>>
> >>> On 4/17/24 22:11, Alexei Starovoitov wrote:
> >>>> On Wed, Apr 17, 2024 at 9:31 PM Kui-Feng Lee <sinquersw@xxxxxxxxx>
> >>>> wrote:
> >>>>>
> >>>>>
> >>>>>
> >>>>> On 4/17/24 20:30, Alexei Starovoitov wrote:
> >>>>>> On Fri, Apr 12, 2024 at 2:08 PM Kui-Feng Lee
> >>>>>> <thinker.li@xxxxxxxxx> wrote:
> >>>>>>>
> >>>>>>> The arrays of kptr, bpf_rb_root, and bpf_list_head didn't work as
> >>>>>>> global variables. This was due to these types being initialized and
> >>>>>>> verified in a special manner in the kernel. This patchset allows BPF
> >>>>>>> programs to declare arrays of kptr, bpf_rb_root, and
> >>>>>>> bpf_list_head in
> >>>>>>> the global namespace.
> >>>>>>>
> >>>>>>> The main change is to add "nelems" to btf_fields. The value of
> >>>>>>> "nelems" represents the number of elements in the array if a
> >>>>>>> btf_field
> >>>>>>> represents an array. Otherwise, "nelem" will be 1. The verifier
> >>>>>>> verifies these types based on the information provided by the
> >>>>>>> btf_field.
> >>>>>>>
> >>>>>>> The value of "size" will be the size of the entire array if a
> >>>>>>> btf_field represents an array. Dividing "size" by "nelems" gives the
> >>>>>>> size of an element. The value of "offset" will be the offset of the
> >>>>>>> beginning for an array. By putting this together, we can
> >>>>>>> determine the
> >>>>>>> offset of each element in an array. For example,
> >>>>>>>
> >>>>>>>        struct bpf_cpumask __kptr * global_mask_array[2];
> >>>>>>
> >>>>>> Looks like this patch set enables arrays only.
> >>>>>> Meaning the following is supported already:
> >>>>>>
> >>>>>> +private(C) struct bpf_spin_lock glock_c;
> >>>>>> +private(C) struct bpf_list_head ghead_array1 __contains(foo, node2);
> >>>>>> +private(C) struct bpf_list_head ghead_array2 __contains(foo, node2);
> >>>>>>
> >>>>>> while this support is added:
> >>>>>>
> >>>>>> +private(C) struct bpf_spin_lock glock_c;
> >>>>>> +private(C) struct bpf_list_head ghead_array1[3] __contains(foo,
> >>>>>> node2);
> >>>>>> +private(C) struct bpf_list_head ghead_array2[2] __contains(foo,
> >>>>>> node2);
> >>>>>>
> >>>>>> Am I right?
> >>>>>>
> >>>>>> What about the case when bpf_list_head is wrapped in a struct?
> >>>>>> private(C) struct foo {
> >>>>>>      struct bpf_list_head ghead;
> >>>>>> } ghead;
> >>>>>>
> >>>>>> that's not enabled in this patch. I think.
> >>>>>>
> >>>>>> And the following:
> >>>>>> private(C) struct foo {
> >>>>>>      struct bpf_list_head ghead;
> >>>>>> } ghead[2];
> >>>>>>
> >>>>>>
> >>>>>> or
> >>>>>>
> >>>>>> private(C) struct foo {
> >>>>>>      struct bpf_list_head ghead[2];
> >>>>>> } ghead;
> >>>>>>
> >>>>>> Won't work either.
> >>>>>
> >>>>> No, they don't work.
> >>>>> We had a discussion about this in the other day.
> >>>>> I proposed to have another patch set to work on struct types.
> >>>>> Do you prefer to handle it in this patch set?
> >>>>>
> >>>>>>
> >>>>>> I think eventually we want to support all such combinations and
> >>>>>> the approach proposed in this patch with 'nelems'
> >>>>>> won't work for wrapper structs.
> >>>>>>
> >>>>>> I think it's better to unroll/flatten all structs and arrays
> >>>>>> and represent them as individual elements in the flattened
> >>>>>> structure. Then there will be no need to special case array with
> >>>>>> 'nelems'.
> >>>>>> All special BTF types will be individual elements with unique offset.
> >>>>>>
> >>>>>> Does this make sense?
> >>>>>
> >>>>> That means it will creates 10 btf_field(s) for an array having 10
> >>>>> elements. The purpose of adding "nelems" is to avoid the
> >>>>> repetition. Do
> >>>>> you prefer to expand them?
> >>>>
> >>>> It's not just expansion, but a common way to handle nested structs too.
> >>>>
> >>>> I suspect by delaying nested into another patchset this approach
> >>>> will become useless.
> >>>>
> >>>> So try adding nested structs in all combinations as a follow up and
> >>>> I suspect you're realize that "nelems" approach doesn't really help.
> >>>> You'd need to flatten them all.
> >>>> And once you do there is no need for "nelems".
> >>>
> >>> For me, "nelems" is more like a choice of avoiding repetition of
> >>> information, not a necessary. Before adding "nelems", I had considered
> >>> to expand them as well. But, eventually, I chose to add "nelems".
> >>>
> >>> Since you think this repetition is not a problem, I will expand array as
> >>> individual elements.
> >>
> >> You don't sound convinced :)
> >> Please add support for nested structs on top of your "nelems" approach
> >> and prototype the same without "nelems" and let's compare the two.
> >
> >
> > The following is the prototype that flatten arrays and struct types.
> > This approach is definitely simpler than "nelems" one.  However,
> > it will repeat same information as many times as the size of an array.
> > For now, we have a limitation on the number of btf_fields (<= 10).

I understand the concern and desire to minimize duplication,
but I don't see how this BPF_REPEAT_FIELDS approach is going to work.
>From btf_parse_fields() pov it becomes one giant opaque field
that sort_r() processes as a blob.

How
btf_record_find(reg->map_ptr->record,
                off + reg->var_off.value, BPF_KPTR);

is going to find anything in there?
Are you making a restriction that arrays and nested structs
will only have kptrs in there ?
So BPF_REPEAT_FIELDS can only wrap kptrs ?
But even then these kptrs might have different btf_ids.
So
struct map_value {
   struct {
      struct task __kptr *p1;
      struct thread __kptr *p2;
   } arr[10];
};

won't be able to be represented as BPF_REPEAT_FIELDS?

I think that simple flattening without repeat/nelems optimization
is much easier to reason about.
BTF_FIELDS_MAX is just a constant.
Just don't do struct btf_field_info info_arr[BTF_FIELDS_MAX]; on stack.