Re: [PATCH bpf-next v2 00/11] Enable BPF programs to declare arrays of kptr, bpf_rb_root, and bpf_list_head.

Kui-Feng Lee <sinquersw@xxxxxxxxx> · Wed, 24 Apr 2024 15:32:34 -0700

On 4/24/24 13:09, Alexei Starovoitov wrote:
On Mon, Apr 22, 2024 at 7:54 PM Kui-Feng Lee <sinquersw@xxxxxxxxx> wrote:

On 4/22/24 19:45, Kui-Feng Lee wrote:

On 4/18/24 07:53, Alexei Starovoitov wrote:
On Wed, Apr 17, 2024 at 11:07 PM Kui-Feng Lee <sinquersw@xxxxxxxxx>
wrote:

On 4/17/24 22:11, Alexei Starovoitov wrote:
On Wed, Apr 17, 2024 at 9:31 PM Kui-Feng Lee <sinquersw@xxxxxxxxx>
wrote:

On 4/17/24 20:30, Alexei Starovoitov wrote:
On Fri, Apr 12, 2024 at 2:08 PM Kui-Feng Lee
<thinker.li@xxxxxxxxx> wrote:

The arrays of kptr, bpf_rb_root, and bpf_list_head didn't work as
global variables. This was due to these types being initialized and
verified in a special manner in the kernel. This patchset allows BPF
programs to declare arrays of kptr, bpf_rb_root, and
bpf_list_head in
the global namespace.

The main change is to add "nelems" to btf_fields. The value of
"nelems" represents the number of elements in the array if a
btf_field
represents an array. Otherwise, "nelem" will be 1. The verifier
verifies these types based on the information provided by the
btf_field.

The value of "size" will be the size of the entire array if a
btf_field represents an array. Dividing "size" by "nelems" gives the
size of an element. The value of "offset" will be the offset of the
beginning for an array. By putting this together, we can
determine the
offset of each element in an array. For example,

        struct bpf_cpumask __kptr * global_mask_array[2];

Looks like this patch set enables arrays only.
Meaning the following is supported already:

+private(C) struct bpf_spin_lock glock_c;
+private(C) struct bpf_list_head ghead_array1 __contains(foo, node2);
+private(C) struct bpf_list_head ghead_array2 __contains(foo, node2);

while this support is added:

+private(C) struct bpf_spin_lock glock_c;
+private(C) struct bpf_list_head ghead_array1[3] __contains(foo,
node2);
+private(C) struct bpf_list_head ghead_array2[2] __contains(foo,
node2);

Am I right?

What about the case when bpf_list_head is wrapped in a struct?
private(C) struct foo {
      struct bpf_list_head ghead;
} ghead;

that's not enabled in this patch. I think.

And the following:
private(C) struct foo {
      struct bpf_list_head ghead;
} ghead[2];

or

private(C) struct foo {
      struct bpf_list_head ghead[2];
} ghead;

Won't work either.

No, they don't work.
We had a discussion about this in the other day.
I proposed to have another patch set to work on struct types.
Do you prefer to handle it in this patch set?

I think eventually we want to support all such combinations and
the approach proposed in this patch with 'nelems'
won't work for wrapper structs.

I think it's better to unroll/flatten all structs and arrays
and represent them as individual elements in the flattened
structure. Then there will be no need to special case array with
'nelems'.
All special BTF types will be individual elements with unique offset.

Does this make sense?

That means it will creates 10 btf_field(s) for an array having 10
elements. The purpose of adding "nelems" is to avoid the
repetition. Do
you prefer to expand them?

It's not just expansion, but a common way to handle nested structs too.

I suspect by delaying nested into another patchset this approach
will become useless.

So try adding nested structs in all combinations as a follow up and
I suspect you're realize that "nelems" approach doesn't really help.
You'd need to flatten them all.
And once you do there is no need for "nelems".

For me, "nelems" is more like a choice of avoiding repetition of
information, not a necessary. Before adding "nelems", I had considered
to expand them as well. But, eventually, I chose to add "nelems".

Since you think this repetition is not a problem, I will expand array as
individual elements.

You don't sound convinced :)
Please add support for nested structs on top of your "nelems" approach
and prototype the same without "nelems" and let's compare the two.

The following is the prototype that flatten arrays and struct types.
This approach is definitely simpler than "nelems" one.  However,
it will repeat same information as many times as the size of an array.
For now, we have a limitation on the number of btf_fields (<= 10).

I understand the concern and desire to minimize duplication,
but I don't see how this BPF_REPEAT_FIELDS approach is going to work.
 From btf_parse_fields() pov it becomes one giant opaque field
that sort_r() processes as a blob.

How
btf_record_find(reg->map_ptr->record,
                 off + reg->var_off.value, BPF_KPTR);

is going to find anything in there?
Are you making a restriction that arrays and nested structs
will only have kptrs in there ?
So BPF_REPEAT_FIELDS can only wrap kptrs ?
But even then these kptrs might have different btf_ids.
So
struct map_value {
    struct {
       struct task __kptr *p1;
       struct thread __kptr *p2;
    } arr[10];
};

won't be able to be represented as BPF_REPEAT_FIELDS?

BPF_REPEAT_FIELDS can handle it. With this case, bpf_parse_fields() will 
create a list of btf_fields like this:

    [ btf_field(type=BPF_KPTR_..., offset=0, ...),
      btf_field(type=BPF_KPTR_..., offset=8, ...),
      btf_field(type=BPF_REPEAT_FIELDS, offset=16, repeated_fields=2, 
nelems=9, size=16)]

You might miss the explanation in [1].

btf_record_find() is still doing binary search. Looking for p2 in
obj->arr[1], the offset will be 24.  btf_record_find() will find the
BPF_REPEATED_FIELDS one, and redirect the offset to

  (field->offset - field->size + (16 - field->offset) % field->size) == 8

Then, it will return the btf_field whose offset is 8.

[1] 
https://lore.kernel.org/all/4d3dc24f-fb50-4674-8eec-4c38e4d4b2c1@xxxxxxxxx/

I think that simple flattening without repeat/nelems optimization
is much easier to reason about.
BTF_FIELDS_MAX is just a constant.
Just don't do struct btf_field_info info_arr[BTF_FIELDS_MAX]; on stack.

I will switch to flatten one if you think "nelems" &
"BPF_REPEAT_FIELDS" are too complicated after reading the explanation in
[1].