On Thu, May 9, 2024 at 4:09 PM Quentin Monnet <qmo@xxxxxxxxxx> wrote: > > On 09/05/2024 22:39, Andrii Nakryiko wrote: > > On Thu, May 9, 2024 at 8:17 AM Mykyta Yatsenko > > <mykyta.yatsenko5@xxxxxxxxx> wrote: > >> > >> From: Mykyta Yatsenko <yatsenko@xxxxxxxx> > >> > >> Sort bpftool c dump output; aiming to simplify vmlinux.h diffing and > >> forcing more natural type definitions ordering. > >> > >> Definitions are sorted first by their BTF kind ranks, then by their base > >> type name and by their own name. > >> > >> Type ranks > >> > >> Assign ranks to btf kinds (defined in function btf_type_rank) to set > >> next order: > >> 1. Anonymous enums/enums64 > >> 2. Named enums/enums64 > >> 3. Trivial types typedefs (ints, then floats) > >> 4. Structs/Unions > >> 5. Function prototypes > >> 6. Forward declarations > >> > >> Type rank is set to maximum for unnamed reference types, structs and > >> unions to avoid emitting those types early. They will be emitted as > >> part of the type chain starting with named type. > >> > >> Lexicographical ordering > >> > >> Each type is assigned a sort_name and own_name. > >> sort_name is the resolved name of the final base type for reference > >> types (typedef, pointer, array etc). Sorting by sort_name allows to > >> group typedefs of the same base type. sort_name for non-reference type > >> is the same as own_name. own_name is a direct name of particular type, > >> is used as final sorting step. > >> > >> Signed-off-by: Mykyta Yatsenko <yatsenko@xxxxxxxx> > >> --- > >> tools/bpf/bpftool/btf.c | 125 +++++++++++++++++++++++++++++++++++++++- > >> 1 file changed, 122 insertions(+), 3 deletions(-) > >> > > > > It's getting very close, see a bunch of nits below. > > Agreed, it's in a nice shape, thanks a lot for this work. Apologies for > the review delay, I just have a few additional nits. > > > > >> diff --git a/tools/bpf/bpftool/btf.c b/tools/bpf/bpftool/btf.c > >> index 91fcb75babe3..09ecd2abf066 100644 > >> --- a/tools/bpf/bpftool/btf.c > >> +++ b/tools/bpf/bpftool/btf.c > >> @@ -43,6 +43,13 @@ static const char * const btf_kind_str[NR_BTF_KINDS] = { > >> [BTF_KIND_ENUM64] = "ENUM64", > >> }; > >> > >> +struct sort_datum { > >> + int index; > >> + int type_rank; > >> + const char *sort_name; > >> + const char *own_name; > >> +}; > >> + > >> static const char *btf_int_enc_str(__u8 encoding) > >> { > >> switch (encoding) { > >> @@ -460,11 +467,114 @@ static void __printf(2, 0) btf_dump_printf(void *ctx, > >> vfprintf(stdout, fmt, args); > >> } > >> > >> +static bool is_reference_type(const struct btf_type *t) > >> +{ > >> + int kind = btf_kind(t); > >> + > >> + return kind == BTF_KIND_CONST || kind == BTF_KIND_PTR || kind == BTF_KIND_VOLATILE || > >> + kind == BTF_KIND_RESTRICT || kind == BTF_KIND_ARRAY || kind == BTF_KIND_TYPEDEF || > >> + kind == BTF_KIND_DECL_TAG; > > > > probably best to write as a switch, also make sure that > > BTF_KIND_TYPE_TAG is supported, it is effectively treated as > > CONST/VOLATILE > > > > and actually looking below, I'd just incorporate these as extra cases > > in the existing btf_type_rank() switch, and then have a similar > > open-coded switch for btf_type_sort_name() > > > > When dealing with BTF I find these explicit switches listing what > > kinds are special and how they are processed much easier to check and > > follow than any of the extra helpers doing some kind checks > > > > > >> +} > >> + > >> +static int btf_type_rank(const struct btf *btf, __u32 index, bool has_name) > >> +{ > >> + const struct btf_type *t = btf__type_by_id(btf, index); > >> + const int max_rank = 10; > >> + const int kind = btf_kind(t); > >> + > >> + if (t->name_off) > >> + has_name = true; > >> + > >> + switch (kind) { > >> + case BTF_KIND_ENUM: > >> + case BTF_KIND_ENUM64: > >> + return has_name ? 1 : 0; > >> + case BTF_KIND_INT: > >> + case BTF_KIND_FLOAT: > >> + return 2; > >> + case BTF_KIND_STRUCT: > >> + case BTF_KIND_UNION: > >> + return has_name ? 3 : max_rank; > >> + case BTF_KIND_FUNC_PROTO: > >> + return has_name ? 4 : max_rank; > >> + > >> + default: { > >> + if (has_name && is_reference_type(t)) { > >> + const int parent = kind == BTF_KIND_ARRAY ? btf_array(t)->type : t->type; > >> + > >> + return btf_type_rank(btf, parent, has_name); > >> + } > >> + return max_rank; > >> + } > > > > nit: you don't need these {} for default > > > >> + } > >> +} > >> + > >> +static const char *btf_type_sort_name(const struct btf *btf, __u32 index) > >> +{ > >> + const struct btf_type *t = btf__type_by_id(btf, index); > >> + int kind = btf_kind(t); > >> + > >> + /* Use name of the first element for anonymous enums */ > >> + if (!t->name_off && (kind == BTF_KIND_ENUM || kind == BTF_KIND_ENUM64) && > > > > there is btf_is_any_enum() helper for this kind check > > > >> + BTF_INFO_VLEN(t->info)) > > > > please use btf_vlen(t) helper > > > >> + return btf__name_by_offset(btf, btf_enum(t)->name_off); > > > > what if enum's vlen == 0? I think I mentioned that before, it > > shouldn't happen in valid BTF, but it's technically allowable in BTF, > > so best to be able to handle that instead of crashing or doing random > > memory reads. > > > >> + > >> + /* Return base type name for reference types */ > >> + while (is_reference_type(t)) { > >> + index = btf_kind(t) == BTF_KIND_ARRAY ? btf_array(t)->type : t->type; > >> + t = btf__type_by_id(btf, index); > >> + } > >> + > >> + return btf__name_by_offset(btf, t->name_off); > >> +} > >> + > >> +static int btf_type_compare(const void *left, const void *right) > >> +{ > >> + const struct sort_datum *datum1 = (const struct sort_datum *)left; > >> + const struct sort_datum *datum2 = (const struct sort_datum *)right; > >> + int sort_name_cmp; > > > > stylistic nit: it's minot, but I'd use less distracting naming. Eg., d1, d2, r. > > > >> + > >> + if (datum1->type_rank != datum2->type_rank) > >> + return datum1->type_rank < datum2->type_rank ? -1 : 1; > >> + > >> + sort_name_cmp = strcmp(datum1->sort_name, datum2->sort_name); > >> + if (sort_name_cmp) > >> + return sort_name_cmp; > >> + > >> + return strcmp(datum1->own_name, datum2->own_name); > >> +} > >> + > >> +static struct sort_datum *sort_btf_c(const struct btf *btf) > >> +{ > >> + int total_root_types; > >> + struct sort_datum *datums; > >> + > >> + total_root_types = btf__type_cnt(btf); > > > > nit: s/total_root_types/n/ > > > >> + datums = malloc(sizeof(struct sort_datum) * total_root_types); > >> + if (!datums) > >> + return NULL; > >> + > >> + for (int i = 0; i < total_root_types; ++i) { > >> + struct sort_datum *current_datum = datums + i; > > > > nit: just d for the name, it's not going to be hard to follow or ambiguous > > > >> + const struct btf_type *t = btf__type_by_id(btf, i); > >> + > >> + current_datum->index = i; > >> + current_datum->type_rank = btf_type_rank(btf, i, false); > >> + current_datum->sort_name = btf_type_sort_name(btf, i); > >> + current_datum->own_name = btf__name_by_offset(btf, t->name_off); > >> + } > >> + > >> + qsort(datums, total_root_types, sizeof(struct sort_datum), btf_type_compare); > >> + > >> + return datums; > >> +} > >> + > >> static int dump_btf_c(const struct btf *btf, > >> - __u32 *root_type_ids, int root_type_cnt) > >> + __u32 *root_type_ids, int root_type_cnt, bool sort_dump) > >> { > >> struct btf_dump *d; > >> int err = 0, i; > >> + struct sort_datum *datums = NULL; > > Nit: Most variables in the file are declared in "reverse-Christmas-tree" > order (longest lines first, unless there's a reason not to). Could you > please try to preserve this order, here and elsewhere, for consistency? > > >> > >> d = btf_dump__new(btf, btf_dump_printf, NULL, NULL); > >> if (!d) > >> @@ -486,8 +596,12 @@ static int dump_btf_c(const struct btf *btf, > >> } else { > >> int cnt = btf__type_cnt(btf); > >> > >> + if (sort_dump) > >> + datums = sort_btf_c(btf); > >> for (i = 1; i < cnt; i++) { > >> - err = btf_dump__dump_type(d, i); > >> + int idx = datums ? datums[i].index : i; > >> + > >> + err = btf_dump__dump_type(d, idx); > >> if (err) > >> goto done; > >> } > >> @@ -501,6 +615,7 @@ static int dump_btf_c(const struct btf *btf, > >> > >> done: > >> btf_dump__free(d); > >> + free(datums); > > Small nit: I'd swap the two lines above, it would seem more logical to > free in the reverse order from allocation and would be more > straightforward to "split" if we ever need to free d only when jumping > from the first goto. > > >> return err; > >> } > >> > >> @@ -553,6 +668,7 @@ static int do_dump(int argc, char **argv) > >> __u32 root_type_ids[2]; > >> int root_type_cnt = 0; > >> bool dump_c = false; > >> + bool sort_dump_c = true; > >> __u32 btf_id = -1; > >> const char *src; > >> int fd = -1; > >> @@ -663,6 +779,9 @@ static int do_dump(int argc, char **argv) > >> goto done; > >> } > >> NEXT_ARG(); > >> + } else if (is_prefix(*argv, "unordered")) { > > > > it's more of a "original order" rather than unordered, so maybe "unnormalized"? > I'd have gone with "unsorted", but Andrii's reasoning probably applies unsorted also makes sense, let's go with that, it's a single word > the same to it. I find "unnormalized" might be difficult to understand, > maybe "preserve_order" or, shorter, "keep_order"? > > And as Alan mentioned on the other thread, we'll need the following > updates for the new keyword: > > - Adding the keyword to the command summary at the top of > tools/bpf/bpftool/Documentation/bpftool-btf.rst: > | *FORMAT* := { **raw** | **c** [**unordered**]} > (or whatever keyword we pick) > - Adding the description for the keyword, below on the same page > - Adding the keyword to the help message, at the end of btf.c > - Updating the bash completion. The patch below should work (to adjust > with the final keyword): > > ------ > diff --git a/tools/bpf/bpftool/bash-completion/bpftool b/tools/bpf/bpftool/bash-completion/bpftool > index 04afe2ac2228..85a43c867e5f 100644 > --- a/tools/bpf/bpftool/bash-completion/bpftool > +++ b/tools/bpf/bpftool/bash-completion/bpftool > @@ -930,6 +930,9 @@ _bpftool() > format) > COMPREPLY=( $( compgen -W "c raw" -- "$cur" ) ) > ;; > + c) > + COMPREPLY=( $( compgen -W "unordered" -- "$cur" ) ) > + ;; > *) > # emit extra options > case ${words[3]} in > ------