Re: [RFC/PATCHES 00/12] pahole: Reproducible parallel DWARF loading/serial BTF encoding

Arnaldo Carvalho de Melo <acme@xxxxxxxxxx> · Tue, 9 Apr 2024 16:57:17 -0300

On Tue, Apr 09, 2024 at 10:29:18PM +0300, Eduard Zingerman wrote:
> On Tue, 2024-04-09 at 15:45 -0300, Arnaldo Carvalho de Melo wrote:
> > On Tue, Apr 09, 2024 at 06:01:08PM +0300, Eduard Zingerman wrote:
> > > On Tue, 2024-04-09 at 07:56 -0700, Alexei Starovoitov wrote:
> > > [...]
> >  
> > > > I would actually go with sorted BTF, since it will probably
> > > > make diff-ing of BTFs practical. Will be easier to track changes
> > 
> > What kind of diff-ing of BTFs from different kernels are you interested
> > in?
> > 
> > in pahole's repository we have btfdiff, that will, given a vmlinux with
> > both DWARF and BTF use pahole to pretty print all types, expanded, and
> > then compare the two outputs, which should produce the same results from
> > BTF and DWARF. Ditto for DWARF from a vmlinux compared to a detached BTF
> > file.
> > 
> > And also now we have another regression test script that will produce
> > the output from 'bpftool btf dump' for the BTF generated from DWARF in
> > serial mode, and then compare that with the output from 'bpftool btf
> > dump' for reproducible encodings done using -j 1 ...
> > number-of-processors-on-the-machine. All have to match, all types, all
> > BTF ids.
> > 
> > We can as well use something like btfdiff to compare the output from
> > 'pahole --expand_types --sort' for two BTFs for two different kernels,
> > to see what are the new types and the changes to types in both.
> > 
> > What else do you want to compare? To be able to match we would have to
> > somehow have ranges for each DWARF CU so that when encoding and then
> > deduplicating we would have space in the ID space for new types to fill
> > in while keeping the old types IDs matching the same types in the new
> > vmlinux.
> 
> As far as I understand Alexei, he means diffing two vmlinux.h files
> generated for different kernel versions. The vmlinux.h is generated by
> bpftool using command `bpftool btf dump file <binary-file> format c`.
> The output is topologically sorted to satisfy C compiler, but ordering
> is not total, so vmlinux.h content may vary from build to build if BTF
> type order differs.
> 
> Thus, any kind of stable BTF type ordering would make vmlinux.h stable.
> On the other hand, topological ordering used by bpftool
> (the algorithm is in the libbpf, actually) might be extended with
> additional rules to make the ordering total.
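
To illustrate the "additional rules" idea quoted above, here is a minimal
sketch of a topological sort made total by a tie-break on the type name.
It is not the libbpf algorithm; the function name and the dict-of-sets
input format are made up for the example, and every dependency is assumed
to also appear as a key.

```python
import heapq

def total_topo_order(deps):
    """Return a deterministic dependency-respecting order.

    deps maps a type name to the set of type names that must be
    emitted before it (hypothetical input format, for illustration).
    """
    # Number of not-yet-emitted dependencies per type.
    pending = {t: len(d) for t, d in deps.items()}
    # Reverse edges: which types are waiting on a given type.
    users = {t: [] for t in deps}
    for t, d in deps.items():
        for dep in d:
            users[dep].append(t)

    # Min-heap of emittable types: ties are always broken by name,
    # so the result does not depend on the input order.
    ready = [t for t, n in pending.items() if n == 0]
    heapq.heapify(ready)

    order = []
    while ready:
        t = heapq.heappop(ready)
        order.append(t)
        for u in users[t]:
            pending[u] -= 1
            if pending[u] == 0:
                heapq.heappush(ready, u)

    if len(order) != len(deps):
        raise ValueError("dependency cycle, would need forward declarations")
    return order
```

Two inputs that describe the same types in a different order then give
the same output, which is what a stable vmlinux.h would need.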

Interesting. The other tool in the pahole repo is 'fullcircle', which,
given a .o file, generates a compilable file (a vmlinux.h, say), builds
it again to generate DWARF, and then compares the original DWARF with
the new one.

> > While ordering all types we would have to have ID space available from
> > each of the BTF kinds, no?
> > 
> > I haven't looked at Eduard's patches, is that what it is done?
> 
> No, I don't reserve any ID space, the output of 
> `bpftool btf dump file <binary-file> format raw` is not suitable for
> diffing w/o post-processing if some types are added or removed in the
> middle.

> I simply add a function to compare two BTF types and a pass that sorts
> all BTF types before finalizing BTF generation.

Ok, so I see that the BTF ids for the types will change; it's the
vmlinux.h that is to be compared.
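
A small sketch of why the ids change with a "compare two types, then sort"
pass. The (kind, name) key is an assumption made for the example, not
necessarily the comparison the patches use, and the type names are made up.

```python
def sort_and_renumber(types):
    """Assign ids 1..N after sorting.

    types is an iterable of (kind, name) tuples; the sort key is
    hypothetical and only stands in for a real BTF type comparison.
    """
    ordered = sorted(set(types))
    return {t: type_id for type_id, t in enumerate(ordered, start=1)}

old = sort_and_renumber([("STRUCT", "foo"), ("INT", "int"),
                         ("STRUCT", "bar"), ("STRUCT", "zed")])
# Same kernel, but 'struct bar' is gone.
new = sort_and_renumber([("STRUCT", "foo"), ("INT", "int"),
                         ("STRUCT", "zed")])

for t in sorted(new):
    print(t, old[t], "->", new[t])
```

The order is stable, but every type sorted after the removed one gets a
different id, so a raw dump still needs post-processing before diffing.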

root@x1:~# pahole -F btf --compile | tail -12
struct ncsi_aen_handler {
	unsigned char              type;                 /*     0     1 */

	/* XXX 3 bytes hole, try to pack */

	int                        payload;              /*     4     4 */
	int                        (*handler)(struct ncsi_dev_priv *, struct ncsi_aen_pkt_hdr *); /*     8     8 */

	/* size: 16, cachelines: 1, members: 3 */
	/* sum members: 13, holes: 1, sum holes: 3 */
	/* last cacheline: 16 bytes */
};
root@x1:~# pahole -F btf --compile > a.c ; echo 'int main(void) { struct ncsi_aen_handler b = { 1, } ; return b.type ; } ' >> a.c ; gcc -g -o bla -c a.c
root@x1:~# pahole --expand_types ncsi_aen_handler > from_kernel_btf
root@x1:~# pahole --expand_types -C ncsi_aen_handler bla > from_bla_dwarf
root@x1:~# diff -u from_kernel_btf from_bla_dwarf
root@x1:~#

The above is for a super simple struct, no expansions even. Now for
task_struct:

root@x1:~# pahole -F btf --compile > a.c ; echo 'int main(void) { struct task_struct b = { .prio = 12345, } ; return b.prio ; } ' >> a.c ; gcc -g -o bla -c a.c
root@x1:~# pahole --suppress_aligned_attribute --expand_types -C task_struct bla > from_bla_dwarf
root@x1:~# pahole --suppress_aligned_attribute --expand_types task_struct > from_kernel_btf
root@x1:~# diff -u from_kernel_btf from_bla_dwarf
root@x1:~#

I suppressed the aligned attribute because, right now, pahole's output
when it finds the __attribute__ alignment in DWARF is slightly different
from, but equivalent to (barring bugs), its output when it infers the
alignment for BTF data, which has no alignment info other than the
member offsets. DWARF has both the member offsets, from which the
alignment can be inferred, _and_ the attributes when they are present in
the source code, sometimes even duplicated, which is probably the reason
for the difference in output (the end result should be equivalent).

root@x1:~# pahole --expand_types task_struct | wc -l
1254
root@x1:~# pahole --expand_types task_struct | tail
	/* XXX last struct has 1 hole, 1 bit hole */

	/* size: 13696, cachelines: 214, members: 265 */
	/* sum members: 13522, holes: 20, sum holes: 158 */
	/* sum bitfield members: 83 bits, bit holes: 2, sum bit holes: 45 bits */
	/* member types with holes: 4, total: 6, bit holes: 2, total: 2 */
	/* paddings: 6, sum paddings: 49 */
	/* forced alignments: 2, forced holes: 2, sum forced holes: 88 */
};

root@x1:~#

I.e. the original BTF doesn't have to be sorted. It will keep the order
DWARF has, which, in turn, is another desire of reproducible builds: it
will not be the same for two kernel releases, but should be as close as
possible. pahole (--sort or --compile) or bpftool can do the ordering
afterwards, either by plainly sorting the types (pahole --sort, used by
btfdiff to compare output from DWARF to output from BTF) or by
generating compilable source code (pahole --compile, aka "topologically
sorted to satisfy C compiler").

> > > > from one kernel version to another. vmlinux.h will become
> > > > a bit more sorted too and normal diff vmlinux_6_1.h vmlinux_6_2.h
> > > > will be possible.
> > > > Or am I misunderstanding the sorting concept?

> > > You understand the concept correctly, here is a sample:

> > >   [1] INT '_Bool' size=1 bits_offset=0 nr_bits=8 encoding=BOOL
> > >   [2] INT '__int128' size=16 bits_offset=0 nr_bits=128 encoding=SIGNED
> > >   [3] INT '__int128 unsigned' size=16 bits_offset=0 nr_bits=128 encoding=(none)
> > >   [4] INT 'char' size=1 bits_offset=0 nr_bits=8 encoding=(none)
> > >   [5] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
> > >   [6] INT 'long int' size=8 bits_offset=0 nr_bits=64 encoding=SIGNED
> > >   [7] INT 'long long int' size=8 bits_offset=0 nr_bits=64 encoding=SIGNED

> > The above: so far so good, probably there will not be something that
> > will push what is now BTF id 6 to become 7 in a new vmlinux, but can we
> > say the same for the more dynamic parts, like the list of structs?

> > A struct can vanish, that abstraction not being used anymore in the
> > kernel, so its BTF id will be vacated and all of the following struct
> > IDs will "fall down" and get decremented, no?

> Yes, this would happen.

We're on the same page.

> > If these difficulties are present as I mentioned, then rebuilding from
> > the BTF data with something like the existing 'pahole --expand_types
> > --sort' from the BTF from kernel N to compare with the same output for
> > kernel N + 1 should be enough to see what changed from one kernel to the
> > next one?

> Yes, this is an option.
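
As a rough sketch of that option: once both kernels are dumped keyed by
type name instead of BTF id, the comparison no longer depends on ids
shifting. The dict input (type name to pretty-printed definition) and the
function name are assumptions for the example, not something pahole or
btfdiff produce in this form.

```python
def compare_types(old, new):
    """Compare two {type name: definition text} maps.

    Returns (added, removed, changed) as sorted lists of type names.
    The input format is hypothetical, e.g. built by splitting the
    output of a sorted, expanded type dump per type.
    """
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(name for name in old.keys() & new.keys()
                     if old[name] != new[name])
    return added, removed, changed

kernel_n = {"struct foo": "int a;", "struct bar": "char b;"}
kernel_n1 = {"struct foo": "int a; int c;", "struct zed": "long z;"}
print(compare_types(kernel_n, kernel_n1))
```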

Agreed. What I tried in my series was to do as little as possible to
make the serial output the same as the output at whatever level of
parallelism we have, while keeping the cost of the whole process as
close as possible to the unconstrained parallelism that we had in
place, i.e. to get a reproducible build at the lowest cost in terms of
code churn (the more code we touch, the more chances we have of
introducing new bugs) and of CPU cycles, memory use, etc.

- Arnaldo