Re: [PATCH 2/2] kallsyms: build faster by using .incbin

Jann Horn <jannh@xxxxxxxxxx> · Thu, 22 Feb 2024 15:20:29 +0100

On Thu, Feb 22, 2024 at 12:20 PM Jann Horn <jannh@xxxxxxxxxx> wrote:
> On Thu, Feb 22, 2024 at 5:07 AM Masahiro Yamada <masahiroy@xxxxxxxxxx> wrote:
> > On Thu, Feb 22, 2024 at 5:27 AM Jann Horn <jannh@xxxxxxxxxx> wrote:
> > >
> > > Currently, kallsyms builds a big assembly file (~19M with a normal
> > > kernel config), and then the assembler has to turn that big assembly
> > > file back into binary data, which takes around a second per kallsyms
> > > invocation. (Normally there are two kallsyms invocations per build.)
> > >
> > > It is much faster to instead directly output binary data, which can
> > > be imported in an assembly file using ".incbin". This is also the
> > > approach taken by arch/x86/boot/compressed/mkpiggy.c.
> >
> >
> > Yes, that is a sensible case because it just wraps the binary
> > without any modification.
> >
> >
> >
> >
> > > So this patch switches kallsyms to that approach.
> > >
> > > A complication with this is that the endianness of numbers between
> > > host and target might not match (for example, when cross-compiling);
> > > and there seems to be no kconfig symbol that tells us what endianness
> > > the target has.
> >
> >
> >
> > CONFIG_CPU_BIG_ENDIAN is it.
> >
> >
> >
> > You could do this:
> >
> > if is_enabled CONFIG_CPU_BIG_ENDIAN; then
> >         kallsymopt="${kallsymopt} --big-endian"
> > fi
> >
> > if is_enabled CONFIG_64BIT; then
> >         kallsymopt="${kallsymopt} --64bit"
> > fi
>
> Aah, nice, thanks, I searched for endianness kconfig flags but somehow
> missed that one.
>
> Though actually, I think further optimizations might make it necessary
> to directly operate on ELF files anyway, in which case it would
> probably be easier to keep using the ELF header...
>
> > > So pass the path to the intermediate vmlinux ELF file to the kallsyms
> > > tool, and let it parse the ELF header to figure out the target's
> > > endianness.
> > >
> > > I have verified that running kallsyms without these changes and
> > > kallsyms with these changes on the same input System.map results
> > > in identical object files.
> > >
> > > This change reduces the time for an incremental kernel rebuild
> > > (touch fs/ioctl.c, then re-run make) from 27.7s to 24.1s (medians
> > > over 16 runs each) on my machine - saving around 3.6 seconds.
> >
> >
> >
> >
> > This reverts bea5b74504742f1b51b815bcaf9a70bddbc49ce3
> >
> > Somebody might struggle with debugging again, but I am not sure.
> >
> > Arnd?
> >
> >
> >
> > If the effort were "I invented a way to do kallsyms in
> > one pass instead of three", it would be so much more attractive.
>
> Actually, I was chatting with someone about this yesterday, and I
> think I have an idea on how to get rid of two link steps... I might
> try out some stuff and then come back with another version of this
> series afterwards.

I think basically we could change kallsyms so that on the second run,
it checks if the kallsyms layout is the same as on the first run, and
if yes, directly overwrite the relevant part of vmlinux. (And adjust
the relative_base.) That would save us the final link... does that
sound like a reasonable idea?

I don't really have any good ideas for saving more than that, given
that we want to squeeze the kallsyms in between the data and bss
sections, so we can't just append it at the end of vmlinux... we could
get the symbol list from vmlinux.o instead of linking
".tmp_vmlinux.kallsyms1", but the comments in link-vmlinux.sh say that
extra linker-generated symbols might appear, and I guess we probably
don't want to miss those...