Re: [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell"

Ingo Molnar <mingo@xxxxxxxxxx> · Sat, 8 Jan 2022 12:54:26 +0100

* Nathan Chancellor <nathan@xxxxxxxxxx> wrote:

> On Tue, Jan 04, 2022 at 11:47:30AM +0100, Ingo Molnar wrote:
> > > > With the fast-headers kernel that's down to ~36,000 lines of code, 
> > > > almost a factor of 3 reduction:
> > > > 
> > > >   # fast-headers-v1:
> > > >   kepler:~/mingo.tip.git> wc -l kernel/pid.i
> > > >   35941 kernel/pid.i
> > > 
> > > Coming from someone who often has to reduce a preprocessed kernel source 
> > > file with creduce/cvise to report compiler bugs, this will be a very 
> > > welcome change, as those tools will have to do less work, and I can get 
> > > my reports done faster.
> > 
> > That's nice, didn't think of that side effect.
> > 
> > Could you perhaps measure this too, to see how much of a benefit it is?
> 
> As it turns out, I got an opportunity to measure this sooner rather than
> later [1]. Using cvise [2] with an identical set of toolchains and
> interestingness test [3], reducing net/core/skbuff.c took significantly
> less time with the version from the fast-headers tree.
> 
> v5.16-rc8:
> 
> $ wc -l skbuff.i
> 105135 skbuff.i
> 
> $ time cvise test.fish skbuff.i
> ...
> ________________________________________________________
> Executed in  114.02 mins    fish           external
>    usr time  1180.43 mins   69.29 millis  1180.43 mins
>    sys time  229.80 mins  248.11 millis  229.79 mins
> 
> fast-headers:
> 
> $ wc -l skbuff.i
> 78765 skbuff.i
> 
> $ time cvise test.fish skbuff.i
> ...
> ________________________________________________________
> Executed in   47.38 mins    fish           external
>    usr time  620.17 mins   32.78 millis  620.17 mins
>    sys time  123.70 mins  122.38 millis  123.70 mins
> 
> I was not expecting that much of a difference but it somewhat makes 
> sense, as the tool spends less time eliminating unused code and the 
> compiler invocations will be incrementally quicker as the input becomes 
> smaller.

Indeed, that's a +140% speedup in cvise reduction time (114.02 vs. 47.38 
minutes), not bad. :-)
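
For reference, the percentage follows directly from the two 'Executed in' 
times quoted above. The snippet below is only a sketch of that arithmetic; 
the helper name speedup_percent is made up for illustration and is not part 
of cvise or any other tool mentioned here:

```python
def speedup_percent(baseline: float, improved: float) -> float:
    """Return how much faster `improved` is than `baseline`, in percent.

    Both arguments are durations in the same unit (minutes here).
    """
    return (baseline / improved - 1.0) * 100.0


# Wall-clock and user CPU minutes from the two cvise runs quoted above.
wall = speedup_percent(114.02, 47.38)
user = speedup_percent(1180.43, 620.17)

print(f"wall-clock: +{wall:.1f}%")
print(f"user CPU:   +{user:.1f}%")
```

The wall-clock figure comes out slightly above 140%. The user CPU time 
shrinks by a smaller factor than the wall-clock time does.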

I also got around to testing Clang (12) myself, and with my 'reference distro 
config' I got these results:

 #
 # v5.16-rc8
 #
 Performance counter stats for 'make -j96 vmlinux LLVM=1' (3 runs):

 55,638,543,274,254      instructions              #    0.77  insn per cycle           ( +-  0.01% )
 72,074,911,968,393      cycles                    #    3.901 GHz                      ( +-  0.04% )
      18,490,451.51 msec cpu-clock                 #   54.740 CPUs utilized            ( +-  0.04% )

                 337.788 +- 0.834 seconds time elapsed  ( +-  0.25% )

 #
 # -fast-headers-v2-rc3
 #
 Performance counter stats for 'make -j96 vmlinux LLVM=1' (3 runs):

 30,904,130,243,855      instructions              #    0.76  insn per cycle           ( +-  0.02% )
 40,703,482,733,690      cycles                    #    3.898 GHz                      ( +-  0.00% )
      10,443,670.86 msec cpu-clock                 #   58.093 CPUs utilized            ( +-  0.00% )

                 179.773 +- 0.829 seconds time elapsed  ( +-  0.46% )

That's a +88% build speedup on Clang - even better than the +78% speedup on 
GCC(-10).
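
As a cross-check, the derived columns in the perf output and the +88% figure 
can be recomputed from the raw counters. This is a hedged sketch only: the 
PerfRun class and its field names are hypothetical, and the numbers are 
copied from the two runs above:

```python
from dataclasses import dataclass


@dataclass
class PerfRun:
    """Raw counters from one 'perf stat' summary (hypothetical container)."""
    instructions: int
    cycles: int
    cpu_clock_msec: float
    elapsed_sec: float

    @property
    def insn_per_cycle(self) -> float:
        return self.instructions / self.cycles

    @property
    def cpus_utilized(self) -> float:
        # cpu-clock is reported in milliseconds, elapsed time in seconds.
        return (self.cpu_clock_msec / 1000.0) / self.elapsed_sec


baseline = PerfRun(55_638_543_274_254, 72_074_911_968_393,
                   18_490_451.51, 337.788)
fast = PerfRun(30_904_130_243_855, 40_703_482_733_690,
               10_443_670.86, 179.773)

# Speedup in elapsed build time, in percent.
build_speedup = (baseline.elapsed_sec / fast.elapsed_sec - 1.0) * 100.0
# How many times more instructions the baseline build executes.
insn_ratio = baseline.instructions / fast.instructions

print(f"baseline: {baseline.insn_per_cycle:.2f} insn/cycle, "
      f"{baseline.cpus_utilized:.2f} CPUs utilized")
print(f"fast:     {fast.insn_per_cycle:.2f} insn/cycle, "
      f"{fast.cpus_utilized:.2f} CPUs utilized")
print(f"build speedup: +{build_speedup:.0f}%")
print(f"instruction ratio: {insn_ratio:.2f}x")
```

The instructions-per-cycle rate is nearly identical in both runs, so the 
shorter build time tracks the drop in executed instructions rather than any 
change in how efficiently they run.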

Thanks,

	Ingo