Re: [PATCH 0/8] Makefile: make command-list.h 2-5x as fast with -jN

Taylor Blau <me@xxxxxxxxxxxx> · Wed, 20 Oct 2021 22:20:24 -0400

On Thu, Oct 21, 2021 at 02:48:24AM +0200, Ævar Arnfjörð Bjarmason wrote:
> >> Per Eric Sunshine's upthread comments, an awk and a Perl
> >> implementation were both considered before[1].
> >
> > Ah sorry, I thought it was just a perl one that had been the
> > show-stopper. I hadn't noticed the awk one. However, the point of my
> > patch was to use perl if available, and fall back otherwise. Maybe
> > that's too ugly, but it does address the concern with Eric's
> > implementation.
>
> I think carrying two implementations is worse than just having the one
> slightly slower one.

I have no opinion on whether assuming that awk or Perl exists and can be
relied upon during the build is reasonable. It seems like the former
might be a slightly safer assumption than the latter, but in all honesty
both seem likely to always be around.

In any case, I think the point was that we could improve upon Peff's
patch by just having a single implementation done in awk. And when I
wrote that I definitely was in the mindset of being able to rely on awk
during compilation.

> >> I.e. I think if you e.g. touch Documentation/git-a*.txt with this series
> >> with/without this awk version the difference in runtime is within the
> >> error bars. I.e. making the loop faster isn't necessary. It's better to
> >> get to a point where make can save you from doing all/most of the work
> >> by checking modification times, rather than making an O(n) loop faster.
> >
> > FWIW, I don't agree with this paragraph at all. Parallelizing or reusing
> > partial results is IMHO inferior to just making things faster.
>
> I agree with you in the general case, but for something that's consumed
> by a make dependency graph I find it easier to debug things if
> e.g. changing git-add.txt results in a change to git-add.gen, which is
> then cat'd together.
>
> IOW if we had a sufficiently fast C compiler I think I'd still prefer
> make's existing rules over some equivalent of:
>
>     cat *.c | super-fast-cc
>
> Since similar to how the *.sp files depend on the *.o files now,
> declaring the dependency graph allows you to easily add more built
> things.

This seems like an unfair comparison to me. I might be more sympathetic
if we were generating a more complicated artifact by running
generate-cmdlist.sh, but its inputs and outputs seem very well defined
(and non-complicated) to me.

In any case, I agree with Peff that this isn't the approach that I would
have taken. But I also think that *just* parallelizing isn't necessarily
a win here. There are two reasons I think that:

  - The cognitive load required to parallelize this process is high;
    the .build directory seems like another thing to keep
    track of, and it's not clear to me what updates it, or what the
    result of touching some file in that directory is.

  - But even if the parallelization was achievable by more
    straightforward means, you still have to do the slow thing when
    you're rebuilding from scratch. So this is strictly worse the first
    time you are compiling, at least on machines with fewer cores.

In any case, this is all overkill in my mind for what we are talking
about. I agree that 'cat *.c | super-fast-cc' is worse than a competent
Makefile that knows what to build and when. But the problem here is a
slow loop in shell that is easily made much faster by implementing it
in a language that can execute the whole loop in a single process.
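
To make that concrete, here is a rough sketch of the two shapes I mean.
It is not the real generate-cmdlist.sh: the input files, the
"git-<cmd> - <summary>" line format, and the slow/fast function names
are all made up for illustration. The only point is that the first
version forks one process per input file, while the second hands every
file to a single awk process and produces the same output.

```shell
#!/bin/sh
# Hypothetical inputs: three tiny files that loosely imitate the NAME
# section of a manpage source. These are NOT the real Documentation files.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
for cmd in add commit log
do
	printf 'git-%s(1)\n\nNAME\n----\ngit-%s - Summary for %s\n' \
		"$cmd" "$cmd" "$cmd" >"$dir/git-$cmd.txt"
done

# Slow shape: the shell loop spawns one sed process per input file.
slow () {
	for f in "$dir"/git-*.txt
	do
		sed -n 's/^git-[^ ]* - //p' "$f"
	done
}

# Fast shape: a single awk process reads every file in one go.
fast () {
	awk '/^git-[^ ]* - / { sub(/^git-[^ ]* - /, ""); print }' \
		"$dir"/git-*.txt
}

slow_out=$(slow)
fast_out=$(fast)

# Both shapes should extract the same summaries, in the same order.
if [ "$slow_out" = "$fast_out" ]
then
	echo "outputs match"
else
	echo "outputs differ" >&2
	exit 1
fi
```

With three files the difference is obviously unmeasurable; it only
starts to matter once the loop body runs once per command and each
iteration forks.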

Thanks,
Taylor