Re: [PATCH 0/8] Makefile: make command-list.h 2-5x as fast with -jN

Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> · Thu, 21 Oct 2021 01:14:37 +0200

On Wed, Oct 20 2021, Taylor Blau wrote:

> On Wed, Oct 20, 2021 at 04:35:38PM -0400, Jeff King wrote:
>> On Wed, Oct 20, 2021 at 08:39:51PM +0200, Ævar Arnfjörð Bjarmason wrote:
>>
>> > This series is based off an off-hand comment I made about making the
>> > cmdlist.sh faster, in the meantime much of the same methods are
>> > already cooking in "next" for the "lint-docs" target.
>> >
>> > See 7/8 for the main performance numbers, along the way I stole some
>> > patches from Johannes Sixt who'd worked on optimizing the script
>> > before, which complement this new method of generating this file by
>> > piggy-backing more on GNU make for managing a dependency graph for us.
>>
>> I still think this is a much more complicated and error-prone approach
>> than just making the script faster. I know we can't rely on perl, but
>> could we use it optimistically?

Jeff: Just in terms of being error-prone, both of these implementations
will accept bad input that's being caught in 8/8 of this series.

We accept a lot of bad input now, ending up with some combination of
bad output or compile errors if you screw with the input *.txt files. I
think I've addressed all of those in this series.

If you mean the general concept of having make generate a "foo.gen"
from a "foo.txt" as an intermediate step on the way to "many-foo.h", I
don't really see how that's conceptually error-prone. You get error
checking at each step, and it encourages simpler logic at each step.
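
As a minimal sketch of such a per-file step (this is not the rule from
the series; the function name "gen_synopsis" and the exact ".gen"
contents are made up for illustration), one small command can turn a
single "git-foo.txt" into a one-line "git-foo.gen" and fail loudly on
input it doesn't understand:

```shell
#!/bin/sh
# Hypothetical helper, for illustration only: extract the synopsis from
# one Documentation/git-foo.txt into a one-line git-foo.gen.
gen_synopsis () {
	txt=$1
	gen=$2
	# Assumes the file is named after the command, e.g. git-foo.txt.
	cmd=$(basename "$txt" .txt)
	# The NAME section is expected to hold a "git-foo - <synopsis>" line;
	# keep only the first match.
	syn=$(sed -n "s/^$cmd - //p" "$txt" | sed -n 1p)
	if test -z "$syn"
	then
		# Bad input is reported here, not as a compile error later.
		echo >&2 "error: $txt: no '$cmd - <synopsis>' line found"
		return 1
	fi
	printf '%s\n' "$syn" >"$gen"
}
```

Something like "gen_synopsis Documentation/git-add.txt git-add.gen"
would then be the entire recipe for one intermediate file, and a
malformed *.txt stops the build at the file that caused it.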

> I'll take credit for this terrible idea of using Perl when available.
>
> But I don't think we even need to, since we could just rely on Awk. That
> has all the benefits you described while still avoiding the circular
> dependency on libgit.a. But the killer feature is that we don't have to
> rely on two implementations, the lesser-used of which is likely to
> bitrot over time.
>
> The resulting awk is a little ugly, because of the nested-ness. I'm also
> no awk-spert, but I think that something like the below gets the job
> done.
>
> It also has the benefit of being slightly faster than the equivalent
> Perl implementation, for whatever those extra ~9 ms are worth ;).
>
> Benchmark #1: sh generate-cmdlist.sh command-list.txt
>   Time (mean ± σ):      25.3 ms ±   5.3 ms    [User: 31.1 ms, System: 8.3 ms]
>   Range (min … max):    15.5 ms …  31.7 ms    95 runs
>
> Benchmark #2: sh generate-cmdlist.sh.old command-list.txt
>   Time (mean ± σ):      34.9 ms ±   9.8 ms    [User: 41.0 ms, System: 6.9 ms]
>   Range (min … max):    22.4 ms …  54.8 ms    64 runs
>
> Summary
>   'sh generate-cmdlist.sh command-list.txt' ran
>     1.38 ± 0.49 times faster than 'sh generate-cmdlist.sh.old command-list.txt'
>
> ---
>
> diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh
> index a1ab2b1f07..39338ef1cc 100755
> --- a/generate-cmdlist.sh
> +++ b/generate-cmdlist.sh
> @@ -64,12 +64,19 @@ print_command_list () {
>  	echo "static struct cmdname_help command_list[] = {"
>
>  	command_list "$1" |
> -	while read cmd rest
> -	do
> -		printf "	{ \"$cmd\", $(get_synopsis $cmd), 0"
> -		printf " | CAT_%s" $(echo "$rest" | get_category_line)
> -		echo " },"
> -	done
> +	awk '{
> +		f="Documentation/" $1 ".txt"
> +		while((getline line<f) > 0) {
> +			if (match(line, "^" $1 " - ")) {
> +				syn=substr(line, RLENGTH+1)
> +				printf "\t{ \"%s\", N_(\"%s\"), 0", $1, syn
> +				for (i=2; i<=NF; i++) {
> +					printf " | CAT_%s", $i
> +				}
> +				print " },"
> +			}
> +		}
> +	}'
>  	echo "};"
>  }

Per Eric Sunshine's upthread comments, awk and Perl implementations
were both considered before[1].

I also care a bit about the timings of the from-scratch build, but I
think they're way less interesting than those of a partial build.

I.e. if you touch, say, Documentation/git-a*.txt and rebuild with this
series, I think the difference in runtime with or without this awk
version is within the error bars, so making the loop faster isn't
necessary. It's better to get to a point where make can save you from
doing all/most of the work by checking modification times, rather than
making an O(n) loop faster.
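
To make the modification-time point concrete, here is roughly the
staleness check make performs per target, written out in shell purely
for illustration. The series relies on make rules rather than on code
like this, and the name "needs_rebuild" is made up:

```shell
#!/bin/sh
# Hypothetical illustration of make's per-target staleness check:
# succeed (exit 0) if the target is missing or older than its source.
needs_rebuild () {
	src=$1
	target=$2
	# A missing target always needs to be built.
	test ! -e "$target" && return 0
	# find prints $src only if it was modified more recently than $target.
	test -n "$(find "$src" -newer "$target")"
}
```

With a check like that per file, touching a handful of *.txt files
only redoes the work for those files, however slow or fast the
per-file step itself is.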

The only reason there's even a loop there is because it's used by the
cmake logic in contrib/* (how we've ended up with a hard dependency in
contrib is another matter...).

I'm also interested in (and have WIP patches for) simplifying things
more generally in the Makefile. Once we have a file exploded out that
has just the synopsis line, it can be used to replace what's now in
Documentation/cmd-list.perl, i.e. those summary blurbs also end up in
"man git".

There are subtle dependency issues there as well, and just having a
one-off solution for command-list.h doesn't get us closer to
addressing that sibling implementation.

In terms of future Makefile work, I was hoping to get this in and then
untangle some of the complexity in the inter-dependency between
Makefile and Documentation/Makefile (eventually just merging the two
and leaving a stub in Documentation/Makefile). I've also got a working
implementation for getting rid of all of the "FORCE" dependencies
(except the version one).

1. https://lore.kernel.org/git/CAPig+cSzKoOzU-zPOZqfNpPYBFpcWqvDP3mwLvAn5WkiNW0UMw@xxxxxxxxxxxxxx/