Re: Why the Makefile is so eager to re-build & re-link

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Jun 24 2021, Johannes Sixt wrote:

> Am 24.06.21 um 17:16 schrieb Jeff King:
>> On Thu, Jun 24, 2021 at 03:16:48PM +0200, Ævar Arnfjörð Bjarmason wrote:
>>>  * {command,config}-list.h (and in-flight, my hook-list.h): Every time
>>>    you touch a Documentation/git-*.txt we need to re-generate these, and
>>>    since their mtime changes we re-compile and re-link all the way up to
>>>    libgit and our other tools.
>>>
>>>    I think the best solution here is to make the generate-*.sh
>>>    shellscripts faster (just one takes ~300ms of nested shellscripting,
>>>    just to grep out the first few lines of every git-*.txt, in e.g. Perl
>>>    or a smarter awk script this would be <5ms).
>> 
>> Yeah, I think Eric mentioned he had looked into doing this in perl, but
>> we weren't entirely happy with the dependency. Here's another really odd
>> thing I noticed:
>> 
>>   $ time sh ./generate-cmdlist.sh command-list.txt >one
>>   real	0m1.323s
>>   user	0m1.531s
>>   sys	0m0.834s
>> 
>>   $ time sh -x ./generate-cmdlist.sh command-list.txt >two
>>   [a bunch of trace output]
>>   real	0m0.513s
>>   user	0m0.754s
>>   sys	0m0.168s
>> 
>>   $ cmp one two
>>   [no output]
>> 
>> Er, what? Running with "-x" makes it almost 3 times faster to generate
>> the same output? I'd have said this is an anomaly, but it's repeatable
>> (and swapping the order produces the same output, so it's not some weird
>> priming thing). And then to top it all off, redirecting the trace is
>> slow again:
>> 
>>   $ time sh -x ./generate-cmdlist.sh command-list.txt >two 2>/dev/null
>>   real	0m1.363s
>>   user	0m1.538s
>>   sys	0m0.902s
>> 
>> A little mini-mystery that I think I may leave unsolved for now.
>
> Strange, really. Reminds me of the case that the `read` built-in must
> read input byte by byte if stdin is not connected to a (searchable) file.
>
> I have two patches that speed up generate-cmdlist.sh a bit:
>
> generate-cmdlist.sh: replace for loop by printf's auto-repeat feature
> (https://github.com/j6t/git/commit/b6d05f653bede727bc001f299b57969f62d3bc03)
> generate-cmdlist.sh: spawn fewer processes
> (https://github.com/j6t/git/commit/fd8721ee8fae06d7b584fa5166f32bf5521ca304)
>
> that I can submit (give me some time, though) or interested parties
> could pick up.

Interesting, but I think rather than micro-optimizing the O(n) loop it
makes more sense to turn it into a series of O(1) in -j parallel,
i.e. actually use the make dependency graph for this as I suggested in:
https://lore.kernel.org/git/87wnqiyejg.fsf@xxxxxxxxxxxxxxxxxxx/

Something like the hacky throwaway patch that follows. Now when you
touch a file in Documentation/git-*.txt you re-make just that file
chain, which gets assembled into the command-list.h:
	
	$ touch Documentation/git-add.txt; time make -k -j $(nproc) git 2>&1
	GIT_VERSION = 2.32.0.94.g87ef2a6b7ed.dirty
	    GEN build/Documentation/git-add.txt.cmdlist.in
	    CC version.o
	    GEN build/Documentation/git-add.txt.cmdlist
	    GEN command-list.h
	    CC help.o
	    AR libgit.a
	    LINK git

Those build/* files are, respectively, the relevant line corresponding
to the *.txt file from command-list.txt:

    $ cat build/Documentation/git-add.txt.cmdlist.in
    git-add                                 mainporcelain           worktree

And then the line we want in command-list.h at the end:

    $ cat build/Documentation/git-add.txt.cmdlist
        { "git-add", N_("Add file contents to the index"), 0 | CAT_mainporcelain | CAT_worktree },

The build process in then just a matter of keeping those files
up-to-date, and having "make" create the command-list.h is just a matter
of:

    cat build/Documentation/*.cmdlist

Well, with the categories still being an O(n) affair, but the cost of
generating those is trivial.

I think that aside from optimizing for speed it just makes things
clearer, if you modify e.g. git-add.txt you see a clear chain of output
from make about how we generate output from that, and then make the
command-list.h.

Now we'd just show a mysterious command-list.h entry. Whenever you have
"make" create one file from many it becomes hard to see at a glance
what's going on.

I'd be curious to see what how that performs e.g. on Windows, on Linux I
get e.g. (warm cache):
	
	$ touch Documentation/git-a*.txt; time make -k -j $(nproc) command-list.h 2>&1
	    GEN build/Documentation/git-apply.txt.cmdlist.in
	    GEN build/Documentation/git-annotate.txt.cmdlist.in
	    GEN build/Documentation/git-am.txt.cmdlist.in
	    GEN build/Documentation/git-add.txt.cmdlist.in
	    GEN build/Documentation/git-archimport.txt.cmdlist.in
	    GEN build/Documentation/git-archive.txt.cmdlist.in
	    GEN build/Documentation/git-apply.txt.cmdlist
	    GEN build/Documentation/git-annotate.txt.cmdlist
	    GEN build/Documentation/git-am.txt.cmdlist
	    GEN build/Documentation/git-add.txt.cmdlist
	    GEN build/Documentation/git-archimport.txt.cmdlist
	    GEN build/Documentation/git-archive.txt.cmdlist
	    GEN command-list.h
	
	real    0m0.214s
	user    0m0.196s
	sys     0m0.071s
	
Doing the same on master takes around 600ms for me at best:
	
	$ touch Documentation/git-a*.txt; time make -k -j $(nproc) command-list.h 2>&1
	    GEN command-list.h
	
	real    0m0.611s
	user    0m0.756s
	sys     0m0.112s

It's even faster when I have to make all of them (my -j is = 8), or
around 450ms after a "touch Documentation/git-*.txt".

We have ~170 lines we process in command-list.txt, I'd think on Windows
you'd get much better results, instead of optimizing those number of
"sort | uniq" to the same number of "sort -u" the common case is just
running 1-5 of those, as that's all the *.txt files you changed, then
just "cat-ing" the full set.

diff --git a/.gitignore b/.gitignore
index 311841f9bed..9b365395496 100644
--- a/.gitignore
+++ b/.gitignore
@@ -13,6 +13,7 @@
 /GIT-USER-AGENT
 /GIT-VERSION-FILE
 /bin-wrappers/
+/build/
 /git
 /git-add
 /git-add--interactive
diff --git a/Makefile b/Makefile
index c3565fc0f8f..5e845bd0f69 100644
--- a/Makefile
+++ b/Makefile
@@ -2231,12 +2231,30 @@ config-list.h: Documentation/*config.txt Documentation/config/*.txt
 	$(QUIET_GEN)$(SHELL_PATH) ./generate-configlist.sh \
 		>$@+ && mv $@+ $@
 
-command-list.h: generate-cmdlist.sh command-list.txt
+build/Documentation:
+	$(QUIET_GEN)mkdir -p build/Documentation
+.PRECIOUS: build/Documentation/git%.txt.cmdlist.in
+build/Documentation/git%.txt.cmdlist.in: Documentation/git%.txt
+	$(QUIET_GEN)if ! grep -w $(patsubst build/Documentation/%.txt.cmdlist.in,%,$@) command-list.txt >$@; \
+	then \
+		# For e.g. git-init-db, which has a *.txt file, but no \
+		# command-list.h entry \
+		>$@; \
+	fi
+build/Documentation/git%.txt.cmdlist: build/Documentation/git%.txt.cmdlist.in
+	$(QUIET_GEN)./generate-cmdlist.sh --tail $< >$@
 
-command-list.h: $(wildcard Documentation/git*.txt)
-	$(QUIET_GEN)$(SHELL_PATH) ./generate-cmdlist.sh \
+COMMAND_LIST_H_FILES = $(filter-out Documentation/git.txt,$(wildcard Documentation/git*.txt))
+
+COMMAND_LIST_GEN = $(patsubst Documentation/%.txt,build/Documentation/%.txt.cmdlist,$(COMMAND_LIST_H_FILES))
+command-list.h: build/Documentation generate-cmdlist.sh command-list.txt $(COMMAND_LIST_GEN)
+	$(QUIET_GEN)$(SHELL_PATH) ./generate-cmdlist.sh --header \
 		$(patsubst %,--exclude-program %,$(EXCLUDED_PROGRAMS)) \
-		command-list.txt >$@+ && mv $@+ $@
+		command-list.txt >$@+ && \
+	echo "static struct cmdname_help command_list[] = {" >>$@+ && \
+	cat build/Documentation/*.txt.cmdlist >>$@+ && \
+	echo "};" >>$@+ && \
+	mv $@+ $@
 
 SCRIPT_DEFINES = $(SHELL_PATH_SQ):$(DIFF_SQ):$(GIT_VERSION):\
 	$(localedir_SQ):$(NO_CURL):$(USE_GETTEXT_SCHEME):$(SANE_TOOL_PATH_SQ):\
diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh
index 9dbbb08e70a..f80266d5138 100755
--- a/generate-cmdlist.sh
+++ b/generate-cmdlist.sh
@@ -1,5 +1,8 @@
 #!/bin/sh
 
+mode=$1
+shift
+
 die () {
 	echo "$@" >&2
 	exit 1
@@ -61,8 +64,6 @@ define_category_names () {
 }
 
 print_command_list () {
-	echo "static struct cmdname_help command_list[] = {"
-
 	command_list "$1" |
 	while read cmd rest
 	do
@@ -73,6 +74,9 @@ print_command_list () {
 		done
 		echo " },"
 	done
+}
+
+end_print_command_list () {
 	echo "};"
 }
 
@@ -84,6 +88,12 @@ do
 	shift
 done
 
+if test "$mode" = "--tail"
+then
+	print_command_list "$1"
+	exit 0
+fi
+
 echo "/* Automatically generated by generate-cmdlist.sh */
 struct cmdname_help {
 	const char *name;
@@ -94,5 +104,3 @@ struct cmdname_help {
 define_categories "$1"
 echo
 define_category_names "$1"
-echo
-print_command_list "$1"




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux