On Thu, Jun 24 2021, Johannes Sixt wrote: > Am 24.06.21 um 17:16 schrieb Jeff King: >> On Thu, Jun 24, 2021 at 03:16:48PM +0200, Ævar Arnfjörð Bjarmason wrote: >>> * {command,config}-list.h (and in-flight, my hook-list.h): Every time >>> you touch a Documentation/git-*.txt we need to re-generate these, and >>> since their mtime changes we re-compile and re-link all the way up to >>> libgit and our other tools. >>> >>> I think the best solution here is to make the generate-*.sh >>> shellscripts faster (just one takes ~300ms of nested shellscripting, >>> just to grep out the first few lines of every git-*.txt, in e.g. Perl >>> or a smarter awk script this would be <5ms). >> >> Yeah, I think Eric mentioned he had looked into doing this in perl, but >> we weren't entirely happy with the dependency. Here's another really odd >> thing I noticed: >> >> $ time sh ./generate-cmdlist.sh command-list.txt >one >> real 0m1.323s >> user 0m1.531s >> sys 0m0.834s >> >> $ time sh -x ./generate-cmdlist.sh command-list.txt >two >> [a bunch of trace output] >> real 0m0.513s >> user 0m0.754s >> sys 0m0.168s >> >> $ cmp one two >> [no output] >> >> Er, what? Running with "-x" makes it almost 3 times faster to generate >> the same output? I'd have said this is an anomaly, but it's repeatable >> (and swapping the order produces the same output, so it's not some weird >> priming thing). And then to top it all off, redirecting the trace is >> slow again: >> >> $ time sh -x ./generate-cmdlist.sh command-list.txt >two 2>/dev/null >> real 0m1.363s >> user 0m1.538s >> sys 0m0.902s >> >> A little mini-mystery that I think I may leave unsolved for now. > > Strange, really. Reminds me of the case that the `read` built-in must > read input byte by byte if stdin is not connected to a (searchable) file. > > I have two patches that speed up generate-cmdlist.sh a bit: > > generate-cmdlist.sh: replace for loop by printf's auto-repeat feature > (https://github.com/j6t/git/commit/b6d05f653bede727bc001f299b57969f62d3bc03) > generate-cmdlist.sh: spawn fewer processes > (https://github.com/j6t/git/commit/fd8721ee8fae06d7b584fa5166f32bf5521ca304) > > that I can submit (give me some time, though) or interested parties > could pick up. Interesting, but I think rather than micro-optimizing the O(n) loop it makes more sense to turn it into a series of O(1) in -j parallel, i.e. actually use the make dependency graph for this as I suggested in: https://lore.kernel.org/git/87wnqiyejg.fsf@xxxxxxxxxxxxxxxxxxx/ Something like the hacky throwaway patch that follows. Now when you touch a file in Documentation/git-*.txt you re-make just that file chain, which gets assembled into the command-list.h: $ touch Documentation/git-add.txt; time make -k -j $(nproc) git 2>&1 GIT_VERSION = 2.32.0.94.g87ef2a6b7ed.dirty GEN build/Documentation/git-add.txt.cmdlist.in CC version.o GEN build/Documentation/git-add.txt.cmdlist GEN command-list.h CC help.o AR libgit.a LINK git Those build/* files are, respectively, the relevant line corresponding to the *.txt file from command-list.txt: $ cat build/Documentation/git-add.txt.cmdlist.in git-add mainporcelain worktree And then the line we want in command-list.h at the end: $ cat build/Documentation/git-add.txt.cmdlist { "git-add", N_("Add file contents to the index"), 0 | CAT_mainporcelain | CAT_worktree }, The build process in then just a matter of keeping those files up-to-date, and having "make" create the command-list.h is just a matter of: cat build/Documentation/*.cmdlist Well, with the categories still being an O(n) affair, but the cost of generating those is trivial. I think that aside from optimizing for speed it just makes things clearer, if you modify e.g. git-add.txt you see a clear chain of output from make about how we generate output from that, and then make the command-list.h. Now we'd just show a mysterious command-list.h entry. Whenever you have "make" create one file from many it becomes hard to see at a glance what's going on. I'd be curious to see what how that performs e.g. on Windows, on Linux I get e.g. (warm cache): $ touch Documentation/git-a*.txt; time make -k -j $(nproc) command-list.h 2>&1 GEN build/Documentation/git-apply.txt.cmdlist.in GEN build/Documentation/git-annotate.txt.cmdlist.in GEN build/Documentation/git-am.txt.cmdlist.in GEN build/Documentation/git-add.txt.cmdlist.in GEN build/Documentation/git-archimport.txt.cmdlist.in GEN build/Documentation/git-archive.txt.cmdlist.in GEN build/Documentation/git-apply.txt.cmdlist GEN build/Documentation/git-annotate.txt.cmdlist GEN build/Documentation/git-am.txt.cmdlist GEN build/Documentation/git-add.txt.cmdlist GEN build/Documentation/git-archimport.txt.cmdlist GEN build/Documentation/git-archive.txt.cmdlist GEN command-list.h real 0m0.214s user 0m0.196s sys 0m0.071s Doing the same on master takes around 600ms for me at best: $ touch Documentation/git-a*.txt; time make -k -j $(nproc) command-list.h 2>&1 GEN command-list.h real 0m0.611s user 0m0.756s sys 0m0.112s It's even faster when I have to make all of them (my -j is = 8), or around 450ms after a "touch Documentation/git-*.txt". We have ~170 lines we process in command-list.txt, I'd think on Windows you'd get much better results, instead of optimizing those number of "sort | uniq" to the same number of "sort -u" the common case is just running 1-5 of those, as that's all the *.txt files you changed, then just "cat-ing" the full set. diff --git a/.gitignore b/.gitignore index 311841f9bed..9b365395496 100644 --- a/.gitignore +++ b/.gitignore @@ -13,6 +13,7 @@ /GIT-USER-AGENT /GIT-VERSION-FILE /bin-wrappers/ +/build/ /git /git-add /git-add--interactive diff --git a/Makefile b/Makefile index c3565fc0f8f..5e845bd0f69 100644 --- a/Makefile +++ b/Makefile @@ -2231,12 +2231,30 @@ config-list.h: Documentation/*config.txt Documentation/config/*.txt $(QUIET_GEN)$(SHELL_PATH) ./generate-configlist.sh \ >$@+ && mv $@+ $@ -command-list.h: generate-cmdlist.sh command-list.txt +build/Documentation: + $(QUIET_GEN)mkdir -p build/Documentation +.PRECIOUS: build/Documentation/git%.txt.cmdlist.in +build/Documentation/git%.txt.cmdlist.in: Documentation/git%.txt + $(QUIET_GEN)if ! grep -w $(patsubst build/Documentation/%.txt.cmdlist.in,%,$@) command-list.txt >$@; \ + then \ + # For e.g. git-init-db, which has a *.txt file, but no \ + # command-list.h entry \ + >$@; \ + fi +build/Documentation/git%.txt.cmdlist: build/Documentation/git%.txt.cmdlist.in + $(QUIET_GEN)./generate-cmdlist.sh --tail $< >$@ -command-list.h: $(wildcard Documentation/git*.txt) - $(QUIET_GEN)$(SHELL_PATH) ./generate-cmdlist.sh \ +COMMAND_LIST_H_FILES = $(filter-out Documentation/git.txt,$(wildcard Documentation/git*.txt)) + +COMMAND_LIST_GEN = $(patsubst Documentation/%.txt,build/Documentation/%.txt.cmdlist,$(COMMAND_LIST_H_FILES)) +command-list.h: build/Documentation generate-cmdlist.sh command-list.txt $(COMMAND_LIST_GEN) + $(QUIET_GEN)$(SHELL_PATH) ./generate-cmdlist.sh --header \ $(patsubst %,--exclude-program %,$(EXCLUDED_PROGRAMS)) \ - command-list.txt >$@+ && mv $@+ $@ + command-list.txt >$@+ && \ + echo "static struct cmdname_help command_list[] = {" >>$@+ && \ + cat build/Documentation/*.txt.cmdlist >>$@+ && \ + echo "};" >>$@+ && \ + mv $@+ $@ SCRIPT_DEFINES = $(SHELL_PATH_SQ):$(DIFF_SQ):$(GIT_VERSION):\ $(localedir_SQ):$(NO_CURL):$(USE_GETTEXT_SCHEME):$(SANE_TOOL_PATH_SQ):\ diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh index 9dbbb08e70a..f80266d5138 100755 --- a/generate-cmdlist.sh +++ b/generate-cmdlist.sh @@ -1,5 +1,8 @@ #!/bin/sh +mode=$1 +shift + die () { echo "$@" >&2 exit 1 @@ -61,8 +64,6 @@ define_category_names () { } print_command_list () { - echo "static struct cmdname_help command_list[] = {" - command_list "$1" | while read cmd rest do @@ -73,6 +74,9 @@ print_command_list () { done echo " }," done +} + +end_print_command_list () { echo "};" } @@ -84,6 +88,12 @@ do shift done +if test "$mode" = "--tail" +then + print_command_list "$1" + exit 0 +fi + echo "/* Automatically generated by generate-cmdlist.sh */ struct cmdname_help { const char *name; @@ -94,5 +104,3 @@ struct cmdname_help { define_categories "$1" echo define_category_names "$1" -echo -print_command_list "$1"