On Wed, Oct 20 2021, Taylor Blau wrote:

> On Wed, Oct 20, 2021 at 04:35:38PM -0400, Jeff King wrote:
>> On Wed, Oct 20, 2021 at 08:39:51PM +0200, Ævar Arnfjörð Bjarmason wrote:
>>
>> > This series is based off an off-hand comment I made about making the
>> > cmdlist.sh faster, in the meantime much of the same methods are
>> > already cooking in "next" for the "lint-docs" target.
>> >
>> > See 7/8 for the main performance numbers, along the way I stole some
>> > patches from Johannes Sixt who'd worked on optimizing the script
>> > before, which compliment this new method of generating this file by
>> > piggy-backing more on GNU make for managing a dependency graph for us.
>>
>> I still think this is a much more complicated and error-prone approach
>> than just making the script faster. I know we can't rely on perl, but
>> could we use it optimistically?

Jeff: Just in terms of being error-prone, both of these implementations
will accept bad input that's being caught in 8/8 of this series. We
accept a lot of bad input now, ending up with some combination of bad
output or compile errors if you screw with the input *.txt files. I
think I've addressed all of those in this series.

If you mean the general concept of making a "foo.gen" from a "foo.txt"
as an intermediate with make, as a way to get to "many-foo.h", I don't
really see how it's error-prone conceptually. You get error checking at
each step, and it encourages simpler logic at each step.

> I'll take credit for this terrible idea of using Perl when available.
>
> But I don't think we even need to, since we could just rely on Awk. That
> has all the benefits you described while still avoiding the circular
> dependency on libgit.a. But the killer feature is that we don't have to
> rely on two implementations, the lesser-used of which is likely to
> bitrot over time.
>
> The resulting awk is a little ugly, because of the nested-ness.
> I'm also no awk-spert, but I think that something like the below gets
> the job done.
>
> It also has the benefit of being slightly faster than the equivalent
> Perl implementation, for whatever those extra ~9 ms are worth ;).
>
> Benchmark #1: sh generate-cmdlist.sh command-list.txt
>   Time (mean ± σ):      25.3 ms ±   5.3 ms    [User: 31.1 ms, System: 8.3 ms]
>   Range (min … max):    15.5 ms …  31.7 ms    95 runs
>
> Benchmark #2: sh generate-cmdlist.sh.old command-list.txt
>   Time (mean ± σ):      34.9 ms ±   9.8 ms    [User: 41.0 ms, System: 6.9 ms]
>   Range (min … max):    22.4 ms …  54.8 ms    64 runs
>
> Summary
>   'sh generate-cmdlist.sh command-list.txt' ran
>     1.38 ± 0.49 times faster than 'sh generate-cmdlist.sh.old command-list.txt'
>
> ---
>
> diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh
> index a1ab2b1f07..39338ef1cc 100755
> --- a/generate-cmdlist.sh
> +++ b/generate-cmdlist.sh
> @@ -64,12 +64,19 @@ print_command_list () {
>  	echo "static struct cmdname_help command_list[] = {"
>
>  	command_list "$1" |
> -	while read cmd rest
> -	do
> -		printf "	{ \"$cmd\", $(get_synopsis $cmd), 0"
> -		printf " | CAT_%s" $(echo "$rest" | get_category_line)
> -		echo " },"
> -	done
> +	awk '{
> +		f="Documentation/" $1 ".txt"
> +		while((getline line<f) > 0) {
> +			if (match(line, "^" $1 " - ")) {
> +				syn=substr(line, RLENGTH+1)
> +				printf "\t{ \"%s\", N_(\"%s\"), 0", $1, syn
> +				for (i=2; i<=NF; i++) {
> +					printf " | CAT_%s", $i
> +				}
> +				print " },"
> +			}
> +		}
> +	}'
>  	echo "};"
>  }

Per Eric Sunshine's upthread comments, both an awk and a Perl
implementation were considered before[1].

I also care a bit about the timings of the from-scratch build, but I
think they're way less interesting than those of a partial build. E.g.
if you touch Documentation/git-a*.txt with this series, with or without
this awk version, I think the difference in runtime is within the error
bars. I.e. making the loop faster isn't necessary.
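A rough sketch, in the same shell as generate-cmdlist.sh, of the per-file idea: each command gets a small intermediate file holding only its synopsis, validated when it is produced and rebuilt only when missing or older than its source (the decision make would take from modification times), and a cheap final pass stitches the pieces into the header. This is not the code from the series; the doc/ and gen/ layout, the list format and the function names gen_one and build_header are hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch (not the series' code): one intermediate file per
# command holding only its synopsis, rebuilt only when missing or older than
# its source, then a cheap pass that concatenates the pieces into a header.
# The doc/ and gen/ directories, the list format and both function names are
# assumptions for illustration.

# gen_one NAME: extract "NAME - synopsis" from doc/NAME.txt into gen/NAME.gen.
gen_one () {
	name=$1
	src=doc/$name.txt
	out=gen/$name.gen

	# Intermediate exists and the source is not newer: nothing to do.
	if test -f "$out" && ! test "$src" -nt "$out"
	then
		return 0
	fi

	# Validate here, so bad input fails at this step rather than as a
	# compile error in the generated header.
	synopsis=$(sed -n "s/^$name - //p" "$src")
	if test -z "$synopsis"
	then
		echo "error: no '$name - ' synopsis line in $src" >&2
		return 1
	fi

	printf '%s\n' "$synopsis" >"$out"
	echo "regenerated $out" >&2
}

# build_header LIST: LIST has one "NAME CATEGORY..." entry per line.
build_header () {
	mkdir -p gen || return 1

	while read -r name cats
	do
		gen_one "$name" || return 1
	done <"$1"

	# Final step: only concatenation, no per-command parsing.
	{
		echo "static struct cmdname_help command_list[] = {"
		while read -r name cats
		do
			# No C-string escaping of the synopsis (assumed safe here).
			printf '\t{ "%s", N_("%s"), 0' "$name" "$(cat "gen/$name.gen")"
			for c in $cats
			do
				printf ' | CAT_%s' "$c"
			done
			printf ' },\n'
		done <"$1"
		echo "};"
	} >gen/command-list.h
}
```

Run `build_header list.txt` from a directory containing doc/; a second run with no source changes prints no "regenerated" lines and only redoes the concatenation. The `-nt` test is supported by bash and dash; the series itself leaves this bookkeeping to make's dependency graph instead.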
It's better to get to a point where make can save you from doing
all/most of the work by checking modification times, rather than making
an O(n) loop faster. The only reason there's even a loop there is that
it's used by the cmake logic in contrib/* (how we've ended up with a
hard dependency in contrib is another matter...).

I'm also interested in (and have WIP patches for) simplifying things
more generally in the Makefile. Once we have a file exploded out that
has just the synopsis line, it can be used to replace what's now in
Documentation/cmd-list.perl, since those summary blurbs also end up in
"man git". There are subtle dependency issues there as well, and just
having a one-off solution for command-list.h doesn't get us any closer
to addressing that sibling implementation.

In terms of future Makefile work, I was hoping to get this in and then
untangle some of the complexity in the inter-dependency of Makefile &
Documentation/Makefile (eventually just merging the two and leaving a
stub in Documentation/Makefile). I've also got a working implementation
for getting rid of all of the "FORCE" dependencies (except the version
one).

1. https://lore.kernel.org/git/CAPig+cSzKoOzU-zPOZqfNpPYBFpcWqvDP3mwLvAn5WkiNW0UMw@xxxxxxxxxxxxxx/