On Thu, Sep 01 2022, Eric Sunshine via GitGitGadget wrote:

> From: Eric Sunshine <sunshine@xxxxxxxxxxxxxx>
>
> Although chainlint.pl has undergone a good deal of optimization during
> its development -- increasing in speed significantly -- parsing and
> validating 1050+ scripts and 16500+ tests via Perl is not exactly
> instantaneous. However, perceived performance can be improved by taking
> advantage of the fact that there is no interdependence between test
> scripts or test definitions, thus parsing and validating can be done in
> parallel. The number of available cores is determined automatically but
> can be overridden via the --jobs option.

Per your CL:

    Ævar offered some sensible comments[2,3] about optimizing the
    Makefile rules related to chainlint, but those optimizations are not
    tackled here for a few reasons: (1) this series is already quite
    long, (2) I'd like to keep the series focused on its primary goal of
    installing a new and improved linter, (3) these patches do not make
    the Makefile situation any worse[4], and (4) those optimizations can
    easily be done atop this series[5].

I have been running with those t/Makefile changes locally, but didn't
submit them. FWIW that's here:

    https://github.com/git/git/compare/master...avar:git:avar/t-Makefile-use-dependency-graph-for-check-chainlint

Which I'm not entirely sure I'm happy with, and it's just about the
chainlint tests, but...

> +sub ncores {
> +	# Windows
> +	return $ENV{NUMBER_OF_PROCESSORS} if exists($ENV{NUMBER_OF_PROCESSORS});
> +	# Linux / MSYS2 / Cygwin / WSL
> +	do { local @ARGV='/proc/cpuinfo'; return scalar(grep(/^processor\s*:/, <>)); } if -r '/proc/cpuinfo';
> +	# macOS & BSD
> +	return qx/sysctl -n hw.ncpu/ if $^O =~ /(?:^darwin$|bsd)/;
> +	return 1;
> +}
> +
>  sub show_stats {
>  	my ($start_time, $stats) = @_;
>  	my $walltime = $interval->($start_time);
> @@ -621,7 +633,9 @@ sub exit_code {
>  Getopt::Long::Configure(qw{bundling});
>  GetOptions(
>  	"emit-all!" => \$emit_all,
> +	"jobs|j=i" => \$jobs,
>  	"stats|show-stats!" => \$show_stats) or die("option error\n");
> +$jobs = ncores() if $jobs < 1;
>
>  my $start_time = $getnow->();
>  my @stats;
> @@ -633,6 +647,40 @@ unless (@scripts) {
>  	exit;
>  }
>
> -push(@stats, check_script(1, sub { shift(@scripts); }, sub { print(@_); }));
> +unless ($Config{useithreads} && eval {
> +	require threads; threads->import();
> +	require Thread::Queue; Thread::Queue->import();
> +	1;
> +	}) {
> +	push(@stats, check_script(1, sub { shift(@scripts); }, sub { print(@_); }));
> +	show_stats($start_time, \@stats) if $show_stats;
> +	exit(exit_code(\@stats));
> +}
> +
> +my $script_queue = Thread::Queue->new();
> +my $output_queue = Thread::Queue->new();
> +
> +sub next_script { return $script_queue->dequeue(); }
> +sub emit { $output_queue->enqueue(@_); }
> +
> +sub monitor {
> +	while (my $s = $output_queue->dequeue()) {
> +		print($s);
> +	}
> +}
> +
> +my $mon = threads->create({'context' => 'void'}, \&monitor);
> +threads->create({'context' => 'list'}, \&check_script, $_, \&next_script, \&emit) for 1..$jobs;
> +
> +$script_queue->enqueue(@scripts);
> +$script_queue->end();
> +
> +for (threads->list()) {
> +	push(@stats, $_->join()) unless $_ == $mon;
> +}
> +
> +$output_queue->end();
> +$mon->join();

Maybe I'm misunderstanding this whole thing, but this really seems like
the wrong direction in an otherwise fantastic series. I.e.
it's *great* that we can do chain-lint without needing to actually
execute the *.sh file: this series adds a lint parser that can parse
those *.sh "at rest".

But in your 16/18 you then do:

	+if test "${GIT_TEST_CHAIN_LINT:-1}" != 0
	+then
	+	"$PERL_PATH" "$TEST_DIRECTORY/chainlint.pl" "$0" ||
	+	BUG "lint error (see '?!...!? annotations above)"
	+fi

I may just be missing something here, but why not instead just borrow
what I did for "lint-docs" in 8650c6298c1 (doc lint: make "lint-docs"
non-.PHONY, 2021-10-15)?

I.e. if we can run against t0001-init.sh or whatever *once* to see if
it chain-lints OK then surely we could have a rule like:

	t0001-init.sh.chainlint-ok: t0001-init.sh
		perl chainlint.pl $< >$@

Then whenever you change t0001-init.sh we refresh that
t0001-init.sh.chainlint-ok; if chainlint.pl exits non-zero we'll fail
to make it, and will unlink that t0001-init.sh.chainlint-ok.

That way you wouldn't need any parallelism in the Perl script, because
you'd have "make" take care of it, and in the common case of
re-testing, where the speed matters, we wouldn't need to run this at
all, or would only re-run it for the test scripts that changed.

(Obviously a "real" implementation would want to create that ".ok"
file in "t/.build/chainlint" or whatever; a rough sketch of what such
rules might look like is at the end of this message.)

A drawback is that you'd probably be slower on the initial run, as
you'd spawn N chainlint.pl processes. You could use $? instead of $<
to get around that, but that requires some re-structuring, and I've
found it to generally not be worth it.

It would also have the drawback that a:

	./t0001-init.sh

wouldn't run the chain-lint, but this would:

	make T=t0001-init.sh

But if we want the former to work we could carry some
"GIT_TEST_VIA_MAKEFILE" variable or whatever, and only run the
chain-lint via test-lib.sh if it isn't set.
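
To make that a bit more concrete, here is a minimal sketch of what such
stamp-file rules might look like in t/Makefile. This is only meant to
illustrate the idea, not the series' implementation: the
".build/chainlint" stamp location, the "check-chainlint-scripts" target
name, and the assumption that T is available as t/Makefile already
defines it are all mine.

# Illustrative only: one ".ok" stamp per test script, refreshed
# whenever the script (or chainlint.pl itself) changes.
CHAINLINT_STAMP_DIR = .build/chainlint
CHAINLINT_STAMPS = $(patsubst %.sh,$(CHAINLINT_STAMP_DIR)/%.ok,$(T))

# The stamp is created only if chainlint.pl exits successfully, so a
# failed lint is simply re-run on the next "make".
$(CHAINLINT_STAMP_DIR)/%.ok: %.sh chainlint.pl
	@mkdir -p $(CHAINLINT_STAMP_DIR)
	perl chainlint.pl $< && touch $@

check-chainlint-scripts: $(CHAINLINT_STAMPS)
.PHONY: check-chainlint-scripts

With something like that, "make check-chainlint-scripts" only re-lints
scripts whose stamps are out of date, and make's own -j takes care of
the parallelism.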