Re: A potential approach to making tests faster on Windows

Johannes Schindelin <Johannes.Schindelin@xxxxxx> · Tue, 3 Apr 2018 11:49:38 +0200 (DST)

Hi Peff,

On Fri, 30 Mar 2018, Jeff King wrote:

> On Fri, Mar 30, 2018 at 08:45:45PM +0200, Ævar Arnfjörð Bjarmason wrote:
> 
> > I've wondered for a while whether it wouldn't be a viable approach to
> > make something like an interpreter for our test suite to get around
> > this problem, i.e. much of it's very repetitive and just using a few
> > shell functions we've defined, what if we had C equivalents of those?
> 
> I've had a similar thought, though I wonder how far we could get with
> just shell. I even tried it out with test_cmp:
> 
>   https://public-inbox.org/git/20161020215647.5no7effvutwep2xt@xxxxxxxxxxxxxxxxxxxxx/
> 
> But Johannes Sixt pointed out that they already do this (see
> mingw_test_cmp in test-lib-functions).

Right.

Additionally, I noticed that this simple shell loop is *also* very slow on
Windows (at least in the MSYS2 Bash we use in Git for Windows).

Assuming that the Bash loop is slow because it goes through too much
POSIX emulation, I re-implemented mingw_test_cmp in pure C:
https://github.com/git-for-windows/git/commit/8a96ef63a0083ba02305dfeef6ff92c31b4fd7c3

Unfortunately, it did not produce any noticeable speed improvement, so I
did not even finish the conversion (when the cmp fails, it does not show
you any helpful diff yet).
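
For illustration only, here is a minimal sketch of what the core of such a
comparison could look like in C. It is not the code from the commit above.
It assumes the job of the comparison is to treat CRLF and LF line endings
as equal, and it works on in-memory buffers to stay short (reading the
files and printing a diff on mismatch are left out). The names next_byte
and same_ignoring_crlf are made up for this example.

```c
#include <stddef.h>

/*
 * Hypothetical helper: return the next byte of buf at *pos, skipping a CR
 * that directly precedes an LF. Returns -1 at the end of the buffer.
 */
static int next_byte(const char *buf, size_t len, size_t *pos)
{
	if (*pos >= len)
		return -1;
	if (buf[*pos] == '\r' && *pos + 1 < len && buf[*pos + 1] == '\n')
		(*pos)++;	/* drop the CR, hand back the LF */
	return (unsigned char)buf[(*pos)++];
}

/*
 * Hypothetical comparison: return 1 if both buffers are identical once
 * every CRLF is treated as LF, and 0 otherwise.
 */
int same_ignoring_crlf(const char *a, size_t alen,
		       const char *b, size_t blen)
{
	size_t i = 0, j = 0;

	for (;;) {
		int ca = next_byte(a, alen, &i);
		int cb = next_byte(b, blen, &j);

		if (ca != cb)
			return 0;	/* mismatch, or one side ended early */
		if (ca < 0)
			return 1;	/* both sides ended together */
	}
}
```

A lone CR that is not followed by an LF is compared like any other byte,
so only the line-ending difference is ignored.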

> I also tried to explore a few numbers about process invocations to see
> if running shell commands is the problem:
> 
>   https://public-inbox.org/git/20161020123111.qnbsainul2g54z4z@xxxxxxxxxxxxxxxxxxxxx/

This mail was still in my inbox, waiting for me to say something about
it.

My main evidence that shell scripts on macOS are slower than on Linux was
the difference in the speedup gained by moving more things from
git-rebase--interactive.sh into sequencer.c: Linux saw an improvement of
only about 3x, while macOS saw an improvement of 4x, IIRC. Even if I don't
remember the absolute numbers correctly, I vividly remember the
qualitative difference: it was noticeable.

> There was some discussion there about whether the problem is programs
> being exec'd, or if it's forks due to subshells. And if it is programs
> being exec'd, whether it's shell programs or if it is simply that we
> exec Git a huge number of times.

One large problem there is that it is really hard to analyze performance
over such a heterogeneous code base: part C, part Perl, part Unix shell
(and of course, when you say Unix shell, you imply dozens of separate
tools that *also* need to be performance-profiled). I have very good
profiling tools for C, I saw some built-in performance profiling for Perl,
but there is no good performance profiling for Unix shell scripting: I
doubt that the inventors of shell scripting had speed-critical production
code in mind when they came up with the idea.

I did invest dozens of hours earlier this year trying to obtain debug
symbols in .pdb format (ready for Visual Studio's really envy-inducing
performance profiler) also for the MSYS2 runtime and Bash, so that I could
analyze what makes things so awfully slow in Git's test suite.

The only problem is that I also have to do other things in my day job, so
that project waits patiently until I have some time to come back to it.

Ciao,
Dscho