Re: A potential approach to making tests faster on Windows

Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> · Tue, 03 Apr 2018 13:28:18 +0200

On Tue, Apr 03 2018, Johannes Schindelin wrote:

> Hi Peff,
>
> On Fri, 30 Mar 2018, Jeff King wrote:
>
>> On Fri, Mar 30, 2018 at 08:45:45PM +0200, Ævar Arnfjörð Bjarmason wrote:
>>
>> > I've wondered for a while whether it wouldn't be a viable approach to
>> > make something like an interpreter for our test suite to get around
>> > this problem, i.e. much of it's very repetitive and just using a few
>> > shell functions we've defined, what if we had C equivalents of those?
>>
>> I've had a similar thought, though I wonder how far we could get with
>> just shell. I even tried it out with test_cmp:
>>
>>   https://public-inbox.org/git/20161020215647.5no7effvutwep2xt@xxxxxxxxxxxxxxxxxxxxx/
>>
>> But Johannes Sixt pointed out that they already do this (see
>> mingw_test_cmp in test-lib-functions).
>
> Right.
>
> Additionally, I noticed that that simple loop in shell is *also* very slow on
> Windows (at least in the MSYS2 Bash we use in Git for Windows).
>
> Under the assumption that it is the Bash with the loop that uses too much
> POSIX emulation to make it fast, I re-implemented mingw_test_cmp in pure
> C:
> https://github.com/git-for-windows/git/commit/8a96ef63a0083ba02305dfeef6ff92c31b4fd7c3
>
> Unfortunately, it did not produce any noticeable speed improvement, so I
> did not even finish the conversion (when the cmp fails, it does not show
> you any helpful diff yet).

I don't know the details of Windows, but it sounds like you're trying to
performance test two things that are going to suck for different
reasons.

On one hand, the pure-shell comparison would be slower than plain diff
on *nix because the comparison loop itself isn't C, so you pay for that,
but you gain by not having to fork another process.
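
To make the "pure shell" side concrete, here is a rough sketch of such a
comparison. It is not the actual mingw_test_cmp; sh_cmp and its
variable names are made up for illustration, and it assumes text files
without NUL bytes:

```shell
# Hypothetical helper (not Git's mingw_test_cmp): compare two text files
# line by line using only shell builtins, so no diff/cmp process is run.
# Returns 0 if the files are identical, 1 if they differ, 2 if unreadable.
sh_cmp () {
	test -r "$1" && test -r "$2" || return 2
	sh_cmp_status=0
	while :
	do
		# Read one line from each file; a non-zero status means EOF.
		IFS= read -r sh_cmp_a <&3; sh_cmp_ra=$?
		IFS= read -r sh_cmp_b <&4; sh_cmp_rb=$?
		# Differ if the lines differ or only one file has ended.
		if test "$sh_cmp_a" != "$sh_cmp_b" ||
		   test "$sh_cmp_ra" -ne "$sh_cmp_rb"
		then
			sh_cmp_status=1
			break
		fi
		# Both files ended at the same point: they are equal.
		test "$sh_cmp_ra" -eq 0 || break
	done 3<"$1" 4<"$2"
	return $sh_cmp_status
}
```

Something like "sh_cmp expect actual" then runs entirely inside the
shell process, which is the trade-off described above: no fork, but
every line goes through the shell's own read loop.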

On the other hand, the C implementation itself is going to be really
fast, but starting it as a separate process is going to take a long
time on Windows.

Which is why I think it would be really interesting to see the third
approach I suggested, i.e. hack the shell to make test_cmp a builtin
and test that. Then you won't fork, but you will still get the
advantage of your fast C codepath.

Also, even if test_cmp is much faster, Peff's results over at
https://public-inbox.org/git/20161020123111.qnbsainul2g54z4z@xxxxxxxxxxxxxxxxxxxxx/
suggest that you may not notice anyway. Aside from the points raised
there about the bin wrappers, it seems the easiest wins would be
builtin versions of "rm" and "cat".
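
As a quick way to see which commands a given shell has to spawn a
process for, something along these lines could be used. This is only a
sketch: is_external and list_external are made-up helper names, and it
assumes that "command -v" prints an absolute path for external programs
and a bare name for builtins and functions:

```shell
# Hypothetical helper: succeed if the shell resolves "$1" to a program
# on disk rather than to a builtin, function or alias.
is_external () {
	case "$(command -v "$1" 2>/dev/null)" in
	/*) return 0 ;;
	*) return 1 ;;
	esac
}

# Hypothetical helper: print those arguments that are external programs.
list_external () {
	for list_external_cmd in "$@"
	do
		if is_external "$list_external_cmd"
		then
			printf '%s\n' "$list_external_cmd"
		fi
	done
}
```

Running e.g. "list_external rm cat test echo cd" under the shell that
runs the test suite lists the candidates; the result depends on the
shell, so it needs to be checked on the platform in question.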

Are you able to compile dash on Windows with some modification of the
patch I sent upthread? If not, it doesn't seem too hard to do the same
trick for bash. Once you have bash.git checked out, see:

    git grep '\balias\b' -- builtins

I.e. you add a bit of Makefile boilerplate and you should be able to
get a new builtin.

>> I also tried to explore a few numbers about process invocations to see
>> if running shell commands is the problem:
>>
>>   https://public-inbox.org/git/20161020123111.qnbsainul2g54z4z@xxxxxxxxxxxxxxxxxxxxx/
>
> This mail was still in my inbox, in want of me saying something about
> this.
>
> My main evidence that shell scripts on macOS are slower than on Linux was
> the difference of the improvement incurred by moving more things from
> git-rebase--interactive.sh into sequencer.c: Linux saw an improvement only
> of about 3x, while macOS saw an improvement of 4x, IIRC. If I don't
> remember the absolute numbers correctly, at least I vividly remember the
> qualitative difference: It was noticeable.
>
>> There was some discussion there about whether the problem is programs
>> being exec'd, or if it's forks due to subshells. And if it is programs
>> being exec'd, whether it's shell programs or if it is simply that we
>> exec Git a huge number of times.
>
> One large problem there is that it is really hard to analyze performance
> over such a heterogeneous code base: part C, part Perl, part Unix shell
> (and of course, when you say Unix shell, you imply dozens of separate
> tools that *also* need to be performance-profiled). I have very good
> profiling tools for C, I saw some built-in performance profiling for Perl,
> but there is no good performance profiling for Unix shell scripting: I
> doubt that the inventors of shell scripting had speed-critical production
> code in mind when they came up with the idea.
>
> I did invest dozens of hours earlier this year trying to obtain debug
> symbols in .pdb format (ready for Visual Studio's really envy-inducing
> performance profiler) also for the MSYS2 runtime and Bash, so that I could
> analyze what makes things so awfully slow in Git's test suite.
>
> The only problem is that I also have to do other things in my day-job, so
> that project waits patiently until I have some time to come back to that
> project.
>
> Ciao,
> Dscho