Re: [PATCH v4 00/10] The final building block for a faster rebase -i

Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> · Tue, 30 May 2017 22:22:31 +0200

On Tue, May 30, 2017 at 5:44 PM, Johannes Schindelin
<Johannes.Schindelin@xxxxxx> wrote:
> Hi Ævar,
>
> On Mon, 29 May 2017, Ævar Arnfjörð Bjarmason wrote:
>
>> On Mon, May 29, 2017 at 12:51 PM, Johannes Schindelin
>> <Johannes.Schindelin@xxxxxx> wrote:
>> >
>> > On Sat, 27 May 2017, René Scharfe wrote:
>> >> Am 26.05.2017 um 05:15 schrieb Liam Beguin:
>> >> > I tried to time the execution on an interactive rebase (on Linux)
>> >> > but I did not notice a significant change in speed.  Do we have a
>> >> > way to measure performance / speed changes between version?
>> >>
>> >> Well, there's performance test script p3404-rebase-interactive.sh.
>> >> You could run it e.g. like this:
>> >>
>> >>       $ (cd t/perf && ./run origin/master HEAD ./p3404*.sh)
>> >>
>> >> This would compare the performance of master with the current branch
>> >> you're on.  The results of p3404 are quite noisy for me on master,
>> >> though (saw 15% difference between runs without any code changes), so
>> >> take them with a bag of salt.
>> >
>> > Indeed. Our performance tests are simply not very meaningful.
>> >
>> > Part of it is the use of shell scripting (which defeats performance
>> > testing pretty well),
>>
>> Don't the performance tests take long enough that the shellscripting
>> overhead gets lost in the noise?
>
> Okay, here you go, my weekly (or so) clarification about the POSIX
> emulation layer called MSYS2 (which itself kind of a portable Cygwin).
>
> Whenever Git for Windows has to execute Unix shell scripts (which are not
> native to Windows, as the "Unix" in "Unix shell scripts" so clearly
> suggests), we resort to calling the Bash from the MSYS2 project, which
> spins up a POSIX emulation layer. Git for Windows' own .exe files
> (and in particular, git.exe) is *not* affected by the POSIX emulation
> layer, as they are real Win32 programs.
>
> Whenever execution has to bridge into, or out of, the POSIX emulation
> layer, a few things need to be done. To emulate signal handling, for
> example, a completely new process has to be spun up that itself has the
> non-MSYS2 process as a child. The environment has to be converted, to
> reflect the fact that some things are Unix-y paths (or path lists) inside
> the POSIX emulation layer and Windows paths outside.
>
> Even when staying within the POSIX emulation layer, some operations are
> not as cheap as "Linux folks" are used to. For example, to spawn a
> subshell, due to the absence of a spawn syscall fork() is called, followed
> by exec(). However, fork() itself is not native to Windows and has to be
> emulated. The POSIX emulation layer spins up a new process, meticulously
> copies the entire memory, tries to reopen the file descriptors, network
> connections, etc (to emulate the fork() semantics).
>
> Obviously, this is anything but cheap.
>
> And this is only a glimpse into the entire problem, as I am not aware of
> any thorough analysis what is going on in msys-2.0.dll when shell scripts
> run. All I know is that it slows things down dramatically.
>
> As a consequence, even the simple act of creating a repository, or
> spawning Win32 processes from within a shell, become quite the
> contributing factors to the noise of the measurements.
>
>> E.g. on Windows what do you get when you run this in t/perf:
>>
>>     $ GIT_PERF_REPEAT_COUNT=3 GIT_PERF_MAKE_OPTS="-j6 NO_OPENSSL=Y
>> BLK_SHA1=Y CFLAGS=-O3" ./run v2.10.0 v2.12.0 v2.13.0 p3400-rebase.sh
>
> In my hands, a repeat count of 3 always resulted in quite a bit of noise
> previously.
>
> Mind you, I am working my machine. It has to run two VMs in the
> background, has multiple browsers and dozens of tabs open, checks for
> mails and Tweets and RSS feeds and a couple of Skypes are open, too.
>
> So yeah, obviously there is a bit of noise involved.
>
>> I get split-index performance improving by 28% in 2.12 and 58% in
>> 2.13, small error bars even with just 3 runs. This is on Linux, but my
>> sense of fork overhead on Windows is that it isn't so bad as to matter
>> here.
>
> Test
> ------------------------------------------------------
> 3400.2: rebase on top of a lot of unrelated changes
>
> v2.10.0            v2.12.0                  v2.13.0
> ------------------------------------------------------------------
> 60.65(0.01+0.03)   55.75(0.01+0.07) -8.1%   55.97(0.04+0.09) -7.7%
>
> (wrapped myself, as the ./run output is a lot wider than the 80 columns
> allowed in plain text email format)
>
> And what does it tell you?

For all the above: Yes I get that various things are slower on
Windows, but I think that regardless of that by far the majority of
the time in most of the perf tests is spent on the code being tested,
so it doesn't come into play:

$ parallel -k -j4 '(printf "%s\t" {} && (time GIT_PERF_REPO=~/g/linux
GIT_PERF_LARGE_REPO=~/g/linux ./run {}) 2>&1 | grep ^real | grep -o
[0-9].*) | awk "{print \$2 \" \" \$1}"' ::: p[0-9]*sh
0m34.333s p0000-perf-lib-sanity.sh
4m26.415s p0001-rev-list.sh
1m41.647s p0002-read-cache.sh
28m45.001s p0003-delta-base-cache.sh
0m55.767s p0004-lazy-init-name-hash.sh
0m15.891s p0005-status.sh
3m53.143s p0006-read-tree-checkout.sh
3m28.627s p0071-sort.sh
0m49.541s p0100-globbing.sh
14m42.273s p3400-rebase.sh
0m0.573s p3404-rebase-interactive.sh
0m54.103s p4000-diff-algorithms.sh
0m8.102s p4001-diff-no-index.sh
17m34.819s p4211-line-log.sh
4m50.497s p4220-log-grep-engines.sh
3m59.636s p4221-log-grep-engines-fixed.sh

I.e. I'd expct this to come into play with e.g.
p4000-diff-algorithms.sh, but most of the time spent in e.g.
p4220-log-grep-engines.sh on any non-trivially sized repo is hanging
on git-grep to finish.

Also if you look at test_run_perf_ in perf-lib.sh we only
/usr/bin/time some tiny shell code, actually I can't see now why we're
not just doing *only* /usr/bin/time <code to perf test>, right now we
measure some setup code & sourcing the shell library too.

> Not much, right? You have no idea about the trend line of the three tests,
> not even of the standard deviation (not that it would be meaningful for
> N=3). It is not immediately obvious whether the first run is always a tad
> slower (or faster), or whether there is no noticable difference between
> the first and any subsequent runs.

Right, this could really be improved, but it's  purely in the
presentation layer, we log the raw runtimes:

    $ GIT_PERF_REPEAT_COUNT=6 ./p4220-log-grep-engines.sh
    $ cat trash\ directory.p4220-log-grep-engines/test_time.*
    0:00.01 0.01 0.00
    0:00.01 0.01 0.00
    0:00.02 0.01 0.00
    0:00.02 0.02 0.00
    0:00.01 0.01 0.00
    0:00.02 0.01 0.00

> In other words, there is no measure of confidence in those results. We
> can't say how reliable those numbers are.
>
> And we certainly can't know how much the shell scripting hurts.
>
> Although... let's try something. Let's run the same command in a *Linux
> VM* on the same machine! Yes, that should give us an idea. So here goes:
>
> Test
> ------------------------------------------------------
> 3400.2: rebase on top of a lot of unrelated changes
>
> v2.10.0           v2.12.0                 v2.13.0
> ---------------------------------------------------------------
> 2.08(1.76+0.15)   2.10(1.76+0.15) +1.0%   2.00(1.65+0.15) -3.8%
>
>
> A ha! Not only does this show a curious *increase* in v2.12.0 (but I'd not
> put much stock into that, again N=3 is way too low a repetition number),
> it also shows that the Linux VM runs the same thing roughly 30x faster.
>
> I did see a few speed differences between native git.exe on Windows and
> the git executable on Linux, but it was barely in the two-digit
> *percentage* region. Nowhere near the four-digit percentage region.
>
> So now you know how much shell scripting hurts performance testing.
>
> A lot.

Maybe that rebase test is just very fuzzy. Could you try to run this please:

$ cat p0000-perf-lib-sleep.sh
#!/bin/sh

test_description="Tests sleep performance"

. ./perf-lib.sh

test_perf_default_repo

for s in 1 2 5
do
        test_perf "sleep $s" "sleep $s"
done

test_done

I get:

$ ./run p0000-perf-lib-sleep.sh
=== Running 1 tests in this tree ===
perf 1 - sleep 1: 1 2 3 ok
perf 2 - sleep 2: 1 2 3 ok
perf 3 - sleep 5: 1 2 3 ok
# passed all 3 test(s)
1..3
Test              this tree
---------------------------------
0000.1: sleep 1   1.00(0.00+0.00)
0000.2: sleep 2   2.00(0.00+0.00)
0000.3: sleep 5   5.00(0.00+0.00)

I'd be very curious to see if you get any different results on any of
your systems, particularly Windows.

Unless I'm missing something, if those numbers are 1, 2, 5 or close
enough and with small enough error bars, then the perf framework's use
of shellscripting should be fine, if not we've got something...

> It pretty much renders the entire endeavor of testing performance
> completely and utterly useless.
>
>> I'd also be interested to see what sort of results you get for my
>> "grep: add support for the PCRE v1 JIT API" patch which is in pu now,
>> assuming you have a PCRE newer than 8.32 or so.
>
> pu does not build for me:
>
> 2017-05-30T11:38:50.0089681Z libgit.a(grep.o): In function `pcre1match':
> 2017-05-30T11:38:50.0289250Z .../grep.c:411: undefined reference to `__imp_pcre_jit_exec'
> 2017-05-30T11:38:50.0329160Z collect2.exe: error: ld returned 1 exit
> status

Ouch, looks like I've missed some spot in my pcre1 jit series. What's
the PCRE version you have? This is somehow doing the wrong thing with
this bit in grep.h:

    #include <pcre.h>
    #ifdef PCRE_CONFIG_JIT
    #if PCRE_MAJOR >= 8 && PCRE_MINOR >= 32
    #define GIT_PCRE1_USE_JIT
    [...]

>> > Frankly, I have no illusion about this getting fixed, ever.
>>
>> I have a project on my TODO that I've been meaning to get to which
>> would address this. I'd be interested to know what people think about
>> the design:
>>
>> * Run the perf tests in some more where the raw runtimes are saved away
>
> You mean a test helper designed to do the timing and the setting up so as
> to time *just* the operations that should be timed?
>
> If so: I am all for it.

No I just mean save away the raw trash*/test_time.* files, so you can
do any sort of analysis on them later, e.g. render stdev, percentiles
etc. But this is assuming the raw numbers are useful, let's find that
out.

>> * Have some way to dump a static html page from that with graphs over
>> time (with gnuplot svg?)
>
> If you already go HTML, it would make much more sense to go d3.js. I would
> even prefer to go c3.js (which uses d3.js) right away. Would make
> everything so much easier.
>
> Not to mention more portable.
>
>> * Supply some config file to drive this, so you can e.g. run each
>> tests N times against your repo X for the last 10 versions of git.
>
> Sure.
>
>> * Since it's static HTML it would be trivial for anyone to share such
>> results, and e.g. setup running them in cron to regularly publish to
>> github pages.
>
> It does not need to be static. It can use, say, c3.js, for the added
> benefit of being able to toggle multiple graphs in the same diagram.

Right, I mean "static" in the sense that it wouldn't require some
dynamic backend, it requiring JS is fine, i.e. I'd just like someone
to be able to publish some raw data, point some JS/html at it, and
have it be rendered.