On Tue, Apr 03 2018, Johannes Schindelin wrote:

> Hi Peff,
>
> On Fri, 30 Mar 2018, Jeff King wrote:
>
>> On Fri, Mar 30, 2018 at 08:45:45PM +0200, Ævar Arnfjörð Bjarmason wrote:
>>
>> > I've wondered for a while whether it wouldn't be a viable approach to
>> > make something like an interpreter for our test suite to get around
>> > this problem, i.e. much of it's very repetitive and just using a few
>> > shell functions we've defined, what if we had C equivalents of those?
>>
>> I've had a similar thought, though I wonder how far we could get with
>> just shell. I even tried it out with test_cmp:
>>
>> https://public-inbox.org/git/20161020215647.5no7effvutwep2xt@xxxxxxxxxxxxxxxxxxxxx/
>>
>> But Johannes Sixt pointed out that they already do this (see
>> mingw_test_cmp in test-lib-functions).
>
> Right.
>
> Additionally, I noticed that that simple loop in shell is *also* very slow on
> Windows (at least in the MSYS2 Bash we use in Git for Windows).
>
> Under the assumption that it is the Bash with the loop that uses too much
> POSIX emulation to make it fast, I re-implemented mingw_test_cmp in pure
> C:
> https://github.com/git-for-windows/git/commit/8a96ef63a0083ba02305dfeef6ff92c31b4fd7c3
>
> Unfortunately, it did not produce any noticeable speed improvement, so I
> did not even finish the conversion (when the cmp fails, it does not show
> you any helpful diff yet).

I don't know the details of Windows, but it sounds like you're trying to
performance test two things that are going to suck for different
reasons.

On one hand the pure-*.sh comparison would be slower than just diff on
*nix, because it's not C, so you'll get that slowness, but gain in not
having to fork another process.

On the other hand the C implementation is going to be really fast, but
it's going to take you a long time to get it started on Windows.

Which is why I think it would be really interesting to see the third
approach I suggested, i.e.
hack the shell to make test_cmp a builtin and test that. Then you won't
fork, but will get the advantage of your fast C codepath.

Also, even if test_cmp is much faster, Peff's results over at
https://public-inbox.org/git/20161020123111.qnbsainul2g54z4z@xxxxxxxxxxxxxxxxxxxxx/
suggest that you may not notice anyway. Aside from the points raised
there about the bin wrappers, it seems the easiest wins are having
builtin versions of "rm" and "cat".

Are you able to compile dash on Windows with some modification of the
patch I sent upthread? If not, it doesn't seem too hard to do the same
trick for bash. Once you have bash.git checked out, see:

    git grep '\balias\b' -- builtins

I.e. you add a bit of Makefile boilerplate and you should be able to get
a new builtin.

>> I also tried to explore a few numbers about process invocations to see
>> if running shell commands is the problem:
>>
>> https://public-inbox.org/git/20161020123111.qnbsainul2g54z4z@xxxxxxxxxxxxxxxxxxxxx/
>
> This mail was still in my inbox, in want of me saying something about
> this.
>
> My main evidence that shell scripts on macOS are slower than on Linux was
> the difference of the improvement incurred by moving more things from
> git-rebase--interactive.sh into sequencer.c: Linux saw an improvement only
> of about 3x, while macOS saw an improvement of 4x, IIRC. If I don't
> remember the absolute numbers correctly, at least I vividly remember the
> qualitative difference: It was noticeable.
>
>> There was some discussion there about whether the problem is programs
>> being exec'd, or if it's forks due to subshells. And if it is programs
>> being exec'd, whether it's shell programs or if it is simply that we
>> exec Git a huge number of times.
>
> One large problem there is that it is really hard to analyze performance
> over such a heterogenous code base: part C, part Perl, part Unix shell
> (and of course, when you say Unix shell, you imply dozens of separate
> tools that *also* need to be performance-profiled). I have very good
> profiling tools for C, I saw some built-in performance profiling for Perl,
> but there is no good performance profiling for Unix shell scripting: I
> doubt that the inventors of shell scripting had speed-critical production
> code in mind when they came up with the idea.
>
> I did invest dozens of hours earlier this year trying to obtain debug
> symbols in .pdb format (ready for Visual Studio's really envy-inducing
> performance profiler) also for the MSYS2 runtime and Bash, so that I could
> analyze what makes things so awfully slow in Git's test suite.
>
> The only problem is that I also have to do other things in my day-job, so
> that project waits patiently until I have some time to come back to that
> project.
>
> Ciao,
> Dscho
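
For anyone following along, the "compare in pure shell without forking"
idea discussed above could look roughly like the sketch below. It is only
an illustration, not the actual mingw_test_cmp from test-lib-functions:
the name sh_test_cmp and the sh_cmp_* variables are made up, it assumes
text files without NUL bytes, and it prints no diff when the files
differ.

```shell
# Hypothetical helper (NOT the real mingw_test_cmp): compare two text
# files line by line using only shell builtins, so the comparison
# itself does not fork an external process such as diff or cmp.
# Returns 0 if the files match, 1 if they differ, 2 if one is unreadable.
sh_test_cmp () {
	# Check readability up front so the "exec" below cannot fail.
	test -r "$1" && test -r "$2" || return 2
	sh_cmp_ret=0
	exec 3<"$1" 4<"$2"
	while :
	do
		sh_cmp_a= sh_cmp_b= sh_cmp_eof_a= sh_cmp_eof_b=
		# "read" fails at end of file, but still stores a final
		# line that lacks a trailing newline.
		IFS= read -r sh_cmp_a <&3 || sh_cmp_eof_a=t
		IFS= read -r sh_cmp_b <&4 || sh_cmp_eof_b=t
		if test "$sh_cmp_a" != "$sh_cmp_b" ||
		   test "$sh_cmp_eof_a" != "$sh_cmp_eof_b"
		then
			sh_cmp_ret=1
			break
		fi
		# Both files ended at the same point with equal content.
		test -z "$sh_cmp_eof_a" || break
	done
	# Close the descriptors opened above.
	exec 3<&- 4<&-
	return $sh_cmp_ret
}
```

Something like "sh_test_cmp expect actual || diff -u expect actual" would
then only pay for a fork when the comparison fails, which is the
trade-off described above: no process startup cost, but a loop that runs
at shell speed rather than C speed.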