Am 06.12.2022 um 16:07 schrieb Johannes Schindelin via GitGitGadget: > From: Johannes Schindelin <johannes.schindelin@xxxxxx> > > When Git's test suite uses `test_cmp`, it is not actually trying to > compare binary files as the name `cmp` would suggest to users familiar > with Unix' tools, but the tests instead verify that actual output > matches the expected text. > > On Unix, `cmp` works well enough for Git's purposes because only Line > Feed characters are used as line endings. However, on Windows, while > most tools accept Line Feeds as line endings, many tools produce > Carriage Return + Line Feed line endings, including some of the tools > used by the test suite (which are therefore provided via Git for Windows > SDK). Therefore, `cmp` would frequently fail merely due to different > line endings. > > To accommodate for that, the `mingw_test_cmp` function was introduced > into Git's test suite to perform a line-by-line comparison that ignores > line endings. This function is a Bash function that is only used on > Windows, everywhere else `cmp` is used. > > This is a double whammy because `cmp` is fast, and `mingw_test_cmp` is > slow, even more so on Windows because it is a Bash script function, and > Bash scripts are known to run particularly slowly on Windows due to > Bash's need for the POSIX emulation layer provided by the MSYS2 runtime. > > The commit message of 32ed3314c104 (t5351: avoid using `test_cmp` for > binary data, 2022-07-29) provides an illuminating account of the > consequences: On Windows, the platform on which Git could really use all > the help it can get to improve its performance, the time spent on one > entire test script was reduced from half an hour to less than half a > minute merely by avoiding a single call to `mingw_test_cmp` in but a > single test case. > > Learning the lesson to avoid shell scripting wherever possible, the Git > for Windows project implemented a minimal replacement for > `mingw_test_cmp` in the form of a `test-tool` subcommand that parses the > input files line by line, ignoring line endings, and compares them. > Essentially the same thing as `mingw_test_cmp`, but implemented in > C instead of Bash. This solution served the Git for Windows project > well, over years. > > However, when this solution was finally upstreamed, the conclusion was > reached that a change to use `git diff --no-index` instead of > `mingw_test_cmp` was more easily reviewed and hence should be used > instead. > > The reason why this approach was not even considered in Git for Windows > is that in 2007, there was already a motion on the table to use Git's > own diff machinery to perform comparisons in Git's test suite, but it > was dismissed in https://lore.kernel.org/git/xmqqbkrpo9or.fsf@gitster.g/ > as undesirable because tests might potentially succeed due to bugs in > the diff machinery when they should not succeed, and those bugs could > therefore hide regressions that the tests try to prevent. > > By the time Git for Windows' `mingw-test-cmp` in C was finally > contributed to the Git mailing list, reviewers agreed that the diff > machinery had matured enough and should be used instead. > > When the concern was raised that the diff machinery, due to its > complexity, would perform substantially worse than the test helper > originally implemented in the Git for Windows project, a test > demonstrated that these performance differences are well lost within the > 100+ minutes it takes to run Git's test suite on Windows. Only t3920 needs mingw_test_cmp on my system. [2] on top of [1] removes that dependency. Does that work for you as well? diff -u and git diff are slower than mingw_test_cmp when running "make test" on my system ("uname -rs" says "MINGW64_NT-10.0-22621 3.3.6-341.x86_64"). Timings (sorry, only a single run each): 2.39.0-rc2: All tests successful. Files=983, Tests=26154, 2082 wallclock secs ( 7.86 usr 15.97 sys + 776.73 cusr 2352.96 csys = 3153.52 CPU) Result: PASS 2.39.0-rc2 + [1] + [2] + removal of "GIT_TEST_CMP=mingw_test_cmp" from t/test-lib.sh: All tests successful. Files=983, Tests=26154, 2174 wallclock secs ( 7.75 usr 17.27 sys + 813.53 cusr 2699.12 csys = 3537.66 CPU) Result: PASS 2.39.0-rc2 + your v5: All tests successful. Files=983, Tests=26154, 2160 wallclock secs ( 4.16 usr 10.03 sys + 647.75 cusr 2038.99 csys = 2700.92 CPU) Result: PASS 2.39.0-rc2 + [1] + [2] + the patch below: All tests successful. Files=983, Tests=26154, 2026 wallclock secs ( 4.91 usr 9.91 sys + 608.91 cusr 1881.98 csys = 2505.70 CPU) Result: PASS This is noisy, running "make test" multiple times takes too long. Running a single test using hyperfine is feasible. Here are my results for the last test that needed a CR-agnostic test_cmp: 2.39.0-rc2: Benchmark 1: sh.exe t3920-crlf-messages.sh Time (mean ± σ): 5.666 s ± 0.085 s [User: 0.000 s, System: 0.007 s] Range (min … max): 5.583 s … 5.858 s 10 runs 2.39.0-rc2 + [1] + [2] + removal of "GIT_TEST_CMP=mingw_test_cmp" from t/test-lib.sh: Benchmark 1: sh.exe t3920-crlf-messages.sh Time (mean ± σ): 6.540 s ± 0.065 s [User: 0.000 s, System: 0.004 s] Range (min … max): 6.454 s … 6.681 s 10 runs 2.39.0-rc2 + your v5: Benchmark 1: sh.exe t3920-crlf-messages.sh Time (mean ± σ): 6.632 s ± 0.090 s [User: 0.000 s, System: 0.004 s] Range (min … max): 6.550 s … 6.791 s 10 runs 2.39.0-rc2 + [1] + [2] + the patch below: Benchmark 1: sh.exe t3920-crlf-messages.sh Time (mean ± σ): 5.743 s ± 0.065 s [User: 0.002 s, System: 0.001 s] Range (min … max): 5.684 s … 5.870 s 10 runs Still noisy, but avoiding to fork out to a comparison program seems worth it, i.e. the shell function wins for the typically short inputs in test scripts. Do you get different numbers? I'm a bit disappointed by the performance of the patch below, but we'd need something like this to compare precisely (no longer ignoring CRs), I suppose. [1] https://lore.kernel.org/git/febcfb0a-c410-fb71-cff9-92acfcb269e2@xxxxxxxx/ [2] https://lore.kernel.org/git/cbe88abc-c1fb-cb50-6057-47ff27f7a12d@xxxxxx/ diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh index 796093a7b3..bf10746a08 100644 --- a/t/test-lib-functions.sh +++ b/t/test-lib-functions.sh @@ -1450,70 +1450,21 @@ test_skip_or_die () { error "$2" } -# The following mingw_* functions obey POSIX shell syntax, but are actually -# bash scripts, and are meant to be used only with bash on Windows. - -# A test_cmp function that treats LF and CRLF equal and avoids to fork -# diff when possible. +# A test_cmp function that avoids to fork diff when possible. It's only +# meant to be used with bash on Windows. mingw_test_cmp () { # Read text into shell variables and compare them. If the results # are different, use regular diff to report the difference. - local test_cmp_a= test_cmp_b= - - # When text came from stdin (one argument is '-') we must feed it - # to diff. - local stdin_for_diff= + local test_cmp_a=a test_cmp_b=b - # Since it is difficult to detect the difference between an - # empty input file and a failure to read the files, we go straight - # to diff if one of the inputs is empty. - if test -s "$1" && test -s "$2" - then - # regular case: both files non-empty - mingw_read_file_strip_cr_ test_cmp_a <"$1" - mingw_read_file_strip_cr_ test_cmp_b <"$2" - elif test -s "$1" && test "$2" = - - then - # read 2nd file from stdin - mingw_read_file_strip_cr_ test_cmp_a <"$1" - mingw_read_file_strip_cr_ test_cmp_b - stdin_for_diff='<<<"$test_cmp_b"' - elif test "$1" = - && test -s "$2" + # Leave the uncommon case of reading from stdin to diff. + if test "$1" != "-" && test "$2" != "-" then - # read 1st file from stdin - mingw_read_file_strip_cr_ test_cmp_a - mingw_read_file_strip_cr_ test_cmp_b <"$2" - stdin_for_diff='<<<"$test_cmp_a"' + IFS= read -r -d '' test_cmp_a <"$1" + IFS= read -r -d '' test_cmp_b <"$2" fi - test -n "$test_cmp_a" && - test -n "$test_cmp_b" && test "$test_cmp_a" = "$test_cmp_b" || - eval "diff -u \"\$@\" $stdin_for_diff" -} - -# $1 is the name of the shell variable to fill in -mingw_read_file_strip_cr_ () { - # Read line-wise using LF as the line separator - # and use IFS to strip CR. - local line - while : - do - if IFS=$'\r' read -r -d $'\n' line - then - # good - line=$line$'\n' - else - # we get here at EOF, but also if the last line - # was not terminated by LF; in the latter case, - # some text was read - if test -z "$line" - then - # EOF, really - break - fi - fi - eval "$1=\$$1\$line" - done + diff -u "$@" } # Like "env FOO=BAR some-program", but run inside a subshell, which means