Half a year ago, directly after sending a patch to fix a performance regression due to a mis-use of test_cmp [https://lore.kernel.org/git/b9203ea247776332e4b6f519aa27d541207adc2f.1659097724.git.gitgitgadget@xxxxxxxxx/], I got curious to see whether Git for Windows had the same issue. And it did not: it passed t5351 in 22 seconds, even while using test_cmp to compare pack files [https://github.com/git-for-windows/git/blob/3922f62f0d5991e9fe0a0817ebf89a91339c7705/t/t5351-unpack-large-objects.sh#L90]. This motivated me to upstream Git for Windows' changes. Changes since v4: * The commit message was rewritten almost completely. Changes since v3: * Fixed the subject of the cover letter (which should have been adjusted in v3) * Elaborated the paragraph about the historical context of this patch Changes since v2: * Dropped the test helper, using diff --no-index instead. Changes since v1: * Fixed double "with" in the commit message. * Renamed the test helper to text-cmp. * Made the diff --no-index call more robust by using a double-dash separator. Johannes Schindelin (2): t0021: use Windows-friendly `pwd` tests(mingw): avoid very slow `mingw_test_cmp` t/t0021-conversion.sh | 4 +-- t/test-lib-functions.sh | 66 ----------------------------------------- t/test-lib.sh | 2 +- 3 files changed, 3 insertions(+), 69 deletions(-) base-commit: 23b219f8e3f2adfb0441e135f0a880e6124f766c Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1309%2Fdscho%2Fmingw-test-cmp-v5 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1309/dscho/mingw-test-cmp-v5 Pull-Request: https://github.com/gitgitgadget/git/pull/1309 Range-diff vs v4: 1: b38b8fb5a85 = 1: b38b8fb5a85 t0021: use Windows-friendly `pwd` 2: 128b1f348d8 ! 2: 6a80fab7e39 tests(mingw): avoid very slow `mingw_test_cmp` @@ Metadata ## Commit message ## tests(mingw): avoid very slow `mingw_test_cmp` - It is more performant to run `git diff --no-index` than running the - `mingw_test_cmp` code with MSYS2's Bash, i.e. the Bash that Git for - Windows uses. And a lot more readable. + When Git's test suite uses `test_cmp`, it is not actually trying to + compare binary files as the name `cmp` would suggest to users familiar + with Unix' tools, but the tests instead verify that actual output + matches the expected text. - The original reason why Git's test suite needs the `mingw_test_cmp` - function at all (and why `cmp` is not good enough) is that Git's test - suite is not actually trying to compare binary files when it calls - `test_cmp`, but it compares text files. And those text files can contain - CR/LF line endings depending on the circumstances. + On Unix, `cmp` works well enough for Git's purposes because only Line + Feed characters are used as line endings. However, on Windows, while + most tools accept Line Feeds as line endings, many tools produce + Carriage Return + Line Feed line endings, including some of the tools + used by the test suite (which are therefore provided via Git for Windows + SDK). Therefore, `cmp` would frequently fail merely due to different + line endings. - Note: The original fix in the Git for Windows project implemented a test - helper that avoids the overhead of the diff machinery, in favor of - implementing a behavior that is more in line with what `mingw_test_cmp` - does now. This was done to minimize the risk in using something as - complex as the diff machinery to perform something as simple as - determining whether text output is identical to the expected output or - not. This approach has served Git for Windows well for years, but the - attempt to upstream this saw a lot of backlash and distractions during - the review, was disliked by the Git maintainer and was therefore - abandoned. For full details, see the thread at - https://lore.kernel.org/git/pull.1309.git.1659106382128.gitgitgadget@xxxxxxxxx/t + To accommodate for that, the `mingw_test_cmp` function was introduced + into Git's test suite to perform a line-by-line comparison that ignores + line endings. This function is a Bash function that is only used on + Windows, everywhere else `cmp` is used. + + This is a double whammy because `cmp` is fast, and `mingw_test_cmp` is + slow, even more so on Windows because it is a Bash script function, and + Bash scripts are known to run particularly slowly on Windows due to + Bash's need for the POSIX emulation layer provided by the MSYS2 runtime. + + The commit message of 32ed3314c104 (t5351: avoid using `test_cmp` for + binary data, 2022-07-29) provides an illuminating account of the + consequences: On Windows, the platform on which Git could really use all + the help it can get to improve its performance, the time spent on one + entire test script was reduced from half an hour to less than half a + minute merely by avoiding a single call to `mingw_test_cmp` in but a + single test case. + + Learning the lesson to avoid shell scripting wherever possible, the Git + for Windows project implemented a minimal replacement for + `mingw_test_cmp` in the form of a `test-tool` subcommand that parses the + input files line by line, ignoring line endings, and compares them. + Essentially the same thing as `mingw_test_cmp`, but implemented in + C instead of Bash. This solution served the Git for Windows project + well, over years. + + However, when this solution was finally upstreamed, the conclusion was + reached that a change to use `git diff --no-index` instead of + `mingw_test_cmp` was more easily reviewed and hence should be used + instead. + + The reason why this approach was not even considered in Git for Windows + is that in 2007, there was already a motion on the table to use Git's + own diff machinery to perform comparisons in Git's test suite, but it + was dismissed in https://lore.kernel.org/git/xmqqbkrpo9or.fsf@gitster.g/ + as undesirable because tests might potentially succeed due to bugs in + the diff machinery when they should not succeed, and those bugs could + therefore hide regressions that the tests try to prevent. + + By the time Git for Windows' `mingw-test-cmp` in C was finally + contributed to the Git mailing list, reviewers agreed that the diff + machinery had matured enough and should be used instead. + + When the concern was raised that the diff machinery, due to its + complexity, would perform substantially worse than the test helper + originally implemented in the Git for Windows project, a test + demonstrated that these performance differences are well lost within the + 100+ minutes it takes to run Git's test suite on Windows. Signed-off-by: Johannes Schindelin <johannes.schindelin@xxxxxx> -- gitgitgadget