Hi Junio, On Fri, 29 Jul 2022, Junio C Hamano wrote: > "Johannes Schindelin via GitGitGadget" <gitgitgadget@xxxxxxxxx> > writes: > > > + const char *argv[] = { > > + "diff", "--no-index", NULL, NULL, NULL > > + }; > > Don't we want to have "--" before the two paths? Yes! > > + if (!(f0 = !strcmp(argv[1], "-") ? stdin : fopen(argv[1], "r"))) > > + return error_errno("could not open '%s'", argv[1]); > > + if (!(f1 = !strcmp(argv[2], "-") ? stdin : fopen(argv[2], "r"))) { > > + fclose(f0); > > + return error_errno("could not open '%s'", argv[2]); > > + } > > It is tricky that you need to take "-" and treat it as the standard > input stream in either argv[1] or argv[2] (but not both). If would > be a different story in an end-user facing program, but because this > is a test helper, feeding wrong input is developer's fault, and I do > not mind lack of attention to detail of error checking to make sure > we avoid comparing alternating lines of the standard input. No, you're right, I've added a guard that prevents `test-tool cmp - -` from failing in obscure ways. > > + for (;;) { > > + int r0 = strbuf_getline(&b0, f0); > > + int r1 = strbuf_getline(&b1, f1); > > + > > + if (r0 == EOF) { > > + fclose(f0); > > + fclose(f1); > > + strbuf_release(&b0); > > + strbuf_release(&b1); > > + if (r1 == EOF) > > + return 0; > > If both hit the EOF at the same time, we know they are the same, OK. > > > +cmp_failed: > > + if (!run_diff(argv[1], argv[2])) > > If one of argv[] was "-", then this wouldn't work correctly, as the > other file is read from the beginning but the "-" side have consumed > the initial part of the input and we cannot unseek it. This bug > needs to be fixed only if we expect a useful and reliable output > from the helper. Right. I've added a clause that says that we cannot show the diff because `stdin` has been consumed already. > But otherwise the idea is sound. We compare them line by line, > using strbuf_getline() to ignore differences in CRLF and LF that > originates at 4d715ac0 (Windows: a test_cmp that is agnostic to > random LF <> CRLF conversions, 2013-10-26). Only when we find the > input different, we use "git diff --no-index" to make the difference > (and unfortunately more, as it does not ignore CRLF <> LF > differences) visible. > > > + die("Huh? 'diff --no-index %s %s' succeeded", > > + argv[1], argv[2]); > > Nice attention to (possibly irrelevant) detail here. I would have > ignored the return value and reported "they are different" at this > point, though. The line-by-line comparison we did was the > authoritative one, and "git diff --no-index" is merely used for > human readable output. > > In any case, "test-tool mingwcmp" would be a better name that > highlights the spirit of 4d715ac0 to ignore CRLF <> LF issues. IOW, > it does a lot more than "cmp" replacement, and we shouldn't mislead > users/developers into thinking it is a plain "cmp" replacement. Fair point. The Unix tool `cmp` does not care about line endings at all, so when you come from a Unix background you will expect the same to be true for `test-tool cmp`. On the other hand, you will expect the same to be true for `test_cmp`, too, which is not the case, and the root cause of why I had to come up with 32ed3314c10 (t5351: avoid using `test_cmp` for binary data, 2022-07-29). Having said that, I agree that the test tool name should reflect better what the subcommand does. I do dislike the proposed name `mingwcmp`. Not only because it is misleading, as the purpose is not to compare MINGW-specific files but instead the purpose is to compare text files (and, in fact, the tool works just fine on Linux and macOS, too). But also because it would contribute to just how much of a second-class citizen the MINGW-based build is in Git land: From choosing to implement large parts, including the entire test suite as well as the performance benchmarks, in POSIX scripts (which plays to Windows' weaknesses in a big way) to massively favoring spawned processes over multi-threading (which plays to Linux' strengths and to Windows' weaknesses), to a still-inherent assumption that the underlying filesystem is case-sensitive (think: branch names), to an implicit agreement in the core Git community that patch contributions need not take care of working well on Windows (but that that's the job "of Windows folk" instead). This is kind of at odds with the fact that we must assume that half of Git's users are Windows-based (we can only assume, based on surveys, because we successfully avoid any kind of even opt-in telemetry that would give us hard data). I definitely want to stay away from making that second-citizenry even worse. So I am going with the name `test-tool text-cmp` instead. Thank you for your review, Dscho > > Thanks. > > > diff --git a/t/test-lib.sh b/t/test-lib.sh > > index 7726d1da88a..220c259e796 100644 > > --- a/t/test-lib.sh > > +++ b/t/test-lib.sh > > @@ -1546,7 +1546,7 @@ case $uname_s in > > test_set_prereq SED_STRIPS_CR > > test_set_prereq GREP_STRIPS_CR > > test_set_prereq WINDOWS > > - GIT_TEST_CMP=mingw_test_cmp > > + GIT_TEST_CMP="test-tool cmp" > > ;; > > *CYGWIN*) > > test_set_prereq POSIXPERM > > > > base-commit: 23b219f8e3f2adfb0441e135f0a880e6124f766c >