Re: [PATCH v4 7/7] wildmatch test: create & test files on disk in addition to in-memory

Johannes Schindelin <Johannes.Schindelin@xxxxxx> · Fri, 5 Jan 2018 17:41:31 +0100 (STD)

Hi Ævar,

I spent the last 3.5 hours chasing down a bug in your patch series that
seems to be Windows specific, so please forgive me if the following might
leak a bit of my frustration into the open...

On Thu, 4 Jan 2018, Ævar Arnfjörð Bjarmason wrote:

>  t/t3070-wildmatch.sh | 301 +++++++++++++++++++++++++++++++++++++++++++++++++--

Wow, this file is ugly. Sorry, that was an outburst, you did not deserve
that, but I have to say that it is *really* hard to wrap your head around
lines like this:

	wildtest 1 1 1 1 'foo*' 'foo\*'

What are those 1s supposed to mean? Granted, for you, the author of this
line, it is easy.

The point, though, of regression tests is not only to catch regressions
but also to make it easy to fix them. Not only for the test script's
author, but for others, too (and your script is not even the best example
we have for a script that needs hours to study before one can even hope to
begin to see what is going wrong... I am looking at you, t0027, yes, you.
You know why).

So then I go and see that the `wildtest` function has magic to handle both
6 and 10 parameters. And those parameters get assigned to variable names
as intuitive as `match_f_w_pathmatchi`...

The next thing about the test script is this: calling it with -x is pretty
much useless, because *so much* is happening outside of
test_expect_success clauses (and is therefore *not* traced).

Of course I can debug this, but can't this be easier? And this is not even
a regression after a couple of years, this is fresh from the start...

So here is the first bummer about your rather extensive test (which I
think tests the same behavior over and over and over again, just with
slight variations which however do not matter all that much... for
example, it should be enough to verify *without* filenames that the
globbing of wildmatch() works, and then a single test *with* a filename
should suffice to verify that the connection between globbing and files
works): it requires filenames that are illegal on Windows. Stars, question
marks: forbidden.

Worse, since we have to use Unix shell scripts to perform our
"platform-independent" tests, we have to rely on a Unix shell interpreter,
and Git for Windows uses MSYS2's Bash, which means that we inherit most of
Cygwin's behavior.

Now, Cygwin wants to be cute and allow those illegal filenames because (as
is demonstrated here quite nicely) Unix programmers don't give one bit of
a flying fish about portable filesystem requirements.

So Cygwin maps those illegal characters into a private Unicode page. Which
means that Cygwin can read them alright, but no non-Cygwin program
recognizes the stars and question marks and tabs and newlines. Including
Git.

In short: the Unix shell script t3070 manages to write what it thinks is a
file called 'foo*', but Git only sees 'foo<some-undisplayable-character>'.

I tried to address this problem with this patch:

-- snip --

diff --git a/t/t3070-wildmatch.sh b/t/t3070-wildmatch.sh
index f606f91acbb..51dcb675e7b 100755
--- a/t/t3070-wildmatch.sh
+++ b/t/t3070-wildmatch.sh
@@ -4,6 +4,14 @@ test_description='wildmatch tests'
 
 . ./test-lib.sh
 
+if test_have_prereq !MINGW && touch -- 'a*b' 2>/dev/null
+then
+	test_set_prereq FILENAMESWITHSTARS
+else
+	say 'Your filesystem does not allow stars in filenames.'
+fi
+rm -f 'a*b'
+
 create_test_file() {
 	file=$1
 
@@ -28,6 +36,17 @@ create_test_file() {
 	*/)
 		return 1
 		;;
+	# On Windows, stars are not allowed in filenames. Git for Windows'
+	# Bash, however, is based on Cygwin which plays funny names with a
+	# private Unicode page to emulate stars in filenames. Meaning that
+	# the shell script will "succeed" to write the file, but Git will
+	# not see it. Nor any other, regular Windows process.
+	*\**|*\[*)
+		if ! test_have_prereq FILENAMESWITHSTARS
+		then
+			return 1
+		fi
+		;;
 	# On Windows, \ in paths is silently converted to /, which
 	# would result in the "touch" below working, but the test
 	# itself failing. See 6fd1106aa4 ("t3700: Skip a test with
-- snap --

This gets us further. But not by much:

fatal: Invalid path '\[ab]': No such file or directory

You see, any path starting with a backslash *is* an absolute path on
Windows. It is relative to the current drive.

This affects *quite* a few of the test cases you added.

And even if I just comment all of those out, I run into the next problem
where you try to create a file whose name consists of a single space,
which is also illegal on Windows.

These woes demonstrate one problem with the approach of overzealously
testing *everything*. You are kind of performing an integration test of
the wildmatch() functionality, the integration into ls-files, *and* of a
filesystem that behaves like the filesystems you are used to.

All you *should* want to test, though, is the wildmatch() functionality.
And *maybe* one or two tests verifying that this wildmatch() functionality
is actually hooked up into ls-files correctly.

You do not need to test that 'a*' matches the strings 'a', 'ab', 'abc',
'abcd', 'abcde', 'abcdef', etc. *and* filenames identical to those
strings. If we already verified (lightly) that wildmatch() *works*, then
we only have to make sure that ls-files uses wildmatch() correctly.
Everything else is useless expense of innocent electrons and neurons of
other developers.

Another problem with this approach (*extensive* testing of essentially the
same stuff over and over again) is what I mentioned earlier: these
tests are really hard to read, and it is even harder to debug test
failures.

And lastly, this approach of overzealously testing *everything* simply
takes *a lot of time* and also electricity. As a developer of patches who
runs the entire test suite before submitting every iteration of every
patch series, this affects me. Others, too, I guess, and they might just
skip the test suite as a consequence because it takes too long. Don't
sneer, you know that this is happening.

Needless to say, I am not enthused.

I would much rather have a lean and mean test script that tested things
lightly, in more of a unit test fashion: test wildmatch. Just wildmatch.
Not ls-files, no filesystem, no nothing. There is already
t/helper/test-wildmatch, for crying out loud. Just feed it test data
together with expected outcomes, will make things much easier to debug,
faster to execute, and be a lot easier to read and modify/fix.

Whatever you decide, in the current form this cannot go to master, as it
fails royally on Windows.

Ciao,
Dscho