[Question] Unicode weirdness breaking tests on ZFS?

I recently had to pave my Linux machine, so I updated it to Ubuntu
21.10 and had the choice to start using the ZFS filesystem. I thought,
"Why not?" but now I may see why not.

Running the Git test suite at the v2.34.0 tag on my machine results in
these failures:

t0050-filesystem.sh                   (Wstat: 0 Tests: 11 Failed: 0)
  TODO passed:   9-10
t0021-conversion.sh                   (Wstat: 256 Tests: 41 Failed: 1)
  Failed test:  31
  Non-zero exit status: 1
t3910-mac-os-precompose.sh            (Wstat: 256 Tests: 25 Failed: 10)
  Failed tests:  1, 4, 6, 8, 11-16
  TODO passed:   23
  Non-zero exit status: 1

These are all related to the UTF8_NFD_TO_NFC prereq.

Zooming in on t0050, these tests are marked as "test_expect_failure" due
to an assignment of $test_unicode using the UTF8_NFD_TO_NFC prereq:


$test_unicode 'rename (silent unicode normalization)' '
	git mv "$aumlcdiar" "$auml" &&
	git commit -m rename
'

$test_unicode 'merge (silent unicode normalization)' '
	git reset --hard initial &&
	git merge topic
'


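The prereq creates a file under the precomposed (NFC) name and then
checks whether that file is also visible under the decomposed (NFD)
name. The two byte sequences are canonically equivalent in Unicode: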


test_lazy_prereq UTF8_NFD_TO_NFC '
	# check whether FS converts nfd unicode to nfc
	auml=$(printf "\303\244")
	aumlcdiar=$(printf "\141\314\210")
	>"$auml" &&
	test -f "$aumlcdiar"
'
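
For anyone who wants to try the same probe outside the test suite,
here is a minimal sketch. It assumes a POSIX shell with mktemp
available; the variable names nfc_len, nfd_len, dir and fs_normalizes
are mine and do not appear in the Git test library. It prints the
byte length of each name and reports whether the filesystem treats
them as one file, which depends on where the scratch directory lives.

```shell
#!/bin/sh
# Same byte sequences as the UTF8_NFD_TO_NFC prereq.
auml=$(printf '\303\244')            # U+00E4, precomposed (NFC)
aumlcdiar=$(printf '\141\314\210')   # U+0061 U+0308, decomposed (NFD)

# The two names differ at the byte level even though they render alike.
nfc_len=$(printf '%s' "$auml" | wc -c | tr -d ' ')
nfd_len=$(printf '%s' "$aumlcdiar" | wc -c | tr -d ' ')
echo "NFC bytes: $nfc_len, NFD bytes: $nfd_len"

# Create the file under the NFC name, then look it up under the NFD name.
# The scratch directory is an assumption; point it at the filesystem
# you actually want to probe.
dir=$(mktemp -d)
>"$dir/$auml"
if test -f "$dir/$aumlcdiar"
then
	fs_normalizes=yes
else
	fs_normalizes=no
fi
echo "NFC and NFD names reach the same file: $fs_normalizes"
rm -rf "$dir"
```

Note that mktemp -d usually picks a directory under /tmp, which may
be on a different filesystem than the Git worktree.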


What I see in that first test is that 'git mv' does change the
index, but the filesystem thinks the two names refer to the same
file. This may mean that our 'git add "$aumlcdiar"' from an earlier
test records the decomposed name in the index as-is, and the
'git mv' then changes the index entry without causing any issue
in the filesystem.
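
One way to check that guess is to compare the bytes the index
records against the bytes the directory listing reports. The
following is only a sketch in a throwaway repository, not part of
the test suite; repo, index_hex and fs_hex are hypothetical names,
and the output depends on the filesystem holding the scratch
directory.

```shell
#!/bin/sh
# Throwaway repository; nothing here touches the Git test suite.
aumlcdiar=$(printf '\141\314\210')   # decomposed (NFD) name, as in t0050

repo=$(mktemp -d)
git init -q "$repo"
>"$repo/$aumlcdiar"
git -C "$repo" add "$aumlcdiar"

# Name as recorded in the index, as hex bytes.
# -z avoids path quoting; the trailing 00 is the -z terminator.
index_hex=$(git -C "$repo" ls-files -z | od -An -tx1 | tr -d ' \n')

# Name as reported by the directory listing, as hex bytes.
# The trailing 0a is the newline printed by ls.
fs_hex=$(ls "$repo" | od -An -tx1 | tr -d ' \n')

echo "index:      $index_hex"
echo "filesystem: $fs_hex"
rm -rf "$repo"
```

If the two hex strings carry different name bytes, the index and
the filesystem disagree about the spelling of the same file.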

It reminds me of running 'git mv README readme' on a
case-insensitive filesystem. Is this not a similar situation?

What I'm trying to work out is whether this test is flawed.
Or maybe something broke (or never worked?) in how 'git add'
picks up the file name, so that it does not get the canonical
unicode form from the filesystem?

The other tests all have similar interactions with 'git add'.
I'm hoping that these are just test bugs, and not actually a
functionality issue in Git. Yes, it is confusing that we can
change the unicode form of a file name in the index without the
filesystem seeing any difference, but that is very similar to how
case-insensitive filesystems work and I don't know what else we
would do here.

These filesystem/unicode things are out of my expertise, so
hopefully someone else has a clearer idea of what is going on.
I'm happy to be a test bed, or even attempt producing patches
to fix the issue once we have that clarity.

Thanks,
-Stolee


