Re: [Question] Unicode weirdness breaking tests on ZFS?

On Wed, Nov 17, 2021 at 06:06:13PM +0100, Torsten Bögershausen wrote:
> On Wed, Nov 17, 2021 at 05:12:26PM +0100, Torsten Bögershausen wrote:
> > On Wed, Nov 17, 2021 at 10:17:53AM -0500, Derrick Stolee wrote:
> > > I recently had to pave my Linux machine, so I updated it to Ubuntu
> > > 21.10 and had the choice to start using the ZFS filesystem. I thought,
> > > "Why not?" but now I maybe see why.
> > >
> > > Running the Git test suite at the v2.34.0 tag on my machine results in
> > > these failures:
> > >
> > > t0050-filesystem.sh                   (Wstat: 0 Tests: 11 Failed: 0)
> > >   TODO passed:   9-10
> > > t0021-conversion.sh                   (Wstat: 256 Tests: 41 Failed: 1)
> > >   Failed test:  31
> > >   Non-zero exit status: 1
> > > t3910-mac-os-precompose.sh            (Wstat: 256 Tests: 25 Failed: 10)
> > >   Failed tests:  1, 4, 6, 8, 11-16
> > >   TODO passed:   23
> > >   Non-zero exit status: 1
> > >
> > > These are all related to the UTF8_NFD_TO_NFC prereq.
> > >
> > > Zooming in on t0050, these tests are marked as "test_expect_failure" due
> > > to an assignment of $test_unicode using the UTF8_NFD_TO_NFC prereq:
> > >
> > >
> > > $test_unicode 'rename (silent unicode normalization)' '
> > > 	git mv "$aumlcdiar" "$auml" &&
> > > 	git commit -m rename
> > > '
> > >
> > > $test_unicode 'merge (silent unicode normalization)' '
> > > 	git reset --hard initial &&
> > > 	git merge topic
> > > '
> > >
> > >
> > > The prereq creates a file named with one Unicode spelling of "ä"
> > > and checks whether the other, canonically equivalent spelling finds it:
> > >
> > >
> > > test_lazy_prereq UTF8_NFD_TO_NFC '
> > > 	# check whether FS converts nfd unicode to nfc
> > > 	auml=$(printf "\303\244")
> > > 	aumlcdiar=$(printf "\141\314\210")
> > > 	>"$auml" &&
> > > 	test -f "$aumlcdiar"
> > > '
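For anyone who wants to check a filesystem without running the whole suite, here is a minimal standalone sketch of the same probe. It assumes a POSIX sh with `mktemp -d`; the names `probe_dir` and `result` are my own, and only `auml`/`aumlcdiar` and their byte values come from the prereq above. Like the prereq, it creates the precomposed name and then looks the file up under the decomposed spelling.

```shell
#!/bin/sh
# Standalone sketch of the UTF8_NFD_TO_NFC probe, run in a scratch
# directory so it does not need the Git test harness.
probe_dir=$(mktemp -d) || exit 1

auml=$(printf '\303\244')          # U+00E4 as one precomposed code point (NFC)
aumlcdiar=$(printf '\141\314\210') # "a" followed by U+0308 COMBINING DIAERESIS (NFD)

# Create the file under its precomposed name...
: >"$probe_dir/$auml"

# ...then ask for it under its decomposed name. Only a filesystem that
# treats the two spellings as the same name will find it.
if test -f "$probe_dir/$aumlcdiar"; then
	result="normalizing"
else
	result="not normalizing"
fi
echo "UTF8_NFD_TO_NFC probe: $result"

rm -rf "$probe_dir"
```

On a filesystem that compares names as plain bytes (ext4, for example) I'd expect "not normalizing"; on HFS+ the lookup should succeed, which is why the prereq was true there.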
> > >
> > >
> > > What I see in that first test is that 'git mv' does change the
> > > index, but the filesystem thinks the files are the same. This
> > > may mean that our 'git add "$aumlcdiar"' from an earlier test
> > > is providing a non-equivalence in the index, and the 'git mv'
> > > changes the index without causing any issues in the filesystem.
> > >
> > > It reminds me of running 'git mv README readme' on a
> > > case-insensitive filesystem. Isn't this a similar situation?
> > >
> > > What I'm trying to figure out is whether this test is flawed.
> > > Or did something break (or never work?) in how we use
> > > 'git add', so that we don't get the canonical unicode from the filesystem?
> > >
> > > The other tests all have similar interactions with 'git add'.
> > > I'm hoping that these are just test bugs, and not actually a
> > > functionality issue in Git. Yes, it is confusing that we can
> > > change the unicode of a file in the index without the filesystem
> > > understanding the difference, but that is very similar to how
> > > case-insensitive filesystems work and I don't know what else we
> > > would do here.
> > >
> > > These filesystem/unicode things are out of my expertise, so
> > > hopefully someone else has a clearer idea of what is going on.
> > > I'm happy to be a test bed, or even attempt producing patches
> > > to fix the issue once we have that clarity.
> > >
> > > Thanks,
> > > -Stolee
> >
> > Interesting.
> > The tests have always worked on HFS+; then we got
> > APFS (which needed a small fix), and now ZFS.
> >
> > I'll have a look; just installing it in a virtual machine.
>
> So, the virtual machine is up-and-running.
>
> I got 2 messages:
>
> ok 9 - rename (silent unicode normalization) # TODO known breakage vanished
> ok 10 - merge (silent unicode normalization) # TODO known breakage vanished
>
> Do you get the same?


Now I am even more puzzled.
Running t0050 with -x gives this:

 Author: A U Thor <author@xxxxxxxxxxx>
  1 file changed, 0 insertions(+), 0 deletions(-)
   rename "a\314\210" => "\303\244" (100%)
   ok 9 - rename (silent unicode normalization) # TODO known breakage vanished
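The rename line prints both names with Git's C-style quoting, where non-ASCII bytes show up as octal escapes (the default while `core.quotePath` is true). A small sketch, assuming a POSIX sh, that decodes both sides with printf and dumps them as hex; `to_hex` is just a helper name I made up:

```shell
#!/bin/sh
# Decode the octal escapes from the rename line above. Git quotes
# non-ASCII path bytes this way while core.quotePath is enabled.
old_name=$(printf 'a\314\210')  # left-hand side of the rename
new_name=$(printf '\303\244')   # right-hand side of the rename

# Dump a string as a run of hex bytes so spellings can be compared.
to_hex() {
	printf '%s' "$1" | od -An -tx1 | tr -d ' \n'
}

old_hex=$(to_hex "$old_name")
new_hex=$(to_hex "$new_name")
echo "old: $old_hex  (a + U+0308, decomposed)"
echo "new: $new_hex  (U+00E4, precomposed)"
```

The left side decodes to 61cc88 and the right side to c3a4, so the test really did rename the decomposed spelling to the precomposed one.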


----------------
When I create a test repository with one file whose name uses the
decomposed ä, and rename it to the precomposed ä, Git gives me:

tb@Ubuntu2021:~/ttt$ git mv "$aumlcdiar" "$auml"
fatal: destination exists, source=ä, destination=ä

and in hex form:

tb@Ubuntu2021:~/ttt$ git mv "$aumlcdiar" "$auml" 2>&1 | xxd
00000000: 6661 7461 6c3a 2064 6573 7469 6e61 7469  fatal: destinati
00000010: 6f6e 2065 7869 7374 732c 2073 6f75 7263  on exists, sourc
00000020: 653d 61cc 882c 2064 6573 7469 6e61 7469  e=a.., destinati
00000030: 6f6e 3dc3 a40a                           on=...
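The dump shows that source and destination differ as bytes (61 cc 88 vs c3 a4), yet Git still reports that the destination exists. To look at only the filesystem half of that, here is a sketch that creates the decomposed name in a scratch directory and asks whether the precomposed name already resolves. It assumes, without my having checked the code, that 'git mv' refuses when the destination path already resolves on disk; `scratch`, `dest_state` and `entries` are illustrative names.

```shell
#!/bin/sh
# Filesystem-only reproduction: create the decomposed name (the git mv
# source), then check whether the precomposed name (the git mv
# destination) already appears to exist. No Git is involved here.
scratch=$(mktemp -d) || exit 1

auml=$(printf '\303\244')          # destination: U+00E4 (NFC)
aumlcdiar=$(printf '\141\314\210') # source: a + U+0308 (NFD)

: >"$scratch/$aumlcdiar"

if test -e "$scratch/$auml"; then
	dest_state="destination appears to exist"
else
	dest_state="destination is free"
fi
echo "$dest_state"

# Only one file was created, so the directory should hold one entry
# no matter how many spellings resolve to it.
entries=$(ls -A "$scratch" | wc -l | tr -d ' ')
echo "entries: $entries"

rm -rf "$scratch"
```

If this prints "destination appears to exist" together with "entries: 1", the filesystem is folding both spellings onto a single directory entry, which would line up with the fatal message above.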




