Re: [PATCH v2 00/27] Builtin FSMonitor Part 3


On 3/13/22 6:42 AM, Torsten Bögershausen wrote:
> Hej Jeff,
>
> [...]
>
> One other thing, I just add it here:
> There is a new file, t/lib-unicode-nfc-nfd.sh, which helps us with this code:
> test_lazy_prereq UNICODE_NFC_PRESERVED
>
> The existing code uses a construct called
> UTF8_NFD_TO_NFC
>
> And now I have 2 questions:
> - Do we need the UNICODE_NFC_PRESERVED at all ?
> - And should the UTF8_NFD_TO_NFC better be called UTF8_NFC_TO_NFD,
>    because that is what it checks.
> - Do we need the UNICODE_NFD_PRESERVED at all ?
>
> As there are no non-UNICODE_NFD_PRESERVED filesystems, as far as I know.
> And the current code does no tests, just debug prints.
> I dunno.

I created t/lib-unicode-nfc-nfd.sh to help me understand
the issues.  I found the existing UTF8_NFD_TO_NFC prereq
confusing (and yes it seemed poorly named).

The existing prereq returned the same answer on APFS, HFS+,
and FAT32 (a thumbdrive).  I know they behave differently
and I found it odd that the prereq did not make any distinction.

I was hesitant to rename the existing prereq because it is
currently used by 5+ different tests and I didn't want to
expand the scope of my two already very large series.

Also, the existing prereq feels a little sloppy.  It creates
a file in NFC and does a lstat in the NFD spelling.  There
are several ways that the OS and/or FS can lie to us.  For
example, the prereq is satisfied on a FAT32 thumbdrive and
we know FAT32 doesn't do NFC-->NFD conversions.  So I'd like
to move away from that prereq definition at some point.
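
To make that concrete, here is a rough sketch of the kind of check
I mean.  It is not the actual prereq code; the function name and the
choice of "é" as the probe character are just for illustration.

```shell
# Hypothetical helper, not the real Git prereq.
# Create a file using the NFC spelling of "é", then ask whether the
# NFD spelling resolves to something.  Success only says that *some*
# layer (OS or FS) aliased the two spellings; it says nothing about
# what was actually written to disk.
nfc_aliases_to_nfd () {
	dir=$1 &&
	# U+00E9, precomposed (NFC), as UTF-8 bytes
	nfc=$(printf '\303\251') &&
	# U+0065 U+0301, decomposed (NFD), as UTF-8 bytes
	nfd=$(printf 'e\314\201') &&
	mkdir -p "$dir" &&
	: >"$dir/$nfc" &&
	test -e "$dir/$nfd"
}
```

A check like this cannot tell a filesystem that rewrites the name
apart from an OS that merely answers lookups for both spellings.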


My new prereqs try to:

(1) independently confirm whether there is aliasing happening
    at all (whether at the FS or OS layer).

(2) determine if the actual on-disk spelling is altered by the
    FS (in both NFC and NFD cases).
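
Roughly, the two checks could be sketched like this.  These are
made-up helpers for illustration (the names are mine for this mail,
not what is in t/lib-unicode-nfc-nfd.sh), and each one assumes it is
given a fresh, empty directory.

```shell
# Hypothetical helpers; each expects $1 to be a fresh, empty directory.

# Print the name of the single entry in directory $1, byte for byte,
# as the directory listing reports it.
only_entry () {
	set -- "$1"/* &&
	printf '%s' "${1##*/}"
}

# (1) Succeed if a file created under spelling $2 is also reachable
#     under spelling $3, i.e. some layer aliases the two.
spelling_is_aliased () {
	mkdir -p "$1" && : >"$1/$2" && test -e "$1/$3"
}

# (2) Succeed if a file created under spelling $2 is listed with
#     exactly the same bytes, i.e. the spelling was preserved.
spelling_is_preserved () {
	mkdir -p "$1" && : >"$1/$2" && test "$(only_entry "$1")" = "$2"
}

# Example probe strings: "é" in NFC and in NFD, as UTF-8 bytes.
nfc=$(printf '\303\251')
nfd=$(printf 'e\314\201')
```

One would then call, for example, spelling_is_aliased "$d1" "$nfc" "$nfd"
and spelling_is_preserved "$d2" "$nfc" on separate scratch directories;
the answers depend on the OS and filesystem under test.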


We know that HFS+ does not preserve NFC spellings, but APFS
does.  (FAT32 also preserves NFC spelling under MacOS.)
So the UNICODE_NFC_PRESERVED lets me distinguish between HFS+
and APFS/FAT32.

I have not heard of any filesystems that convert NFD to NFC,
so technically we don't need the UNICODE_NFD_PRESERVED prereq,
but then again until I tested that, it was unclear how MacOS
did the aliasing on APFS (and FAT32).  On the basis of that
testing, we can say that MacOS -- at the MacOS layer -- is
responsible for the aliasing and that both NFC and NFD spellings
are preserved on APFS and FAT32.
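
Put another way, the two answers combine roughly as in the sketch
below.  The function and its yes/no arguments are hypothetical, and
the labels only summarize what I observed above plus the assumption
that a system with no aliasing treats the two spellings as distinct
names.

```shell
# Hypothetical summary helper: $1 is "yes" if NFC and NFD spellings
# alias each other, $2 is "yes" if an NFC spelling is kept on disk.
describe_unicode_behavior () {
	aliased=$1 nfc_preserved=$2
	case "$aliased,$nfc_preserved" in
	yes,yes) echo "aliased above the FS, NFC kept on disk (APFS- or FAT32-like)" ;;
	yes,no)  echo "NFC rewritten on disk (HFS+-like)" ;;
	no,yes)  echo "no aliasing, spellings are distinct names" ;;
	*)       echo "unexpected combination" ;;
	esac
}
```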

So I'd rather keep the 3 prereqs that I have now.

The ones marked _DOUBLE_ are currently extra.  I have them to
help study how code points with multiple combining characters
are handled.  I have prereqs for the basic double chars, but
there are several opportunities for weird edge cases (non-
canonical ordering and other collisions) that I don't want to
get stuck on right now.  So we might make more use of them in
the future.


That's too long of an answer, but hopefully that explains
some of my paranoia. :-)

Jeff


On Tue, Mar 08, 2022 at 10:15:00PM +0000, Jeff Hostetler via GitGitGadget wrote:
> Here is V2 of part 3 of my builtin FSMonitor series.
> [...]
