Re: [PATCH 02/23] t7527: test FS event reporing on macOS WRT case and Unicode

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On 2/24/22 12:33 PM, Torsten =?unknown-8bit?Q?B=C3=B6gershausen?= wrote:
On Thu, Feb 24, 2022 at 09:52:28AM -0500, Derrick Stolee wrote:
On 2/15/2022 10:59 AM, Jeff Hostetler via GitGitGadget wrote:
From: Jeff Hostetler <jeffhost@xxxxxxxxxxxxx>

Confirm that macOS FS events are reported with a normalized spelling.

APFS (and/or HFS+) is case-insensitive.  This means that case-independent

This is not true, strictly speaking.
You can format a disk with "case sensitive" or "case-insenstive, case preserving".
Both APFS and HFS+  can be formated that way.
The default, which is what you get when you get a new machine,
is "case-insenstive, case preserving".
And I assume, that more 99% of all disks are formated that way.
The "core.ignorecase" is used in the same way as it is used under NTFS,
FAT or all other case-insenstive file systems.
(and even ext4 can be formated case-insensitive these days.)

An interesting article can be found here:
https://lwn.net/Articles/784041/

And to be technically correct, I think that even NTFS can be
"configured to be case insensitive in an empty directory".

In that sense, I would like to avoid this statement, which
file system is case insensitive, and which is not.
Git assumes that after probing the FS in "git init" we have
a valid configuration in core.ignorecase.

You're right. I was incorrectly glossing over the differences
between APFS and HFS+ -- and conflating case and nfc/nfd
issues.

[...]

NEEDSWORK: I was only able to test case.  It would be nice to add tests

"I was only able test the APFS case."?

I'm going to completely redo this commit in the next version.
I now have both APFS and HFS+ partitions on my machine and
can compare the differences in behaviors and will have a new
set of tests to cover this.


that use different Unicode spellings/normalizations and understand the
differences between APFS and HFS+ in this area.  We should confirm that
the spelling of the workdir paths that the daemon sends to clients are
always properly normalized.

Are there any macOS experts out there who can help us find the answers
to these questions?

There is a difference between HFS+ and APFS.
HFS+  is "unicode decomposing" when you call readdir() - file names
are stored decomposed on disk once created.
However, opening  file in precompsed form succeds.
In that sense I would strongly suspect, that any monitors are "sending"
the decomposed form (on HFS+).

APFS does not manipulate file names in that way, it is
"unicode normalization preserving and ignoring".

It took a few hours of poking to figure out what Apple is doing,
but yes on HFS+ they convert to NFD and use that as the on-disk
format.  And they do collision detection as they always have in
NFD-space.

Whereas on APFS, they preserve the NFC/NFD as given when the file
is created, but always do the same collision detection in NFD-space.
The net result is similar, but subtlety different.

FS Events from MacOS are sent using the on-disk format (NFD on HFS+
and whichever on APFS) and my FSMonitor daemon is sending them to
the client as received.

I'm not sure whether or not the daemon should respect the
`core.precompseUnicode` setting and when watching an HFS+
volume do the NFD-->NFC conversion for the client.  I'm not
sure whether that would be any more or less correct than just
reporting the paths as received.  I'm going to leave this as a
question for the future.


Thanks for all of your background information on this topic.
Jeff





[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux