Re: git on MacOSX and files with decomposed utf-8 file names

Hi,

On Sun, 20 Jan 2008, Dmitry Potapov wrote:

> On Sat, Jan 19, 2008 at 10:58:08PM +0000, Johannes Schindelin wrote:
> > 
> > I think a better approach would be to try to match the name to what we 
> > have in the index.  Then we could implement case-insensitivity and 
> > MacOSX workaround at the same time.
> 
> I thought about that, but the problem is that HFS+ _already_ mangled 
> names from what the user entered (and what is used by anyone else) to 
> some sub-standard form, which no one outside of Mac likes or uses.

So?  That's why I said "match", not "compare for identity".

To be a little bit more precise: I think a viable plan would be to

- have a config switch which determines what type of filename mangling we
  allow the host OS to perform (Unicode "normalisation", case mongering),
  and leave _everybody_ alone who left that switch unset,

- "overload" readdir() (by the famous git_X(); #define X git_X trick),

- have the overloaded readdir() _know_ which is the current prefix, and
  load the index if it has not yet been loaded (but probably into a static
  variable to avoid reloading, and to avoid interfering with the global
  "cache" instance).
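
As a rough sketch of the three points above (not actual git code, and 
covering only the case-insensitivity half): the names mangling_allowed, 
known_names and git_match_name() are made up for illustration, and the 
static table merely stands in for an index loaded on first use.

```c
#include <ctype.h>
#include <dirent.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical config switch; 0 (unset) means "leave everybody alone". */
int mangling_allowed = 0;

/* Hypothetical stand-in for the index entries under the current prefix. */
static const char *known_names[] = { "Makefile", "README" };

/* ASCII-only case-insensitive comparison; returns 1 when the names match. */
static int names_match(const char *a, const char *b)
{
	for (; *a && *b; a++, b++)
		if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
			return 0;
	return *a == *b;
}

/* Return the index's spelling of on_disk, or on_disk itself if none matches. */
const char *git_match_name(const char *on_disk)
{
	size_t i;

	for (i = 0; i < sizeof(known_names) / sizeof(known_names[0]); i++)
		if (names_match(on_disk, known_names[i]))
			return known_names[i];
	return on_disk;
}

struct dirent *git_readdir(DIR *dir)
{
	/* Still the real readdir(): the macro below is not defined yet. */
	struct dirent *e = readdir(dir);

	if (e && mangling_allowed) {
		const char *m = git_match_name(e->d_name);

		/* Only rewrite in place when the replacement cannot overflow. */
		if (m != e->d_name && strlen(m) <= strlen(e->d_name))
			strcpy(e->d_name, m);
	}
	return e;
}

/* From here on, every caller of readdir() gets the wrapper. */
#define readdir git_readdir
```

With the switch unset, git_readdir() hands back exactly what the OS 
returned, so nobody on a sane filesystem is affected.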

It _could_ be wise to store the "normalised" forms at one stage (instead 
of the index) to speed up comparison -- the prefix has to be normalised 
for readdir()'s purposes, too, then.

This is possible with the HFS+ problem, since we know exactly how HFS+ 
tries to "help", and for case insensitivity too, I think.  But it may be 
restricting ourselves for other filename "equivalences" we might want to 
handle one day.
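
To illustrate what "match, not compare for identity" could look like for 
the HFS+ case, here is a toy sketch.  It is an assumption-laden example, 
not a real Unicode normaliser: toy_table covers just three characters, 
and toy_decompose() and names_equivalent() are hypothetical names.  Both 
names are brought to the decomposed form before comparing.

```c
#include <stddef.h>
#include <string.h>

/* Toy table: precomposed UTF-8 sequence and its decomposed equivalent. */
struct decomp {
	const char *composed;
	const char *decomposed;
};

static const struct decomp toy_table[] = {
	{ "\xC3\xA4", "a\xCC\x88" },	/* U+00E4 -> a + U+0308 */
	{ "\xC3\xA9", "e\xCC\x81" },	/* U+00E9 -> e + U+0301 */
	{ "\xC3\xB6", "o\xCC\x88" },	/* U+00F6 -> o + U+0308 */
};

/* Copy in to out, decomposing known sequences; -1 if out is too small. */
int toy_decompose(const char *in, char *out, size_t outsz)
{
	size_t used = 0;

	while (*in) {
		const char *piece = in;
		size_t in_len = 1, piece_len = 1, i;

		for (i = 0; i < sizeof(toy_table) / sizeof(toy_table[0]); i++) {
			size_t n = strlen(toy_table[i].composed);

			if (!strncmp(in, toy_table[i].composed, n)) {
				piece = toy_table[i].decomposed;
				piece_len = strlen(piece);
				in_len = n;
				break;
			}
		}
		if (used + piece_len >= outsz)
			return -1;
		memcpy(out + used, piece, piece_len);
		used += piece_len;
		in += in_len;
	}
	if (used >= outsz)
		return -1;
	out[used] = '\0';
	return 0;
}

/* Returns 1 when both names are equal after (toy) decomposition. */
int names_equivalent(const char *a, const char *b)
{
	char na[1024], nb[1024];

	/* Fall back to a byte-wise comparison if a name does not fit. */
	if (toy_decompose(a, na, sizeof(na)) || toy_decompose(b, nb, sizeof(nb)))
		return !strcmp(a, b);
	return !strcmp(na, nb);
}
```

Both spellings are valid UTF-8; only the byte sequences differ, which is 
why a plain strcmp() on the raw names reports a mismatch.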

BTW: I cannot think of anything other than readdir() which should have the 
"problem" of reading back a name that the user did not specify.  What am I 
missing?

> Thus, bringing filenames back to the NFC form (which is what almost 
> anyone uses) is the only sane thing to do, because no one outside of Mac 
> really needs to know about this HFS+ specific craziness.

No.  I think that would be a serious mistake.  If you add a file on MacOSX 
(with a _mangled_ filename, think of "git add ."), git should not try to 
be as clever as HFS+ and "remangle" it.

> So I really dislike the idea that due to some HFS+ specific conversion, 
> we may end up having some strangely encoded names in a Git repository.

It _is_ UTF-8, so what's the problem?

As for the HFS+ specific conversion: like the CRLF issue, I am opposed to 
having a "solution" affect people other than those on a broken system.  So 
I very much _want_ it to be an HFS+ specific conversion.

> Besides, writing a wrapper around readdir() is not difficult. We already 
> have git-compat-util.h, which redefines some functions for some 
> platforms, so I don't see any problem with writing a wrapper around 
> readdir().

Exactly.

Ciao,
Dscho
