Re: [PATCH] Use FIX_UTF8_MAC to enable conversion from UTF8-MAC to UTF8

On Mon, Jan 21, 2008 at 11:16:54PM -0800, Linus Torvalds wrote:
> 
> Yes, it will work on OS X, but for all the wrong reasons. It works there 
> just because of the stupid normalization that OS X does both on filename 
> input and output, so if we hook into readdir() and munge the name there, 
> we'll still be able to use the munged name for lstat() and open().

Yes, when I proposed the readdir() wrapper, I meant it as an OS X
specific hack. Because HFS+ already munges names by converting them
into its own HFS+ specific form, we can safely convert them into NFC:
we do not lose any more information than is lost already. More
importantly, AFAIK, everything that a user types on a Mac is in NFC,
whether it is a name on the command line or a name in .gitattributes.
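
To make the idea concrete, here is a minimal sketch in Python rather
than in Git's C. The helper names to_nfc() and listdir_nfc() are
hypothetical, and the sketch treats the HFS+ form as plain NFD, which
is only an approximation of what HFS+ really stores.

```python
import os
import unicodedata


def to_nfc(name):
    """Return the NFC (precomposed) form of a filename."""
    return unicodedata.normalize("NFC", name)


def listdir_nfc(path):
    """Hypothetical readdir()-style wrapper: list a directory with names in NFC."""
    return [to_nfc(name) for name in os.listdir(path)]


# The name roughly as a decomposing filesystem hands it back:
# "e" followed by U+0301 COMBINING ACUTE ACCENT.
decomposed = "cafe\u0301.txt"
# The name as a user would typically type it: the single code point U+00E9.
typed = "caf\u00e9.txt"

print(decomposed == typed)          # the raw strings differ
print(to_nfc(decomposed) == typed)  # they agree once converted to NFC
```

On a filesystem that normalizes on lookup, the NFC name can still be
passed to lstat() and open(), which is the only reason such a wrapper
is safe there.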

> However, we'll never be able to test it on a sane Unix system, and it 
> won't ever be able to handle the case of a filesystem actually being 
> Latin1 but git being asked to try to transparently convert it to utf-8 in 
> order to work with others.

Yes, but that is a separate issue, which unfortunately is much more
difficult to deal with. Basically, there are two approaches -- either
to wrap all input/output functions, or to find another point where it
is possible to convert names without rewriting too much code in Git.
It seems to me that the first approach may require wrapping too many
functions, but looking at the code I am not sure that the second will
be much easier. There are many places where a filename in the local
encoding interacts with the internal encoding that Git uses in the repo.

If we were talking about Windows only, I would say that the first
approach makes much more sense, because all i/o functions used on
Windows are already wrappers over Unicode functions. So, converting
UTF-8 <-> UTF-16 directly is better than going through
UTF-8 <-> some-local-encoding(*) <-> UTF-16.

(*) In fact, two different encodings for the same locale setting -- 
one for console and the other for non-console programs!
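
A small Python illustration of both points, as a sketch only. The code
pages cp1252 and cp850 are my assumption of a typical Western European
pair (non-console and console); the actual pair depends on the locale.

```python
# A filename with characters that a Western European code page cannot hold.
name = "Дмитрий-\u00e9.txt"
utf8 = name.encode("utf-8")

# Direct path: UTF-8 -> UTF-16 -> UTF-8 round-trips without loss.
utf16 = utf8.decode("utf-8").encode("utf-16-le")
direct = utf16.decode("utf-16-le").encode("utf-8")

# Indirect path: UTF-8 -> local encoding -> UTF-8.
# cp1252 is an assumed local encoding; unrepresentable characters become "?".
local = utf8.decode("utf-8").encode("cp1252", errors="replace")
via_local = local.decode("cp1252").encode("utf-8")

print(direct == utf8)     # lossless
print(via_local == utf8)  # lossy: the Cyrillic part was replaced

# The footnote: the same character has different bytes in the assumed
# non-console (cp1252) and console (cp850) code pages.
ansi_bytes = "\u00e9".encode("cp1252")
oem_bytes = "\u00e9".encode("cp850")
print(ansi_bytes, oem_bytes)
```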

> It would be conceptually nicer to do it in "add_file_to_index()" instead. 
> Ie anything that creates a "struct cache_entry" would do the 
> conversion. 

I don't think it is going to work without changing a lot of code,
because filenames entered by the user and those that are returned by
readdir() are different. Also, .gitignore or .gitattributes files will
have filenames in a form that differs from the one returned by readdir().
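
The mismatch with .gitignore can be sketched like this. Python's
fnmatch is only a stand-in for Git's own pattern matching, and the
pattern and filename are made up for the example.

```python
import fnmatch
import unicodedata


def nfc(text):
    """Normalize a string to NFC."""
    return unicodedata.normalize("NFC", text)


# Hypothetical .gitignore line, typed by a user in NFC.
pattern = "caf\u00e9*.txt"
# Hypothetical name in decomposed form, as readdir() would return it.
from_readdir = "cafe\u0301-menu.txt"

# Comparing the raw forms fails even though a user sees the same name.
raw_match = fnmatch.fnmatchcase(from_readdir, pattern)
# Once both sides are in the same form, the pattern applies.
normalized_match = fnmatch.fnmatchcase(nfc(from_readdir), nfc(pattern))

print(raw_match, normalized_match)
```

So a conversion done only when the cache entry is created comes too
late for every comparison that happens before that point.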


Dmitry