On Mon, Jan 21, 2008 at 11:16:54PM -0800, Linus Torvalds wrote:
>
> Yes, it will work on OS X, but for all the wrong reasons. It works there
> just because of the stupid normalization that OS X does both on filename
> input and output, so if we hook into readdir() and munge the name there,
> we'll still be able to use the munged name for lstat() and open().

Yes, when I proposed the readdir() wrapper, I meant it as an OS X specific hack. Because HFS+ already munges names by converting them into its own HFS+ specific form, we can safely convert them to NFC: we lose no more information than has been lost already. More importantly, AFAIK, everything that a user types on a Mac is in NFC, whether it is a name on the command line or a name in .gitattributes.

> However, we'll never be able to test it on a sane Unix system, and it
> won't ever be able to handle the case of a filesystem actually being
> Latin1 but git being asked to try to transparently convert it to utf-8 in
> order to work with others.

Yes, but that is a separate issue, which unfortunately is much more difficult to deal with. Basically, there are two approaches: either wrap all input/output functions, or find another point where names can be converted without rewriting too much code in Git. It seems to me that the first approach may require wrapping too many functions, but looking at the code, I am not sure that the second will be much easier. There are many places where a filename in the local encoding interacts with the internal encoding that Git uses in the repository.

If we were talking about Windows only, I would say that the first approach makes much more sense, because all I/O functions used on Windows are already wrappers over Unicode functions. So converting UTF-8 <-> UTF-16 makes much more sense than UTF-8 <-> some-local-encoding(*) <-> UTF-16.

(*) In fact, two different encodings for the same locale setting -- one for console and the other for non-console programs!

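The round-trip argument can be illustrated with a small sketch. It is written in Python rather than C purely for brevity. The sample filename and the choice of the Russian code pages cp1251 (non-console) and cp866 (console) are assumptions made for illustration; nothing here is taken from Git's code.

```python
# Illustration only: the filename and the code pages are assumed examples.
name = "файл-ü.txt"  # Cyrillic letters plus a Latin letter with a diaeresis


def via_utf16(s: str) -> str:
    """UTF-8 -> UTF-16 -> back to text: lossless for any valid name."""
    utf8 = s.encode("utf-8")
    utf16 = utf8.decode("utf-8").encode("utf-16-le")
    return utf16.decode("utf-16-le")


def via_local(s: str, codepage: str) -> str:
    """UTF-8 -> local code page -> text: lossy outside the code page."""
    # Characters the code page cannot represent are replaced with '?'.
    local = s.encode(codepage, errors="replace")
    return local.decode(codepage)


direct = via_utf16(name)         # survives unchanged
lossy = via_local(name, "cp1251")  # 'ü' is not representable in cp1251

# The same bytes read as different text under the console code page.
raw = "файл".encode("cp1251")
as_console = raw.decode("cp866")
```

The direct path never needs to know the locale, while the path through a local encoding both drops characters and depends on which of the two code pages the program happens to use.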
> It would be conceptually nicer to do it in "add_file_to_index()" instead.
> Ie anything that creates a "struct cache_entry" would do the
> conversion.

I don't think it is going to work without changing a lot of code, because filenames entered by the user and those returned by readdir() are different. Also, .gitignore and .gitattributes files will contain filenames in a form that differs from the one returned by readdir().

Dmitry
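
P.S. A minimal sketch of the readdir() wrapper idea and of the mismatch between typed names and names returned by readdir(), in Python for brevity. The function name `listdir_nfc` is hypothetical, and `unicodedata.normalize` merely stands in for whatever normalization routine a C implementation would use.

```python
import os
import tempfile
import unicodedata


def listdir_nfc(path):
    """Hypothetical readdir() wrapper: return directory entries in NFC."""
    return [unicodedata.normalize("NFC", n) for n in os.listdir(path)]


# What a user would type: precomposed U+00E9.
typed = "caf\u00e9.txt"
# Decomposed spelling, 'e' + U+0301 (close to what HFS+ returns from readdir()).
decomposed = unicodedata.normalize("NFD", typed)

# Without normalization the two spellings do not compare equal.
mismatch = typed != decomposed

with tempfile.TemporaryDirectory() as d:
    # Create a file under the decomposed name, then list it through the wrapper.
    open(os.path.join(d, decomposed), "w").close()
    names = listdir_nfc(d)
```

On a filesystem that stores names exactly as given, the wrapper only hides the difference on output: lstat() and open() with the NFC name would not find a file stored in decomposed form, which is why the hack is safe only where the filesystem normalizes names on input as well.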