On Sat, 19 Jan 2008, Dmitry Potapov wrote:
>
> Actually, there is, if you care to do something. You can write a wrapper
> around readdir(3) that will recode filenames into Unicode Normal Form C.

If somebody wants to do this, then readdir() isn't the only place, but yes,
readdir() is one of the places.

I suspect that if we were to just do the "turn into NFC on readdir() on
OS X", that might actually be good enough to hide most of the problems.

The issue isn't just that OS X mangles the filenames, it's that it picks a
particularly *stupid* way to mangle them (the decomposed forms), which
means that OS X will actually not just corrupt "odd cases" of Unicode, but
will corrupt the obvious and *common* Latin1 translations of Unicode.

I don't know if NFD is better for other locales, but I doubt it. Usually
people want to do the *composite* forms, not the *de*composed forms.

A trivial example of this for some cross-OS issue:

 - let's say that you have a file "Märchen" on just about *any* other OS
   than OS X. It could be Latin1 or it could be Unicode, but even if it is
   Unicode, I can almost guarantee that the 'ä' is going to be the
   *single* Unicode character U+00e4 (utf-8: "\xc3\xa4", latin1: "\xe4")

   So from a cross-OS standpoint, that's the *common* representation, and
   yes, you can create the file that way (I don't know what happens if you
   actually create it with the Latin1 encoding, but I would not be
   surprised if OS X notices that it's not a valid UTF-8 sequence and
   assumes it's Latin1 and converts it to Unicode)

 - But on OS X, because of Apple's *insane* choice of normal form, it will
   then be turned into "a¨". I doubt *anybody* else does that. If you have
   to normalize it, NFD is just about the *worst* choice.

So yeah, even just re-coding it as NFC on readdir() would at least mean
that any OS X git client would be MORE LIKELY to pick the same
representation as git clients on other OS's.

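To make the "Märchen" example concrete, here is a minimal sketch in Python
(not git code, which is C). It shows the two byte sequences and a rough
stand-in for the proposed readdir() wrapper. The names `to_nfc` and
`listdir_nfc` are hypothetical, made up for this illustration, and the sketch
assumes directory entries arrive as decodable text.

```python
import os
import unicodedata


def to_nfc(name):
    """Recompose a filename into Unicode Normalization Form C."""
    return unicodedata.normalize("NFC", name)


def listdir_nfc(path="."):
    """Hypothetical readdir()-style wrapper: list entries, recoded as NFC."""
    return [to_nfc(entry) for entry in os.listdir(path)]


# Composed form: the single character U+00E4, common on most systems.
composed = "M\u00e4rchen"
# Decomposed form: 'a' followed by U+0308 (combining diaeresis), i.e. NFD.
decomposed = "Ma\u0308rchen"

print(composed.encode("utf-8"))        # b'M\xc3\xa4rchen'
print(decomposed.encode("utf-8"))      # b'Ma\xcc\x88rchen'
print(composed == decomposed)          # False: different code points
print(to_nfc(decomposed) == composed)  # True: NFC recovers the common form
```

The two strings look the same on screen but compare unequal until the
decomposed one is run through NFC, which is the kind of recoding a readdir()
wrapper on OS X would do.
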
It wouldn't solve all problems (and it would almost certainly create a few
new ones), but it would likely at least increase compatibility between
systems.

So doing the NFC conversion on readdir() on OS X is probably a good idea,
and probably is the simplest way to make it interact better with other
OS's. And it's definitely safe on OS X, since OS X _already_ corrupted the
name, so we're not losing any information (in contrast, on other systems,
doing an NFC conversion would possibly lose encoding detail _and_ might be
incorrect simply because they might not use Unicode in the first place).

Anybody want to create a compat layer around "readdir()" that does that NFC
conversion on OS X but not elsewhere?

		Linus