On Mon, Oct 4, 2010 at 16:49, Joshua Jensen <jjensen@xxxxxxxxxxxxxxxxx> wrote:
>> Is anyone thinking "unicode" around here?
>
> On Windows, Unicode filenames are 16-bit wide characters. The current code
> doesn't handle them at all.
>
> I do not know about other file systems and what Git actually handles. I was
> under the impression it didn't handle Unicode filenames well in general... ?

The only sane way of doing this sort of thing is to have a defined
*internal* encoding that gets converted to and from whatever the native
encoding is at the input/output points.

So Git could use Unicode represented as UTF-8 or UTF-16 (whatever's
convenient) internally, but when you check out files, the checked-out
filenames can be in whatever encoding you choose. So you could have a
UTF-8 repository but check out UTF-16 filenames on Windows.

I.e. internally we'd have the file:

    æab

Represented by UTF-8:

    c3 a6 61 62 \0

But it would be checked out as UTF-16 (little-endian, with a byte order mark):

    ff fe e6 00 61 00 62 00

Then when you add a new file, Git would know the name is in UTF-16 and
convert it to UTF-8 before writing to the repository. All invisible to
the user.

Perl handles encoding issues like this, and it's awesome. The only
thing you have to do is make sure that the system knows the encoding
of data going into it, and what encoding you want out of it.

But any implementation of this is far off, and just storing raw byte
streams is Good Enough now that almost everyone uses UTF-8 anyway, so
nobody's seriously worked on this.
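
To make the round trip concrete, here is a minimal Python sketch of the idea, not anything Git actually does. The names INTERNAL, checkout_name and add_name are hypothetical, and it assumes the internal encoding is UTF-8 and the native one is little-endian UTF-16 with a byte order mark, matching the bytes shown in the example:

```python
import codecs

# Assumption: UTF-8 is the defined internal encoding.
INTERNAL = "utf-8"


def checkout_name(stored: bytes) -> bytes:
    """Output point: convert an internal (UTF-8) name to native UTF-16LE with a BOM."""
    text = stored.decode(INTERNAL)
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")


def add_name(native: bytes) -> bytes:
    """Input point: convert a native UTF-16 name back to the internal encoding."""
    # The "utf-16" codec reads the BOM to pick the byte order, then drops it.
    text = native.decode("utf-16")
    return text.encode(INTERNAL)


# "\u00e6ab" is the example filename (ae ligature, then "ab").
stored = "\u00e6ab".encode(INTERNAL)
native = checkout_name(stored)

print(stored.hex(" "))  # c3 a6 61 62
print(native.hex(" "))  # ff fe e6 00 61 00 62 00

# Adding the checked-out name again yields the original stored bytes.
assert add_name(native) == stored
```

The conversion happens only at the two boundary functions; everything in between works on one known encoding, which is the whole point of having a defined internal representation.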