Re: non-ascii filenames issue

John Tapsell:

Unfortunately not, because for some absolutely crazy reason, there is no way at all to tell what encoding the string is in. It never occurred to anyone that it might actually be useful to be able to read the filename in an unambiguous way.

It comes from the Unix tradition, unfortunately, that file names are just a stream of bytes, rather than a sequence of characters mapped to bytes by a known encoding. The "stream of bytes" approach worked back when everyone used ASCII, but as soon as other character encodings came into use (i.e., back in the 1970s or so), that assumption broke.
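To make the ambiguity concrete, here is a minimal Python sketch. The file name and the helper `try_utf8` are made up for illustration and are not taken from Git or from any patch; the point is only that the same name becomes different byte sequences under different encodings, and that the bytes alone do not say which encoding was used.

```python
# Hypothetical file name "räksmörgås.txt", written with escapes so the
# source text of this example does not depend on its own encoding.
name = "r\u00e4ksm\u00f6rg\u00e5s.txt"

# The same characters become different byte sequences per encoding.
latin1_bytes = name.encode("latin-1")  # one byte per character
utf8_bytes = name.encode("utf-8")      # two bytes for each of ä, ö, å

# Reading UTF-8 bytes as Latin-1 raises no error; it silently yields
# the wrong characters.
misread = utf8_bytes.decode("latin-1")


def try_utf8(raw: bytes):
    """Return the decoded name, or None if raw is not valid UTF-8."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None


print(latin1_bytes)
print(utf8_bytes)
print(misread)
print(try_utf8(latin1_bytes))
```

A tool that only stores the bytes has no way to know which of the two interpretations was intended, which is exactly the problem described above.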

The result is this sort of mess. Just wait until you try to check out that file on a new filesystem with a different encoding. Or try to check out that file on Windows. It's like Git decided to step backwards 30 years.

Since most people on Linux nowadays are probably running in a UTF-8-based locale, I tried introducing some (very incomplete) patches for the Windows port that make this assumption, to allow Windows users to use non-ASCII file names (Windows uses Unicode strings for file names). Mac OS uses (semi-decomposed) UTF-8 strings, so it should also be able to make use of this.
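The decomposition matters because two UTF-8 byte sequences can spell the same visible name. A rough Python sketch follows, using standard NFD as a stand-in for what Mac OS does (an approximation, since its decomposition is only "semi"); `same_name` is a hypothetical helper for illustration, not something Git provides.

```python
import unicodedata

# "å.txt" with å as a single precomposed code point (U+00E5).
composed = "\u00e5.txt"

# The same name decomposed into "a" followed by a combining ring (U+030A).
# Plain NFD is used here as an approximation of the Mac OS form.
decomposed = unicodedata.normalize("NFD", composed)

# Both are valid UTF-8, but the bytes differ, so a byte-wise comparison
# treats them as two different file names.
composed_bytes = composed.encode("utf-8")      # b'\xc3\xa5.txt'
decomposed_bytes = decomposed.encode("utf-8")  # b'a\xcc\x8a.txt'


def same_name(a: str, b: str) -> bool:
    """Compare two names after normalizing both to one form (NFC here)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(composed_bytes, decomposed_bytes)
print(composed == decomposed)
print(same_name(composed, decomposed))
```

So even with everyone agreeing on UTF-8, a platform-independent storage format would probably also have to pick one normalization form.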

Unfortunately, there seems to be quite some resistance to deciding on a platform- and language-independent way of storing file names in Git; the preference is rather to go the "Unix" way and make it someone else's problem. I find this a bit sad.


--
\\// Peter - http://www.softwolves.pp.se/