Peter Karlsson wrote:
> Linus Torvalds wrote:
>
> > The difference I see between us is that when I tell you that this is
> > exactly the same thing as your file *contents*,
>
> This is the same issue as the CRLF issue I posted on earlier, and it
> all stems from that git also sees file names as a stream of bytes, not
> a string of characters, just as it does text.

You have to be careful about CRLF conversion, lest you corrupt your
binary files. CRLF conversion is off by default.

> > An OS that silently changes the contents of your files is *crap*.
> > Get it?
>
> A program that silently ignores the conventions of the platform it runs
> on is *crap*, no matter if the conventions are not the same as for
> other platforms.
>
> > An OS that silently changes the contents of your directories is *crap*.
> > Get it now?
>
> A program that silently ignores the conventions of the file system it
> tries to store its files on is *crap* :-)

The Git philosophy of treating the contents of files and the "contents"
of directories (filenames) as streams of bytes, i.e. of using the
'native' encoding, works perfectly well and _fast_ if all developers
work in the same environment. Trouble starts when you work across
operating systems and across filesystems.

> In my perfect world, file names would be stored as a string of characters,
> so if I save a file with an å in it, that å would be preserved no
> matter if I run Linux on ext2 with my locale is set to latin-1 (which
> stores it as byte 0xE5), on Windows with NTFS (which stores it as the
> UTF-16 code 0x00E5), on Windows/DOS with FAT (which stores it as the
> byte 0x86) or on Mac OS X which stores it as decomposed UTF-8 (whose
> byte sequence I don't know at the top of my head). If that was just
> stored as U+00E5 in whatever encoding in the filename index, the local
> implementation of git can just check it out in the form needed.
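
For reference, the byte sequences quoted above can be checked with a
small Python sketch. It is only an illustration, not anything git does:
filename_bytes() is a hypothetical helper, cp437 is my assumption for
the DOS/FAT code page, and plain Unicode NFD is used as an approximation
of the decomposed form Mac OS X stores.

```python
import unicodedata

# U+00E5, LATIN SMALL LETTER A WITH RING ABOVE
NAME = "\u00e5"


def filename_bytes(name):
    """Return the bytes a one-character filename could end up as on disk.

    Hypothetical helper for illustration; the encoding labels are
    assumptions about typical setups, not something git records.
    """
    return {
        "latin-1": name.encode("latin-1"),
        "utf-16-be": name.encode("utf-16-be"),
        "cp437": name.encode("cp437"),
        "utf-8 NFC": unicodedata.normalize("NFC", name).encode("utf-8"),
        "utf-8 NFD": unicodedata.normalize("NFD", name).encode("utf-8"),
    }


if __name__ == "__main__":
    for label, raw in filename_bytes(NAME).items():
        # Print each byte as two hex digits, separated by spaces.
        print(label.ljust(10), " ".join(f"{b:02x}" for b in raw))
```

The decomposed form is 'a' followed by U+030A (combining ring above),
so the same visible name yields five different byte strings, which is
exactly what a byte-stream view of filenames cannot reconcile.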

Git has had i18n.commitEncoding for a long time, and for some time now
it has saved that encoding in the 'encoding' header of the commit object
(if different from 'utf-8'); it also has i18n.logOutputEncoding. To deal
with different filesystem encodings you would likewise need both: the
encoding used in 'tree' objects (by the repository) for filenames, saved
somewhere in the repository, either in the tree object (argh!) or in
some kind of .gitconfig file; and the encoding used by the filesystem,
in the repository config as i18n.filesystemEncoding or something like
that. You would also have to think about what to put in the on-disk
index and in the in-memory index.

NOTE, NOTE, NOTE! If a filename is used somewhere in the file contents
(a manifest-like file, an include-like statement), and this filename
uses characters that are encoded differently in different encodings,
you are badly screwed with this fancy system anyway.

-- 
Jakub Narebski
Poland