Re: [RFC PATCH] Windows: Assume all file names to be UTF-8 encoded.

Lars Noschinski <lars-2008-2@xxxxxxxxxxxxxxxxxxxx> writes:
> * Peter Krefting <peter@xxxxxxxxxxxxxxxx> [09-03-03 12:54]:
> > Lars Noschinski:
> > >Changing the filename (on checkout), so that the user sees an Ü regardless of 
> > >his or her locale (instead of an \0xDC, which only resolves to an Ü on 
> > >latin-1) would be an absolutely broken concept here.
> > 
> > Why would it? It is my view, as a user, of my files that defines how file names
> > are looked upon. If I have three machines, one Linux box using an iso8859-1
> > locale, an OS X box (where, I believe, file APIs use UTF-8; someone
> > please correct me if I'm wrong), and a Windows box (which uses UTF-16 on the
> > file system layer, but does provide compatibility functions that use char
> > pointers), and create a file on each of these called "Ü.txt" (which would be
> > the sequence "DC 2E 74 78 74" on the Linux box, "C3 9C 2E 74 78 74" (or
> > probably something else, since I believe OS X decomposes the string) on the OS X
> > box, and "00DC 002E 0074 0078 0074" on the Windows box), I see these three file
> > names as equal.
> 
> Because a function in the source code refers to (e.g.) "DC 2E 74 78 74",
> not "C3 9C 2E 74 78 74" nor "00DC 002E 0074 0078 0074". And it does so
> regardless of the locale.

The only language where I have actually seen people use non-ASCII names for
referenced files, i.e. classes, is Java, and there you specify the source
encoding to the compiler. Class names are not byte sequences there. XML files
are another case where referenced files are named in Unicode. I assume this
applies to C# and other modern languages too.

> The file name may look funny depending on your locale, but if you rename
> the file to fit your local encoding, it would not work.

In the Java case, you /have/ to "rename" or the build will break. Build systems
like Ant or Maven require you to "rename" too, regardless of what you build. A
C Git clone will produce unbuildable code, but JGit will produce a working one
on Unicode-aware systems. Documentation, where Unicode filenames are more
common than in source, will look good too.
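
To make the byte sequences discussed in this thread concrete, here is a minimal
Python sketch. It is purely illustrative and is not what Git or JGit actually
does; the helper names hex_bytes and same_name are made up for the example. It
encodes "Ü.txt" the way each of the three systems might store it, assuming the
OS X name is NFD-decomposed UTF-8, and then compares names as text rather than
as raw bytes.

```python
import unicodedata

name = "\u00dc.txt"  # "Ü.txt", with Ü as the single code point U+00DC


def hex_bytes(data: bytes) -> str:
    """Render a byte string as space-separated upper-case hex."""
    return " ".join(f"{b:02X}" for b in data)


# The same file name as it might be stored on each system.
latin1 = name.encode("latin-1")                                  # iso8859-1 Linux box
utf8_nfc = unicodedata.normalize("NFC", name).encode("utf-8")    # precomposed UTF-8
utf8_nfd = unicodedata.normalize("NFD", name).encode("utf-8")    # decomposed (assumed OS X)
utf16 = name.encode("utf-16-be")                                 # UTF-16 code units


def same_name(a: bytes, enc_a: str, b: bytes, enc_b: str) -> bool:
    """Compare two stored names as normalized text, not as bytes."""
    text_a = unicodedata.normalize("NFC", a.decode(enc_a))
    text_b = unicodedata.normalize("NFC", b.decode(enc_b))
    return text_a == text_b


for label, raw in [("latin-1", latin1), ("utf-8 NFC", utf8_nfc),
                   ("utf-8 NFD", utf8_nfd), ("utf-16-be", utf16)]:
    print(f"{label:10} {hex_bytes(raw)}")

# Byte-wise the names differ; as text they are the same name.
print(latin1 == utf8_nfd)
print(same_name(latin1, "latin-1", utf8_nfd, "utf-8"))
```

Compared byte for byte, none of these names match, which is the view a tool
takes when it treats file names as opaque byte sequences. Compared as decoded
and normalized text, they are all the same name, which is the view the user
has of the file.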

-- robin

PS. I re-added the people you forgot to Cc.