Re: git on MacOSX and files with decomposed utf-8 file names


 



Linus Torvalds wrote:

> But for a UNIX interface layer, the most logical thing would probably be
> to map "open()" and friends not to CreateFileA(), but to
> CreateFileW(utf8_to_utf16(filename)).
> 
> Once you do that, then it sounds like Windows would basically be Unicode,
> and hopefully without any crazy normalization (but presumably all the
> crazy case-insensitivity cannot be fixed ;^).
> 
> So it probably really only depends on whether you choose to use the insane
> 8-bit code page translation or whether you just use a sane and trivial
> UTF8<->UTF16 conversion.
> 
> Anybody know which one cygwin/mingw does?

Cygwin does not yet support doing the smart thing.  At the moment you
can only open() files in the current 8-bit codepage.  There is a patch
floating around to allow using UTF-8, but it was rejected for inclusion
because it was considered too hackish.  Instead, work has been ongoing
for some time to replumb the internal representation of Windows
filenames to use UTF-16 instead of plain chars, so that conversion
overhead can be kept to a minimum.  In conjunction with dropping Win9x/ME
support, this also means the Native APIs like NtCreateFile() can be used
directly, as they are lower level than the Win32 -A and -W functions
and expose more flexibility, such as the ability to implement the
openat() family of functions natively (no pun intended) without
emulation.  These two items (Unicode and dropping non-NT Windows) are
the big features for 1.7.
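For reference, the "sane and trivial UTF8<->UTF16 conversion" Linus
describes really is small.  Below is a minimal, portable C sketch of what
a helper like the utf8_to_utf16() in his example might look like; the
name, signature and error convention are hypothetical, and this is not
Cygwin's code.  On Windows itself, MultiByteToWideChar() with CP_UTF8
does this job.  Note that the conversion does no normalization: a
decomposed "é" (U+0065 U+0301) stays two UTF-16 code units, while the
precomposed form (U+00E9) stays one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (illustration only): convert a NUL-terminated UTF-8
 * string to NUL-terminated UTF-16.  dst_len is the capacity of dst in
 * 16-bit code units, including the terminator.  Returns the number of
 * code units written (excluding the terminator), or -1 on malformed
 * UTF-8 or if dst is too small.  No Unicode normalization is done.
 */
long utf8_to_utf16(const char *src, uint16_t *dst, size_t dst_len)
{
    static const uint32_t min_cp[4] = { 0, 0x80, 0x800, 0x10000 };
    const unsigned char *s = (const unsigned char *)src;
    size_t n = 0;

    assert(src != NULL && dst != NULL);
    while (*s) {
        uint32_t cp;
        int extra;                     /* continuation bytes that follow */

        if (s[0] < 0x80)                { cp = s[0];        extra = 0; }
        else if ((s[0] & 0xE0) == 0xC0) { cp = s[0] & 0x1F; extra = 1; }
        else if ((s[0] & 0xF0) == 0xE0) { cp = s[0] & 0x0F; extra = 2; }
        else if ((s[0] & 0xF8) == 0xF0) { cp = s[0] & 0x07; extra = 3; }
        else return -1;                /* stray continuation or invalid lead byte */

        for (int i = 1; i <= extra; i++) {
            if ((s[i] & 0xC0) != 0x80)
                return -1;             /* truncated sequence (also stops at NUL) */
            cp = (cp << 6) | (s[i] & 0x3F);
        }
        s += extra + 1;

        /* Reject overlong forms, encoded surrogates and values past U+10FFFF. */
        if (cp < min_cp[extra] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return -1;

        if (cp < 0x10000) {            /* BMP: one code unit */
            if (n + 1 >= dst_len)
                return -1;
            dst[n++] = (uint16_t)cp;
        } else {                       /* supplementary plane: surrogate pair */
            if (n + 2 >= dst_len)
                return -1;
            cp -= 0x10000;
            dst[n++] = (uint16_t)(0xD800 | (cp >> 10));
            dst[n++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        }
    }
    if (n >= dst_len)
        return -1;                     /* not even room for the terminator */
    dst[n] = 0;
    return (long)n;
}
```

An open() wrapper along the lines Linus suggests would then hand the
resulting buffer to CreateFileW() rather than passing codepage-translated
bytes to CreateFileA().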

Of course, since a lot of what Cygwin does is translate paths in
sometimes non-obvious and complicated ways, there's a lot of path-handling
code to adapt, so it's taking a while.

Incidentally, the ridiculously short MAX_PATH of 260 on Windows comes from
the Win32 -A version of the functions.  The -W API and the Native API
can cope with paths of up to 32k wide chars, so a side benefit of this
should be the ability to finally stop running into length limits.  Of
course there's always a catch: when using long filenames with the Win32
-W API or the Native API you can only use absolute paths, so either you
have to live with the 260 limitation for relative paths or you keep
track of the current directory and always do a rel->abs conversion.  Or
better, if you stick to the Native API, you can do a directory-handle-
relative openat-type thing, which I suppose starts to sound relatively
sane.

However, there's another catch here: for some time Cygwin has
maintained a separate and private value of CWD behind Windows' back, and
only synced the two when spawning a non-Cygwin binary.  This allows
Windows to happily think the process's CWD is always C:\ or whatever, and
not hold an open handle to the actual CWD.  In turn, Cygwin uses this to
allow the POSIX filesystem behavior of being able to unlink the current
directory, which some programs or build systems assume they can do but
which is not possible in straight Win32.  This is a roundabout way of
saying that going back to keeping a handle to the CWD open in order to
do relative paths might be complicated.
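The "keep track of the current directory and always do a rel->abs
conversion" approach can be sketched as follows.  This is a minimal
illustration under stated assumptions, not Cygwin's implementation:
make_long_abs_path() and its helpers are hypothetical names, the private
CWD is assumed to be an absolute drive path such as C:\work, and plain
char is used for readability where real code would build a UTF-16
string for the -W functions.  The result carries the \\?\ prefix, which
is how the Win32 -W functions are told to accept paths longer than
MAX_PATH; because Windows does no "." or ".." processing on such paths,
the helper resolves those components itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical sketch, not Cygwin code: append one path component to
 * out (current length *len), resolving "." and ".." lexically and never
 * popping below `root` (the drive part).  Returns -1 if out is full.
 */
static int push_component(char *out, size_t *len, size_t root, size_t outlen,
                          const char *comp, size_t clen)
{
    if (clen == 0 || (clen == 1 && comp[0] == '.'))
        return 0;                      /* empty component or ".": skip */
    if (clen == 2 && comp[0] == '.' && comp[1] == '.') {
        while (*len > root && out[*len - 1] != '\\')
            (*len)--;                  /* drop the last component... */
        if (*len > root)
            (*len)--;                  /* ...and the separator before it */
        out[*len] = '\0';
        return 0;
    }
    if (*len + 1 + clen + 1 > outlen)
        return -1;                     /* no room for separator, comp and NUL */
    out[(*len)++] = '\\';
    memcpy(out + *len, comp, clen);
    *len += clen;
    out[*len] = '\0';
    return 0;
}

/* Split p on backslash or slash and push each component in turn. */
static int push_path(char *out, size_t *len, size_t root, size_t outlen,
                     const char *p)
{
    while (*p) {
        size_t clen = strcspn(p, "\\/");
        if (push_component(out, len, root, outlen, p, clen) != 0)
            return -1;
        p += clen;
        if (*p)
            p++;                       /* skip the separator */
    }
    return 0;
}

/*
 * Resolve rel (assumed relative) against the runtime's private CWD and
 * write the long-path form, prefix + drive + components, into out.
 * Returns 0 on success, -1 if cwd is not a drive path or out is too small.
 */
int make_long_abs_path(const char *cwd, const char *rel,
                       char *out, size_t outlen)
{
    static const char prefix[] = "\\\\?\\";   /* the four-character long-path prefix */
    size_t len = sizeof prefix - 1;

    assert(cwd != NULL && rel != NULL && out != NULL);
    if (strlen(cwd) < 2 || cwd[1] != ':' || len + 4 > outlen)
        return -1;                     /* need room for drive, colon, separator, NUL */
    memcpy(out, prefix, len);
    out[len++] = cwd[0];               /* drive letter */
    out[len++] = ':';
    out[len] = '\0';

    size_t root = len;
    if (push_path(out, &len, root, outlen, cwd + 2) != 0 ||
        push_path(out, &len, root, outlen, rel) != 0)
        return -1;
    if (len == root) {                 /* bare drive: refer to its root dir */
        out[len++] = '\\';
        out[len] = '\0';
    }
    return 0;
}
```

Resolving ".." purely lexically, as here, ignores symlinks and junctions;
that is a further simplification a real implementation would have to
deal with.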

Brian
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

