[RFC][PATCH v2] git on Mac OS and precomposed unicode

Changes since last version:
- Improved testcase t/t3910-mac-os-precompose.sh:
  test "git commit -- pathspec" (Thanks Junio)
- Improved the converting of argv[] for "git commit"

===============
Purpose:
This patch is a suggestion to work around the unpleasantness caused by
Mac OS decomposing unicode filenames.

The suggested change:
a) is only used under Mac OS
b) can be switched off by a configuration variable
c) is optimized for ASCII-only filenames
d) will improve interoperability between Mac OS, Linux and Windows*
   via git push/pull, using USB sticks (technically speaking VFAT)
   or mounted network shares using samba.

* (Not all Windows versions support UTF-8 yet:
   Msysgit needs the unicode branch, cygwin supports UTF-8 since 1.7)
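
As a rough illustration of point c): a pure ASCII name is identical in
precomposed and decomposed form, so a cheap byte scan can decide whether
any conversion is needed at all. The helper name sketch_has_non_ascii()
is hypothetical and not taken from the patch.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not from the patch): returns 1 if any byte of s
 * has the high bit set, i.e. the name is not pure ASCII and may need
 * conversion. Returns 0 for NULL, empty and ASCII-only strings.
 */
static int sketch_has_non_ascii(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;

    if (!p)
        return 0;
    for (; *p; p++)
        if (*p & 0x80)
            return 1;
    return 0;
}
```

A caller would skip the conversion whenever this returns 0.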


Runtime configuration:
A new configuration variable is added: "core.precomposedunicode".
This variable is only used on Mac OS.
If set to false, git behaves exactly like older versions of git.
When a new git version is installed and a repository does not have
"core.precomposedunicode" set, the new git stays backward compatible.

When core.precomposedunicode=true, all filenames are stored in precomposed
unicode in the index (technically speaking precomposed UTF-8), even though
readdir() under Mac OS returns filenames in decomposed form.

Implementation:
Two files are added to the "compat" directory, darwin.h and darwin.c.
They essentially implement three new functions:
darwin_opendir(), darwin_readdir() and darwin_closedir().


Compile time configuration:
A new compiler option PRECOMPOSED_UNICODE is introduced in the Makefile,
so that the patch can be switched off completely at compile time.

No decomposed file names in a git repository:
To prevent a file name in decomposed unicode from ever entering the
index, a "brute force" approach is taken:
all arguments passed to git (technically argv[1]..argv[n]) are converted
into precomposed unicode.
This is done in git.c by calling argv_precompose() for all commands:
for "git commit" all args after "--" are converted,
for all other commands all of argv[] is converted.

This function is actually a #define, and it is only defined under Mac OS.
Nothing is converted on any other OS.
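
The argument selection described above can be sketched roughly as
follows. This is only an illustration: the helper name
sketch_first_arg_to_convert() is hypothetical, and it assumes that
argv[0] is "git" and argv[1] is the command name. It only computes
where conversion would start and does not convert anything itself.

```c
#include <string.h>

/*
 * Hypothetical helper (not from the patch): index of the first argv[]
 * element that would be converted to precomposed unicode.
 * Assumes argv[0] is "git" and argv[1] is the command name.
 * For "commit" only the arguments after "--" qualify; for every other
 * command everything from argv[1] on qualifies.
 * Returns argc when there is nothing to convert.
 */
static int sketch_first_arg_to_convert(int argc, const char **argv)
{
    int i;

    if (argc < 2)
        return argc;
    if (strcmp(argv[1], "commit"))
        return 1;
    for (i = 2; i < argc; i++)
        if (!strcmp(argv[i], "--"))
            return i + 1;
    return argc;
}
```

For "git commit -m msg -- f1 f2" this yields 5 (the index of "f1"),
and for "git add f1" it yields 1.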

Implementation details:
The main work is done in darwin_readdir() and argv_precompose().
The conversion into precomposed unicode is done by using iconv,
where decomposed is denoted by "UTF-8-MAC" and precomposed is "UTF-8".
When a string that is already precomposed is converted, it is returned
unchanged.
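
A minimal sketch of such an iconv conversion is shown here. The helper
name sketch_reencode() is hypothetical, and the encodings are passed as
parameters so the code can be tried on any system; for the case
described above one would pass "UTF-8" as the target and "UTF-8-MAC"
as the source, which assumes an iconv that knows "UTF-8-MAC" (the one
shipped with Mac OS does).

```c
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not from the patch): convert `in` from encoding
 * `from` to encoding `to`. Returns a malloc()ed, NUL-terminated string
 * that the caller must free(), or NULL on any failure.
 */
static char *sketch_reencode(const char *in, const char *to, const char *from)
{
    iconv_t cd = iconv_open(to, from);
    size_t inleft = strlen(in);
    size_t outsz = inleft * 4 + 1;  /* generous room for the output */
    size_t outleft = outsz - 1;
    char *inp = (char *)in;         /* iconv() wants char **, input is not modified */
    char *out, *outp;

    if (cd == (iconv_t)-1)
        return NULL;                /* conversion not supported here */
    out = outp = malloc(outsz);
    if (!out) {
        iconv_close(cd);
        return NULL;
    }
    /* Convert, then flush any character the converter still holds back. */
    if (iconv(cd, &inp, &inleft, &outp, &outleft) == (size_t)-1 ||
        iconv(cd, NULL, NULL, &outp, &outleft) == (size_t)-1) {
        free(out);
        iconv_close(cd);
        return NULL;
    }
    *outp = '\0';
    iconv_close(cd);
    return out;
}
```

Unlike the patch, this sketch opens and closes an iconv instance on
every call, which keeps it short but is slower.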

Thread safety:
Since there is no need for argv_precompose() to be thread-safe, one iconv
instance is created at the beginning and kept for all conversions.
readdir() itself is not thread-safe either, so darwin_opendir() calls
iconv_open() once and keeps the instance for all calls of darwin_readdir()
until darwin_closedir() is called.
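
The idea of one iconv instance per open directory could look roughly
like this. It is a sketch under assumptions, not the code from
compat/darwin.c: the names struct sketch_dir, sketch_opendir(),
sketch_readdir() and sketch_closedir() are hypothetical, and where
"UTF-8-MAC" is not available the names are simply passed through
unchanged.

```c
#include <dirent.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical handle: the real DIR plus one iconv instance for its lifetime. */
struct sketch_dir {
    DIR *dirp;
    iconv_t cd;              /* (iconv_t)-1 if "UTF-8-MAC" is unavailable */
    char name[4 * 256 + 1];  /* converted name of the current entry */
};

static struct sketch_dir *sketch_opendir(const char *path)
{
    struct sketch_dir *d = malloc(sizeof(*d));

    if (!d)
        return NULL;
    d->dirp = opendir(path);
    if (!d->dirp) {
        free(d);
        return NULL;
    }
    /* Opened once here, reused by every sketch_readdir() call. */
    d->cd = iconv_open("UTF-8", "UTF-8-MAC");
    return d;
}

/* Returns the next entry name, precomposed when possible, or NULL at the end. */
static const char *sketch_readdir(struct sketch_dir *d)
{
    struct dirent *e = readdir(d->dirp);
    char *in, *out = d->name;
    size_t inleft, outleft = sizeof(d->name) - 1;

    if (!e)
        return NULL;
    in = e->d_name;
    inleft = strlen(in);
    if (d->cd != (iconv_t)-1) {
        iconv(d->cd, NULL, NULL, NULL, NULL);  /* reset converter state */
        if (iconv(d->cd, &in, &inleft, &out, &outleft) != (size_t)-1 &&
            iconv(d->cd, NULL, NULL, &out, &outleft) != (size_t)-1) {
            *out = '\0';
            return d->name;
        }
    }
    /* No converter, or the conversion failed: hand back the raw name. */
    return e->d_name;
}

static int sketch_closedir(struct sketch_dir *d)
{
    int ret = closedir(d->dirp);

    if (d->cd != (iconv_t)-1)
        iconv_close(d->cd);
    free(d);
    return ret;
}
```

The returned name is only valid until the next sketch_readdir() or
sketch_closedir() call on the same handle, just as with plain readdir().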

Auto sensing:
When creating a new git repository with "git init" or "git clone", the
variable "core.precomposedunicode" is set automatically to "true" or "false".

Typically core.precomposedunicode is "true" on HFS and VFAT.
It is even true for file systems mounted via SAMBA onto a Linux box,
and "false" for drives mounted via NFS onto a Linux box.
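
The text above does not say how the sensing works. One possible probe,
given here purely as an assumption, is to create a file under a
precomposed name and then check whether the same file can be reached
under the decomposed name. The helper name sketch_probe_precomposed()
and the choice of test character are hypothetical.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical probe (an assumption, not taken from the patch):
 * create a file with a precomposed name inside `dir`, then test whether
 * it is also reachable under the decomposed spelling of the same name.
 * Returns 1 if both spellings reach the file, 0 if only the precomposed
 * one does, and -1 if the probe file could not be created.
 */
static int sketch_probe_precomposed(const char *dir)
{
    static const char nfc[] = "\xc3\xa4";   /* U+00E4, precomposed */
    static const char nfd[] = "a\xcc\x88";  /* U+0061 U+0308, decomposed */
    char path_nfc[4096], path_nfd[4096];
    int n1, n2, fd, found;

    n1 = snprintf(path_nfc, sizeof(path_nfc), "%s/%s", dir, nfc);
    n2 = snprintf(path_nfd, sizeof(path_nfd), "%s/%s", dir, nfd);
    if (n1 < 0 || n2 < 0 ||
        (size_t)n1 >= sizeof(path_nfc) || (size_t)n2 >= sizeof(path_nfd))
        return -1;

    fd = open(path_nfc, O_CREAT | O_EXCL | O_RDWR, 0600);
    if (fd < 0)
        return -1;
    close(fd);

    found = access(path_nfd, R_OK) == 0;
    unlink(path_nfc);                       /* remove the probe file again */
    return found;
}
```

With such a probe, a result of 1 would correspond to setting
core.precomposedunicode to "true" and a result of 0 to "false".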


New test case:
The new t3910-mac-os-precompose.sh is added to check if a filename
can be reached either in precomposed or decomposed unicode (NFC or NFD).


Torsten Bögershausen (1):
  git on Mac OS and precomposed unicode

 Documentation/config.txt     |    9 ++
 Makefile                     |    3 +
 builtin/init-db.c            |   22 +++++
 compat/darwin.c              |  208 ++++++++++++++++++++++++++++++++++++++++++
 compat/darwin.h              |   31 ++++++
 git-compat-util.h            |    8 ++
 git.c                        |    1 +
 t/t0050-filesystem.sh        |    1 +
 t/t3910-mac-os-precompose.sh |  117 +++++++++++++++++++++++
 9 files changed, 400 insertions(+), 0 deletions(-)
 create mode 100644 compat/darwin.c
 create mode 100644 compat/darwin.h
 create mode 100755 t/t3910-mac-os-precompose.sh

-- 
1.7.8.rc0.43.gb49a8
