On 22.01.12 10:58, Nguyen Thai Ngoc Duy wrote:
> On Sun, Jan 22, 2012 at 5:56 AM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
>> [Pinging Nguyen who has worked rather extensively on the start-up sequence
>> for ideas.]
>> [snip]
>
> I just have a quick look, you reencode opendir, readdir, and
> closedir() to precomposed form. But files are still in decomposed
> form, does open(<precomposed file>) work when only <decomposed file>
> exists?

Yes. All functions like stat(), lstat(), open(), fopen() and unlink()
behave the same for precomposed and decomposed names.
This is similar to the ignore-case feature.
And because HFS+ is by default case preserving, case insensitive and
unicode decomposing all at the same time, a file name "Ä" can be reached
under 4 different names (upper or lower case, each either precomposed or
decomposed).

Please see the output of the test script (which is at the end of this email):

tests/Darwin_i386/NFC file name created as nfc is readable as nfd
tests/Darwin_i386/NFC readdir returns nfd but expected is nfc
tests/Darwin_i386/NFD file name created as nfd is readable as nfc
tests/Darwin_i386/NFCNFD 1 file found in directory, but there should be 2
tests/Darwin_i386/NFCNFD nfc is missing, nfd is present
tests/Darwin_i386/NFCNFD nfc File content overwritten by nfd
tests/Darwin_i386/NFDNFC 1 file found in directory, but there should be 2
tests/Darwin_i386/NFDNFC nfc is missing, nfd is present
tests/Darwin_i386/NFDNFC nfd File content overwritten by nfc

>
>>> In order to prevent that ever a file name in decomposed unicode is
>>> entering the index, a "brute force" attempt is taken: all arguments into
>>> git (argv[1]..argv[n]) are converted into precomposed unicode. This is
>>> done in git.c by calling precompose_argv(). This function is actually a
>>> #define, and it is only defined under Mac OS. Nothing is converted on
>>> any other platforms.
>
> This is not entirely safe. Filenames can be taken from a file for
> example (--stdin option or similar).
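To make the argv[] conversion discussed here concrete, below is a toy shell sketch. It is not git's precompose_argv(). It assumes a hypothetical helper, precompose_toy, that only knows how to precompose "A"/"a" followed by U+0308 COMBINING DIAERESIS. A real implementation would have to cover all of Unicode, for instance by converting with iconv(3). The second function, precompose_args, is an equally hypothetical analogue of converting every argument in place.

```shell
#!/bin/sh
# Toy sketch, NOT git's code: precompose_toy is a hypothetical helper
# that only rewrites "A"/"a" + U+0308 COMBINING DIAERESIS (NFD) into
# the precomposed U+00C4 "Ä" / U+00E4 "ä" (NFC).
precompose_toy() {
  nfd_diaeresis=$(printf '\314\210')   # U+0308 in UTF-8
  nfc_A_uml=$(printf '\303\204')       # U+00C4 in UTF-8
  nfc_a_uml=$(printf '\303\244')       # U+00E4 in UTF-8
  # LC_ALL=C makes sed treat the input as plain bytes.
  printf '%s\n' "$1" |
    LC_ALL=C sed -e "s/A$nfd_diaeresis/$nfc_A_uml/g" \
                 -e "s/a$nfd_diaeresis/$nfc_a_uml/g"
}

# Hypothetical analogue of precompose_argv(): convert every argument
# and print the results, one per line, in the original order.
precompose_args() {
  n=$#
  while [ "$n" -gt 0 ]; do
    converted=$(precompose_toy "$1")
    shift
    set -- "$@" "$converted"   # rotate: the converted copy goes to the end
    n=$((n - 1))
  done
  printf '%s\n' "$@"
}
```

For example, precompose_args "$(printf 'A\314\210')" README prints the precomposed "Ä" followed by README. Names that arrive on standard input (the --stdin case above) never pass through argv, so a conversion of this kind would have to be applied to them separately. That is exactly the gap that an argv-only approach leaves open.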
> Unless I'm mistaken, all file
> names must enter git through the index, the conversion at read-cache.c
> may be a better option.

Good point, thanks.
I added some code to read-cache.c, and it works for files, but not for
directories. I looked through the code for "case-ignoring" directory names
and couldn't find anything obvious. More work needs to be done.

[snip]
> I'd rather encode at index level and read_directory() than at argv[].
> But if reencoding argv is the only feasible way, perhaps put the
> conversion in parse_options()?

I tried that, and found that git-lsfiles.c doesn't use parse_options.

[snip]

In the long run I want to get rid of the argv[] conversion completely,
but I'm not there yet.

Thanks for all the comments and inspiration!
(and apologies for my usually long response times)
/Torsten

PS:
Here is the script.
Mac OS writes decomposed unicode to HFS+, and precomposed unicode to VFAT
and SAMBA. In any case readdir() returns decomposed.
=================
#!/bin/sh

errorandout() {
  echo Error: The shell cannot handle nfd
  echo try to run /bin/bash $0
  rm -rf $DIR
  exit 1
}

checkDirNfcOrNfd() {
  DDNFCNFD=$1
  readdirexp=$2
  if test -r $DDNFCNFD/$aumlnfc; then
    x=`cat $DDNFCNFD/$aumlnfc`
    if test "$x" = nfd; then
      echo $DDNFCNFD file name created as nfd is readable as nfc
    fi
  fi
  if test -r $DDNFCNFD/$aumlnfd; then
    x=`cat $DDNFCNFD/$aumlnfd 2>/dev/null` || {
      echo $DDNFCNFD nfd is not readable, but readdir says that it exists
    }
    if test "$x" = nfc; then
      echo $DDNFCNFD file name created as nfc is readable as nfd
    fi
  fi
  readdirres=`echo $DDNFCNFD/*`
  if test "$readdirres" != "$DDNFCNFD/$readdirexp"; then
    if test "$readdirres" = $DDNFCNFD/$aumlnfd; then
      echo $DDNFCNFD readdir returns nfd but expected is nfc
    fi
    if test "$readdirres" = $DDNFCNFD/$aumlnfc; then
      echo $DDNFCNFD readdir returns nfc but expected is nfd
    fi
  fi
}

checkdirnfcnfd() {
  DDNFCNFD=$1
  # wc -l may pad its output with blanks; the unquoted expansion drops them
  if test `ls -1 $DDNFCNFD | wc -l` -ne 2; then
    if test `ls -1 $DDNFCNFD | wc -l` -eq 1; then
      echo $DDNFCNFD 1 file found in directory, but there should be 2
    else
      echo $DDNFCNFD 2 files should be in directory
    fi
  fi
  x=`echo $DDNFCNFD/*`
  a=`echo $DDNFCNFD/$aumlnfd $DDNFCNFD/$aumlnfc`
  b=`echo $DDNFCNFD/$aumlnfc $DDNFCNFD/$aumlnfd`
  c=`echo $DDNFCNFD/$aumlnfc $DDNFCNFD/$aumlnfc`
  d=`echo $DDNFCNFD/$aumlnfd $DDNFCNFD/$aumlnfd`
  e=`echo $DDNFCNFD/$aumlnfc`
  f=`echo $DDNFCNFD/$aumlnfd`
  case "$x" in
    "$a") ;;
    "$b") ;;
    "$c") echo $DDNFCNFD nfd is hidden, nfc is listed twice ;;
    "$d") echo $DDNFCNFD nfc is hidden, nfd is listed twice ;;
    "$e") echo $DDNFCNFD nfd is missing, nfc is present ;;
    "$f") echo $DDNFCNFD nfc is missing, nfd is present ;;
    *) echo $DDNFCNFD x`echo $x | xxd` ;;
  esac
  if ! test -r $DDNFCNFD/$aumlnfc; then
    echo $DDNFCNFD/nfc File does not exist
  else
    x=`cat $DDNFCNFD/$aumlnfc`
    if test "$x" != nfc; then
      echo $DDNFCNFD nfc File content overwritten by $x
    fi
  fi
  if ! test -r $DDNFCNFD/$aumlnfd; then
    echo $DDNFCNFD/nfd File does not exist
  else
    x=`cat $DDNFCNFD/$aumlnfd`
    if test "$x" != nfd; then
      echo $DDNFCNFD nfd File content overwritten by $x
    fi
  fi
}

aumlnfc=$(printf '\303\204')
aumlnfd=$(printf '\101\314\210')
DIR=tests/`uname -s`_`uname -m`
echo "DIR=$DIR"
rm -rf $DIR/NFC &&
rm -rf $DIR/NFD &&
rm -rf $DIR/NFCNFD &&
rm -rf $DIR/NFDNFC &&
mkdir -p $DIR/NFC &&
mkdir -p $DIR/NFD &&
mkdir -p $DIR/NFDNFC &&
mkdir -p $DIR/NFCNFD &&
echo nfc > $DIR/NFC/$aumlnfc &&
echo nfd > $DIR/NFD/$aumlnfd &&
echo nfd > $DIR/NFDNFC/$aumlnfd &&
echo nfc > $DIR/NFDNFC/$aumlnfc &&
echo nfc > $DIR/NFCNFD/$aumlnfc &&
echo nfd > $DIR/NFCNFD/$aumlnfd &&
{
  # test 1: basic check whether the shell handles nfd
  if ! test -r $DIR/NFD/$aumlnfd; then
    errorandout
  fi
  for DD in tests/*; do
    checkDirNfcOrNfd $DD/NFC $aumlnfc
    checkDirNfcOrNfd $DD/NFD $aumlnfd
    checkdirnfcnfd $DD/NFCNFD
    checkdirnfcnfd $DD/NFDNFC
  done
} || errorandout
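For reference, the 4 names under which HFS+ resolves "Ä" (see above) differ at the byte level. Below is a small sketch that uses only printf and od. The variable names and the hexbytes helper are made up for this illustration. The byte values are the standard UTF-8 encodings of the code points named in the comments.

```shell
#!/bin/sh
# The four spellings of "Ä" that a case-insensitive HFS+ volume, which
# also ignores the NFC/NFD difference, resolves to the same file.
nfc_upper=$(printf '\303\204')    # U+00C4
nfd_upper=$(printf 'A\314\210')   # U+0041 U+0308
nfc_lower=$(printf '\303\244')    # U+00E4
nfd_lower=$(printf 'a\314\210')   # U+0061 U+0308

# Print the bytes of $1 as space-separated lowercase hex.
hexbytes() {
  # Unquoted expansion splits od's output into one word per byte;
  # "$*" joins the words back together with single spaces.
  set -- $(printf '%s' "$1" | od -An -tx1)
  echo "$*"
}

for name in "$nfc_upper" "$nfd_upper" "$nfc_lower" "$nfd_lower"; do
  hexbytes "$name"
done
```

This prints c3 84, 41 cc 88, c3 a4 and 61 cc 88, one per line. The precomposed forms are two bytes long, while each decomposed form is the base letter followed by the two bytes of U+0308. Any byte-wise comparison of file names therefore sees four distinct names where the file system sees one.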