Re: [PATCH V4] git on Mac OS and precomposed unicode

On 22.01.12 10:58, Nguyen Thai Ngoc Duy wrote:
> On Sun, Jan 22, 2012 at 5:56 AM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
>> [Pinging Nguyen who has worked rather extensively on the start-up sequence
>> for ideas.]
>>
[snip]
> 
> I just had a quick look: you re-encode opendir(), readdir() and
> closedir() to precomposed form. But the files are still in decomposed
> form, so does open(<precomposed file>) work when only <decomposed file>
> exists?

Yes. All functions like stat(), lstat(), open(), fopen() and unlink() behave the same
for precomposed and decomposed names. This is similar to the way case is ignored.
And because the default HFS+ is case preserving, case insensitive and Unicode decomposing
all at the same time, a file named "Ä" can be reached under 4 different names.
Please see the output of the test script
(which is at the end of this email):

tests/Darwin_i386/NFC file name created as nfc is readable as nfd
tests/Darwin_i386/NFC readdir returns nfd but expected is nfc
tests/Darwin_i386/NFD file name created as nfd is readable as nfc
tests/Darwin_i386/NFCNFD 1 file found in directory, but there should be 2
tests/Darwin_i386/NFCNFD nfc is missing, nfd is present
tests/Darwin_i386/NFCNFD nfc File content overwritten by nfd
tests/Darwin_i386/NFDNFC 1 file found in directory, but there should be 2
tests/Darwin_i386/NFDNFC nfc is missing, nfd is present
tests/Darwin_i386/NFDNFC nfd File content overwritten by nfc
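To make the "4 different names" concrete, here is a small sketch that prints the UTF-8 bytes of the precomposed and decomposed spellings of "Ä" and "ä". The variable names are only illustrative; the byte values follow from the Unicode code points U+00C4, U+00E4 and U+0308 (combining diaeresis). On a default HFS+ volume all four spellings refer to the same file.

```sh
#!/bin/sh
# Byte-level spellings of "the same" file name on a default HFS+ volume,
# which is case-insensitive and stores names decomposed (NFD).
upper_nfc=$(printf '\303\204')    # U+00C4, precomposed capital A with diaeresis
upper_nfd=$(printf 'A\314\210')   # U+0041 + U+0308 combining diaeresis
lower_nfc=$(printf '\303\244')    # U+00E4, precomposed small a with diaeresis
lower_nfd=$(printf 'a\314\210')   # U+0061 + U+0308 combining diaeresis

# Print each name as hex bytes; tr strips od's column padding and newline.
for name in "$upper_nfc" "$upper_nfd" "$lower_nfc" "$lower_nfd"; do
  printf '%s' "$name" | od -An -tx1 | tr -d ' \n'
  echo
done
```

It prints c384, 41cc88, c3a4 and 61cc88, one per line: two different lengths and four different byte strings for what the user sees as one name.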


> 
>>> In order to prevent a file name in decomposed Unicode from ever
>>> entering the index, a "brute force" approach is taken: all arguments to
>>> git (argv[1]..argv[n]) are converted into precomposed Unicode.  This is
>>> done in git.c by calling precompose_argv().  This function is actually a
>>> #define, and it is only defined under Mac OS.  Nothing is converted on
>>> any other platform.
> 
> This is not entirely safe. File names can also be taken from a file, for
> example (--stdin option or similar). Unless I'm mistaken, all file
> names must enter git through the index, so the conversion in read-cache.c
> may be a better option.

Good point, thanks.
I added some code to read-cache.c; it works for files, but not for directories.
I looked through the code that handles "case-ignoring" directory names, and couldn't
find anything obvious. More work needs to be done.
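The gap Duy points out can be shown with a toy model. The shell sketch below assumes a hypothetical normalizer that knows only one pair, decomposed "A" + U+0308 to precomposed U+00C4; a real implementation needs a full Unicode NFD-to-NFC conversion. The helper names precompose_args and precompose_stream are made up for illustration: the first rebuilds its argument list in the spirit of precompose_argv(), the second stands in for the separate conversion that names read from stdin would need.

```sh
#!/bin/sh
# Hypothetical toy normalizer: handles ONLY the decomposed A-umlaut.
aumlnfc=$(printf '\303\204')
aumlnfd=$(printf 'A\314\210')

# Rewrite every decomposed A-umlaut on stdin to the precomposed form.
precompose_stream() {
  sed "s/$aumlnfd/$aumlnfc/g"
}

# Rebuild "$@" in place, roughly what precompose_argv() does to argv[]:
# each argument is shifted off the front and re-appended converted.
precompose_args() {
  for arg do
    shift
    set -- "$@" "$(printf '%s\n' "$arg" | precompose_stream)"
  done
  printf '%s\n' "$@"
}

# Names given as arguments are converted ...
precompose_args "$aumlnfd.txt" README
# ... but a name read from stdin (think --stdin) never passes through
# argv[], so it is only converted if the reader filters it as well.
printf '%s\n' "$aumlnfd.txt" | precompose_stream
```

Every entry path that bypasses argv[] needs its own call to the filter, which is the argument for converting at the point where names enter the index instead.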
 

[snip]
> I'd rather encode at index level and in read_directory() than at argv[].
> But if re-encoding argv is the only feasible way, perhaps put the
> conversion in parse_options()?

I tried that, and found that git-lsfiles.c doesn't use parse_options.

[snip]

In the long run I want to get rid of the argv[] conversion completely,
but I'm not there yet.

Thanks for all comments and inspiration!

(and apologies for my usual long response times)
/Torsten



PS:
Here is the script.
Mac OS writes decomposed Unicode to HFS+, and precomposed Unicode to VFAT and SAMBA.
In any case readdir() returns decomposed names.

=================
#!/bin/sh
errorandout() {
  echo Error: The shell cannot handle NFD file names
  echo Try running: /bin/bash $0
  rm -rf $DIR
  exit 1
}

checkDirNfcOrNfd() {
  DDNFCNFD=$1
  readdirexp=$2
  if test -r $DDNFCNFD/$aumlnfc; then
    x=`cat $DDNFCNFD/$aumlnfc`
    if test "$x" = nfd; then
      echo $DDNFCNFD file name created as nfd is readable as nfc
    fi
  fi
  if test -r $DDNFCNFD/$aumlnfd; then
    x=`cat $DDNFCNFD/$aumlnfd 2>/dev/null` || {
      echo $DDNFCNFD nfd is not readable, but readdir says that it exists
    }
    if test "$x" = nfc; then
      echo $DDNFCNFD file name created as nfc is readable as nfd
    fi
  fi
  readdirres=`echo $DDNFCNFD/*`
  if test "$readdirres" != "$DDNFCNFD/$readdirexp"; then
    if test "$readdirres" = $DDNFCNFD/$aumlnfd; then
      echo $DDNFCNFD readdir returns nfd but expected is nfc
    fi
    if test "$readdirres" = $DDNFCNFD/$aumlnfc; then
      echo $DDNFCNFD readdir returns nfc but expected is nfd
    fi
  fi
}

checkdirnfcnfd() {
  DDNFCNFD=$1
  if test `ls -1 $DDNFCNFD | wc -l` != 2; then
    if test `ls -1 $DDNFCNFD | wc -l` = 1; then
      echo $DDNFCNFD 1 file found in directory, but there should be 2
    else
      echo $DDNFCNFD 2 files should be in directory
    fi  
  fi

  x=`echo $DDNFCNFD/*`
  a=`echo $DDNFCNFD/$aumlnfd $DDNFCNFD/$aumlnfc`
  b=`echo $DDNFCNFD/$aumlnfc $DDNFCNFD/$aumlnfd`
  c=`echo $DDNFCNFD/$aumlnfc $DDNFCNFD/$aumlnfc`
  d=`echo $DDNFCNFD/$aumlnfd $DDNFCNFD/$aumlnfd`
  e=`echo $DDNFCNFD/$aumlnfc`
  f=`echo $DDNFCNFD/$aumlnfd`
  case "$x" in
    $a)
    ;;      
    $b)
    ;;
    $c)
    echo $DDNFCNFD nfd is hidden, nfc is listed twice
    ;;
    $d)
    echo $DDNFCNFD nfc is hidden, nfd is listed twice
    ;;
    $e)
    echo $DDNFCNFD nfd is missing, nfc is present
    ;;      
    $f)
    echo $DDNFCNFD nfc is missing, nfd is present
    ;;      
    *)
    echo $DDNFCNFD unexpected listing: `echo "$x" | od -c`
    ;;
  esac

  if ! test -r $DDNFCNFD/$aumlnfc; then
    echo $DDNFCNFD/nfc File does not exist
  else
    x=`cat $DDNFCNFD/$aumlnfc`
    if test "$x" != nfc; then
      echo $DDNFCNFD nfc File content overwritten by $x
    fi
  fi
  
  if ! test -r $DDNFCNFD/$aumlnfd; then
    echo $DDNFCNFD/nfd File does not exist
  else
    x=`cat $DDNFCNFD/$aumlnfd`
    if test "$x" != nfd; then
      echo $DDNFCNFD nfd File content overwritten by $x
    fi
  fi
}


aumlnfc=$(printf '\303\204')
aumlnfd=$(printf '\101\314\210')

DIR=tests/`uname -s`_`uname -m`
echo "DIR=$DIR"

rm -rf $DIR/NFC &&
rm -rf $DIR/NFD &&
rm -rf $DIR/NFCNFD &&
rm -rf $DIR/NFDNFC &&
mkdir -p $DIR/NFC &&
mkdir -p $DIR/NFD &&
mkdir -p $DIR/NFDNFC &&
mkdir -p $DIR/NFCNFD &&
echo nfc > $DIR/NFC/$aumlnfc &&
echo nfd > $DIR/NFD/$aumlnfd &&
echo nfd > $DIR/NFDNFC/$aumlnfd &&
echo nfc > $DIR/NFDNFC/$aumlnfc &&
echo nfc > $DIR/NFCNFD/$aumlnfc &&
echo nfd > $DIR/NFCNFD/$aumlnfd && {
  # test 1: basic check whether the shell handles NFD names
  if ! test -r $DIR/NFD/$aumlnfd; then
    errorandout
  fi

  for DD in tests/*; do
    checkDirNfcOrNfd $DD/NFC  $aumlnfc
    checkDirNfcOrNfd $DD/NFD  $aumlnfd

    checkdirnfcnfd $DD/NFCNFD
    checkdirnfcnfd $DD/NFDNFC
  done
} || errorandout

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

