Re: [PATCH V4] git on Mac OS and precomposed unicode

On Sun, Jan 22, 2012 at 5:56 AM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
> [Pinging Nguyen who has worked rather extensively on the start-up sequence
> for ideas.]
>
> Torsten Bögershausen <tboegi@xxxxxx> writes:
>
> I'll try to reword the log message a bit below.
>
>> When a file called "LATIN CAPITAL LETTER A WITH DIAERESIS" (in utf-8
>> encoded as 0xc3 0x84) is created, the Mac OS filesystem converts
>> "precomposed unicode" into "decomposed unicode". readdir() will return
>> 0x41 0xcc 0x88 for such a file, that does not match what the caller
>> thought it created.
>>
>> To work around this braindamage, allow git on Mac OS to optionally use a
>> wrapper for readdir() that converts decomposed unicode back into the
>> precomposed form, which most other platforms use natively. This makes it
>> easier for Mac OS users to work together on the same project with people
>> on other platforms (Note that not all Windows versions support UTF-8
>> yet. Msysgit needs the unicode branch, cygwin supports UTF-8 since
>> 1.7). This allows sharing git repositories stored on a VFAT file system
>> (e.g. a USB stick), and mounted network share using samba.

I just had a quick look. You wrap opendir(), readdir(), and closedir()
so that names come back in precomposed form, but the files on disk
still have decomposed names. Does open(<precomposed file>) work when
only <decomposed file> exists?
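
To make the byte sequences from the log message concrete, here is a
minimal C sketch. It is not the patch's code: the patch converts with
iconv (note the iconv_t in precompose_argv() below), whereas
toy_precompose() and its six-entry table are hypothetical names made up
for illustration, and they only handle a base letter followed by
U+0308 (COMBINING DIAERESIS).

```c
#include <stddef.h>
#include <string.h>

/*
 * Toy table (an assumption for illustration, not from the patch):
 * a base letter that may be followed by U+0308 (UTF-8: 0xcc 0x88),
 * and the second byte of its precomposed UTF-8 form. All six
 * precomposed forms start with 0xc3.
 */
static const struct {
	char base;
	unsigned char second;
} toy_pairs[] = {
	{ 'A', 0x84 }, { 'O', 0x96 }, { 'U', 0x9c },
	{ 'a', 0xa4 }, { 'o', 0xb6 }, { 'u', 0xbc },
};

/*
 * Copy "in" to "out", replacing each known decomposed pair with its
 * precomposed form. Returns the output length in bytes, or
 * (size_t)-1 if "out" is too small.
 */
static size_t toy_precompose(const char *in, char *out, size_t outsz)
{
	size_t len = strlen(in), i = 0, o = 0, k;

	if (!outsz)
		return (size_t)-1;
	while (i < len) {
		int matched = 0;

		/* Is in[i] followed by the two bytes of U+0308? */
		if (i + 2 < len &&
		    (unsigned char)in[i + 1] == 0xcc &&
		    (unsigned char)in[i + 2] == 0x88) {
			for (k = 0; k < sizeof(toy_pairs) / sizeof(toy_pairs[0]); k++) {
				if (in[i] != toy_pairs[k].base)
					continue;
				if (o + 2 >= outsz)
					return (size_t)-1;
				out[o++] = (char)0xc3;
				out[o++] = (char)toy_pairs[k].second;
				i += 3;
				matched = 1;
				break;
			}
		}
		if (!matched) {
			/* Anything else is copied through unchanged. */
			if (o + 1 >= outsz)
				return (size_t)-1;
			out[o++] = in[i++];
		}
	}
	out[o] = '\0';
	return o;
}
```

Given the decomposed name "A\xcc\x88", toy_precompose() writes the two
bytes 0xc3 0x84, while strcmp() on the two raw forms reports a
mismatch. That mismatch is what a caller sees when it compares the
name it created with the name readdir() returns.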

>> To prevent file names in decomposed unicode from ever entering the
>> index, a "brute force" approach is taken: all arguments into
>> git (argv[1]..argv[n]) are converted into precomposed unicode.  This is
>> done in git.c by calling precompose_argv().  This function is actually a
>> #define, and it is only defined under Mac OS.  Nothing is converted on
>> any other platforms.

This is not entirely safe. File names can also be read from a file or
from stdin (the --stdin option or similar), and those never pass
through argv. Unless I'm mistaken, all file names must enter git
through the index, so doing the conversion in read-cache.c may be a
better option.
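
A rough sketch of the "convert at the index" idea, assuming there is a
single function through which every name is added. struct toy_index,
toy_index_add(), toy_index_has() and toy_normalize() are all made up
for this mail and are not git's read-cache.c API; the stand-in
normalizer knows only the one pair from the log message.

```c
#include <stddef.h>
#include <string.h>

#define TOY_MAX_ENTRIES 8
#define TOY_NAME_MAX 64

/* Hypothetical stand-in for the index: a flat list of names. */
struct toy_index {
	size_t nr;
	char names[TOY_MAX_ENTRIES][TOY_NAME_MAX];
};

/*
 * Stand-in normalizer (assumption): rewrites only "A" + U+0308
 * (0x41 0xcc 0x88) to 0xc3 0x84, and silently truncates names
 * that do not fit in "out".
 */
static void toy_normalize(const char *in, char *out, size_t outsz)
{
	size_t o = 0;

	if (!outsz)
		return;
	while (*in && o + 2 < outsz) {
		if (!strncmp(in, "A\xcc\x88", 3)) {
			out[o++] = (char)0xc3;
			out[o++] = (char)0x84;
			in += 3;
		} else {
			out[o++] = *in++;
		}
	}
	out[o] = '\0';
}

/* Lookups normalize the key too, so either form finds the entry. */
static int toy_index_has(const struct toy_index *idx, const char *name)
{
	char key[TOY_NAME_MAX];
	size_t i;

	toy_normalize(name, key, sizeof(key));
	for (i = 0; i < idx->nr; i++)
		if (!strcmp(idx->names[i], key))
			return 1;
	return 0;
}

/*
 * The single choke point: every name is normalized here, whether the
 * caller got it from argv, from --stdin, or from readdir().
 * Returns 0 on success (including "already present"), -1 if full.
 */
static int toy_index_add(struct toy_index *idx, const char *name)
{
	if (toy_index_has(idx, name))
		return 0;
	if (idx->nr >= TOY_MAX_ENTRIES)
		return -1;
	toy_normalize(name, idx->names[idx->nr], TOY_NAME_MAX);
	idx->nr++;
	return 0;
}
```

Adding "A\xcc\x88.txt" and then "\xc3\x84.txt" leaves one entry in
precomposed form. The callers never need to know which spelling they
were handed, which is the property the argv-only conversion lacks.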

>> Auto sensing:
>> When creating a new git repository with "git init" or "git clone",
>> "core.precomposedunicode" will be set "false".

This is a general comment on the init auto-detection feature. Perhaps
we should allow detection to run when reinitializing a repo, or at
least add an option that runs the detection and prints the new config
values, so users can decide whether to change their current values.

>> +void precompose_argv(int argc, const char **argv)
>> +{
>> +     int i = 0;
>> +     const char *oldarg;
>> +     char *newarg;
>> +     iconv_t ic_precompose;
>> +
>> +     git_config(precomposed_unicode_config, NULL);
>
> As the first thing called after main(), I still doubt this is a safe thing
> to do (Pinging Nguyen who has worked rather extensively on the start-up
> sequence for ideas). This is ifdefed away and will not break things on
> other platforms, which may make it even harder to diagnose breakages.

This can't be worse than the current state of the pager and alias
settings, which need to be detected even before setup_git_directory*
is run.

I'd rather re-encode at the index level and in read_directory() than
at argv[]. But if re-encoding argv is the only feasible way, perhaps
put the conversion in parse_options()? Config reading is usually done
by then, and we can move the precomposed config reading to
git_default_config. If some commands do not use parse_options() yet,
this is a good opportunity to convert them (I'll send a
parse_options() patch for pack-objects soon).
-- 
Duy