Re: [PATCH/RFC v3 6/8] Add case insensitivity support when using git ls-files

On Mon, Oct 4, 2010 at 6:02 PM, Robin Rosenberg
<robin.rosenberg@xxxxxxxxxx> wrote:
> On Sunday, 3 October 2010 at 11:56:44, Ævar Arnfjörð Bjarmason wrote:
>> From: Joshua Jensen <jjensen@xxxxxxxxxxxxxxxxx>
>>
>> When mydir/filea.txt is added, mydir/ is renamed to MyDir/, and
>> MyDir/fileb.txt is added, running git ls-files mydir only shows
>> mydir/filea.txt. Running git ls-files MyDir shows MyDir/fileb.txt.
>> Running git ls-files mYdIR shows nothing.
>>
>> With this patch running git ls-files for mydir, MyDir, and mYdIR shows
>> mydir/filea.txt and MyDir/fileb.txt.
>>
>> Wildcards are not handled case insensitively in this patch. Example:
>> MyDir/aBc/file.txt is added. git ls-files MyDir/a* works fine, but git
>> ls-files mydir/a* does not.
>>
>> Signed-off-by: Joshua Jensen <jjensen@xxxxxxxxxxxxxxxxx>
>> Signed-off-by: Johannes Sixt <j6t@xxxxxxxx>
>> Signed-off-by: Junio C Hamano <gitster@xxxxxxxxx>
>> ---
>>  dir.c |   38 ++++++++++++++++++++++++++------------
>>  1 files changed, 26 insertions(+), 12 deletions(-)
>>
>> diff --git a/dir.c b/dir.c
>> index cf8f65c..53aa4f3 100644
>> --- a/dir.c
>> +++ b/dir.c
>> @@ -107,16 +107,30 @@ static int match_one(const char *match, const char *name, int namelen)
>>       if (!*match)
>>               return MATCHED_RECURSIVELY;
>>
>> -     for (;;) {
>> -             unsigned char c1 = *match;
>> -             unsigned char c2 = *name;
>> -             if (c1 == '\0' || is_glob_special(c1))
>> -                     break;
>> -             if (c1 != c2)
>> -                     return 0;
>> -             match++;
>> -             name++;
>> -             namelen--;
>> +     if (ignore_case) {
>> +             for (;;) {
>> +                     unsigned char c1 = tolower(*match);
>> +                     unsigned char c2 = tolower(*name);
>
> Is anyone thinking "unicode" around here?
>

You're not the first to think about the combination of core.ignorecase
and unicode, but unfortunately way too few people have.

slow_same_name() (and index_name_exists() by proxy) already does the
Wrong Thing (tm), so the problem is already rooted in the index. The
consensus on the msysGit mailing list last time this was brought up
[1] was simply to ignore the combination of unicode and
core.ignorecase, but I'm not convinced that's a good idea. We might
end up painting ourselves further into a corner, eventually making
this nearly impossible to fix.
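
To make the concern about the per-byte tolower() in the quoted hunk
concrete, here is a minimal sketch. It assumes UTF-8 encoded names and
an ASCII-only case fold; ascii_tolower() and bytewise_icase_equal() are
hypothetical helpers written for this illustration, not functions from
dir.c.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: ASCII-only lowercase, so the result does not
 * depend on the C locale. Bytes >= 0x80 are returned unchanged.
 */
static unsigned char ascii_tolower(unsigned char c)
{
	return (c >= 'A' && c <= 'Z') ? (unsigned char)(c + ('a' - 'A')) : c;
}

/*
 * Hypothetical helper: compare two names byte by byte after folding
 * each byte with ascii_tolower(). Returns 1 if equal, 0 otherwise.
 */
static int bytewise_icase_equal(const char *a, const char *b)
{
	size_t i, len = strlen(a);

	if (len != strlen(b))
		return 0;
	for (i = 0; i < len; i++) {
		unsigned char c1 = ascii_tolower((unsigned char)a[i]);
		unsigned char c2 = ascii_tolower((unsigned char)b[i]);
		if (c1 != c2)
			return 0;
	}
	return 1;
}
```

With this, "mYdIR" and "mydir" compare equal, but "Ævar" and "ævar" do
not: in UTF-8, U+00C6 is the bytes C3 86 and U+00E6 is C3 A6, and a
per-byte fold never touches either sequence.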

One complicating factor is that Windows' definition of which
character pairs compare as identical depends on a table stored
somewhere in NTFS[2]. The time your drive was formatted decides what
that table looks like, and I haven't been able to retrieve it.
Worrying about this might be going a little too far, since the table
is likely to change very rarely, but I think it's worth noting.

[1]: http://groups.google.com/group/msysgit/browse_thread/thread/675ad16102f6233f/a25cd7bb8dfa2abb#a25cd7bb8dfa2abb
[2]: http://blogs.msdn.com/b/michkap/archive/2007/10/24/5641619.aspx