Re: [PATCH] Fix "inside work tree" detection on case-insensitive filesystems

Michael Haggerty <mhagger@xxxxxxxxxxxx> · Fri, 1 Jan 2016 17:24:13 +0100

On 01/01/2016 03:58 PM, Johannes Schindelin wrote:
> Hi Michael,
> 
> On Tue, 29 Dec 2015, Michael Haggerty wrote:
> 
>> On 09/28/2015 06:12 PM, Johannes Schindelin wrote:
>>> Git has a config variable to indicate that it is operating on a file
>>> system that is case-insensitive: core.ignoreCase. But the
>>> `dir_inside_of()` function did not respect that. As a result, if Git's
>>> idea of the current working directory disagreed in its upper/lower case
>>> with the `GIT_WORK_TREE` variable (e.g. `C:\test` vs `c:\test`) the
>>> user would be greeted by the error message
>>>
>>> 	fatal: git-am cannot be used without a working tree.
>>>
>>> when trying to run a rebase.
>>>
>>> This fixes https://github.com/git-for-windows/git/issues/402 (reported by
>>> Daniel Harding).
>>
>> I was just going through the 2.7 release notes when I saw this patch.
> 
> Thanks for your diligence!
> 
>> My understanding was that many of the case-insensitive filesystems also
>> support Unicode. Is the byte-oriented code in this patch adequate? I
>> would have thought it necessary to use a Unicode-aware algorithm here,
>> that knows:
>>
>> * that bytes != characters
> 
> I am not sure that we can in general assume that the file name is UTF-8...
> Or does Git always assume that?

No, it does not. I would say that Git resolutely refuses to take a stand
on whether/what encoding is used for filenames within its object database.

But as far as I understand, here we have a slightly different situation:
we have a string from an environment variable, and we have the current
working directory. Neither of these things are paths from the Git ODB,
and we don't have to take a stand about whether two paths from the ODB
are equivalent.

>> * how to do a case-insensitive comparison of strings that include
>> non-ASCII characters
> 
> I was worrying about that, too, but decided to punt on it when I realized
> that no other case-insensitive code in Git bothers about those characters.
> 
>> * (possibly) insensitivity to NFC vs. NFD vs. non-normalized forms
> 
> Whoa... I really would like to stay away from that collection of
> potholes...
> 
>> I suppose that such OSs have built-in functions for deciding whether two
>> paths are equivalent. Possibly these could be used?
> 
> We could, in theory, try to do that, but what about the OSes that do *not*
> have those functions? We would need our own fallback anyway, so why not
> guarantee consistency and use our own functions only?

I don't see why consistency across OSes is an issue here. Whenever we
are talking about paths on a particular filesystem, it is the
filesystem's idea of equivalence that is relevant.

Suppose we had a wrapper `filesystem_paths_equivalent(path1, path2)`. If
an OS provides a function that can be used for such a comparison, we can
use that. If not, but `ignore_case` is set, we can use your
`cmp_icase()`-based function. Otherwise, the default implementation
would be `!strcmp()`.

I don't think it would be worth going to all this trouble for this one
callsite (at least under my assumption that this has no security
implications). But I would have expected this issue to come up in enough
places that such an abstraction would be helpful.

Michael

-- 
Michael Haggerty
mhagger@xxxxxxxxxxxx

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html