Re: [PATCH 1/1] sparse-checkout: respect core.ignoreCase in cone mode

Derrick Stolee <stolee@xxxxxxxxx> · Wed, 11 Dec 2019 21:51:10 -0500

On 12/11/2019 4:37 PM, Junio C Hamano wrote:
> Derrick Stolee <stolee@xxxxxxxxx> writes:
> 
>> The specific example I have in my mind is that the filesystem can have
>> "README.TXT" as what it would report by readdir(), but a user can type
>>
>> 	git add readme.txt
>>
>> Without core.ignoreCase, the index will store the path as "readme.txt",
>> possibly adding a new entry in the index next to "README.TXT". If
>> core.ignoreCase is enabled, then the case reported by readdir() is used.
>> (This works both for a tracked and untracked path.)
> 
> Yes, but which one is "correct"?  readme.txt read from the index or
> README.TXT read from the filesystem?  Presumably, when the paths got
> first checked out of the index, it would have been in "readme.txt"
> on the filesystem, so the user must have done on purpose something
> to cause the file to be named in all uppercase since then.

Ah. The issue here is that with core.ignoreCase=true, the index and
filesystem agree. It's just the user input that differs.

>> This is a case that is difficult to control for. We have no source
>> of truth (filesystem or index) at the time of the 'git sparse-checkout
>> set' command. I cannot think of a solution to this specific issue
>> without doing the more-costly approach on every unpack_trees() call.
>>
>> I believe this case may be narrow enough to "document and don't-fix",
>> but please speak up if anyone thinks this _must_ be fixed.
> 
> I do not think it "must be" fixed in the sense that "if it hurts,
> don't do so" is a sufficient remedy.  But then for exactly the same
> reason, I do not think it is worth doing what you do in this patch.
> 
> On the other hand, I think runtime case folding, just like what we
> do when "git add" is told to handle a path in different case, would
> be the only "right" way to match end-user expectations on a case
> insensitive system, even though that is a "nice to do so" and
> certainly not a "must do so" thing.
> 
>> The "git add" behavior made me think there was sufficient precedent
>> in Git to try this change.
> 
> Since I view that the "git add" behaviour is a "nice to do so"
> runtime conversion, I would actually think the approach you
> discarded would be more in line with the precedent.
> 
>> I'm happy to follow the community's opinion in this matter, including
>> a hard "we will not correct for case in this feature" or "if you want
>> this then you'll pay for it in performance." In the latter case I'd
>> prefer a new config option to toggle the sparse-checkout case
>> insensitivity. That way users could have core.ignoreCase behavior without
>> the perf hit if they use clean input to the sparse-checkout feature.
> 
> I value correctness more---a system that does incorrect thing
> quickly is not all that interesting.
> 
> Assuming that your users are sensible and feed good data, how much
> "performance hit" are we talking about?

I'll need to build a prototype to test the performance hit. Maybe
I'm overestimating the effect of using a case-insensitive hash.

> Wouldn't this be something
> we can make into a "if you have the paths in the correct case, we'll
> see the match just as quickly as in the case sensitive system, so
> most of the time there is no penalty, but for correctness we would
> also fall back to try different cases"?

I think we would want the config option present to say "my data may
not be in the proper case, please do extra work for me". I can't
think of a way to do the fallback in real-time.

I'll try again with the case-insensitive hash algorithm and see
how that shakes out.

Thanks,
-Stolee