Re: [PATCH 1/1] sparse-checkout: respect core.ignoreCase in cone mode

Junio C Hamano <gitster@xxxxxxxxx> · Wed, 11 Dec 2019 12:00:03 -0800

Derrick Stolee <stolee@xxxxxxxxx> writes:

> I'm trying to find a way around these two ideas:
>
> 1. The index is case-sensitive, and the sparse-checkout patterns are
>    case-sensitive.

OK.  The latter is local to the repository and not shared to the
world where people with case sensitive systems would live, right?

> 2. If a filesystem is case-insensitive, most index-change operations
>    already succeed with incorrect case, especially with core.ignoreCase
>    enabled.

I am not sure about "incorrect", though.  

My understanding is that modern case-insensitive systems are not
information-destroying [*1*].  A new file you create as "ReadMe.txt"
on your filesystem would be seen in that mixed case spelling via
readdir() or equivalent, so adding it to the index as-is would
likely be in the "correct" case, no?  If, after adding that path to
the index, you did "rm ReadMe.txt" and created "README.TXT", surely
we won't have both ReadMe.txt and README.TXT in the index with
ignoreCase set, and keep using ReadMe.txt that matches what you
added to the index.  I am not sure which one is the "incorrect" case
in such a scenario.

> The approach I have is to allow a user to provide a case that does not
> match the index, and then we store the pattern in the sparse-checkout
> that matches the case in the index.

Yes, I understood that from your proposed log message and cover
letter.  They were very clearly written.

But imagine that your user writes ABC in the sparse pattern file,
and there is no abc anything in the index in any case permutations.

When you check out a branch that has Abc, shouldn't the pattern ABC
affect the operation just like a pattern Abc would on a case
insensitive system?

Or are end users perfectly aware that the thing in that branch is
spelled "Abc" and not "ABC" (the data in Git does---it comes from a
tree object that is case sensitive)?  If so, then the pattern ABC
should not affect the subtree whose name is "Abc" even on a case
insensitive system.

I am not sure what the design of this part of the system expects out
of the end user.  Perhaps keeping the patterns case insensitive and
tell the end users to spell them correctly is the right way to go, I
guess, if it is just the filesystem that cannot represente the cases
correctly at times and the users are perfectly capable of telling
the right and wrong cases apart.

But then I am not sure why correcting misspelled names by comparing
them with the index entries is a good idea, either.

> It sounds like you are preferring this second option, despite the
> performance costs. It is possible to use a case-insensitive hashing
> algorithm when in the cone mode, but I'm not sure about how to do
> a similar concept for the normal mode.

I have no strong preference, other than that I prefer to see things
being done consistently.  Don't we already use case folding hash
function to ensure that we won't add duplicate to the index on
case-insensitive system?  I suspect that we would need to do the
same, whether in cone mode or just a normal sparse-checkout mode
consistently.

Thanks.

[Footnote]

*1* ... unlike HFS+ where everything is forced to NKD and a bit of
information---whether the original was in NKC or NKD---is discarded
forever.