Re: BUG FOLLOWUP: Case insensitivity in worktrees

"brian m. carlson" <sandals@xxxxxxxxxxxxxxxxxxxx> · Fri, 24 Jul 2020 21:09:40 +0000

On 2020-07-24 at 18:14:03, Casey Meijer wrote:
> I think I misunderstood your claim actually Brian.   What is a bug is
> asking for worktree A's head and getting the main worktree's head. A
> super dangerous bug.
> 
> I certainly disagree with your assertion that asking for head and not
> getting HEAD (or HeaD or hEAd) on a case-insensitive storage engine
> isn't a bug and it certainly shouldn't be a bug once extensible
> storage engines are in place: the storage engine should have final say
> on how objects are stored and retrieved, not git-core.

If you want to refer to HEAD, writing it "head" is always wrong.  "head"
is not a special ref to Git, and on a case-sensitive system, I am fully
entitled to create a branch, tag, or other ref with that name that is
independent from HEAD.

It's wrong because regardless of operating system, you don't
intrinsically know whether the repository is case sensitive.  Windows 10
permits case-sensitive directories and macOS has case-sensitive file
systems, so you cannot assume that "head" and "HEAD" are the same
without knowing the setting of "core.ignorecase" and the properties of
the file system.

So when you write "head", you are not asking for HEAD in any worktree or
repository at all.

We are fully aware that Git cannot consistently store refs differing in
case on case-insensitive file systems, and we agree that's a bug.
Reftable will fix that, and as I mentioned, it is being worked on.  It
is not, however, a deficiency that refs are intrinsically case
sensitive, and let me explain why.

First, Git does not require that refs are in any particular encoding.
Specifically, they need not be in Unicode or UTF-8.  It is valid to have
many characters in a ref name, including 0xff.  That means any type of
case folding is not possible, since a ref need not correspond to actual
text.

Second, even if we did require them to be UTF-8, it is impossible to
consistently fold case in a way that works for all locales.  Turkish and
other Turkic languages have a dotted I and a dotless I[0].  The ASCII
uppercase I would fold to a dotless lowercase I for Turkish and to the
ASCII (dotted) lowercase I for English.  Similarly, the ASCII lowercase
I is dotted, and folds to a dotted uppercase I in Turkish and an ASCII
(dotless) uppercase I in English.

It is literally not possible to correctly perform case-folding in a
locale-independent way.  Every attempt to do so will get at least this
case wrong (not to mention other cases that occur), and Turkic languages
are spoken by 200 million people, so ignoring their needs is not only
harmful, but also impacts a massive number of people.  That major OS
designers have made this mistake doesn't mean that we should as well.

We wouldn't perform ASCII-only case folding for all of the reasons
mentioned above and because it's Anglocentric.  As someone who speaks
both Spanish and French, I would find that unsuitable and the results
bizarre.

So I understand that you may expect that on Windows or macOS that you
can write "head" and get HEAD and be surprised when that doesn't work in
all cases.  But that is not, and never has been, expected to work, nor
is it a bug that it doesn't.

[0] https://en.wikipedia.org/wiki/Dotted_and_dotless_I
-- 
brian m. carlson: Houston, Texas, US
Attachment:
signature.asc

Description: PGP signature