Fwd: Should "head" also work for "HEAD" on case-insensitive FS?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Dang, I just noticed that I hit "reply" rather than "reply-to-all" on
the below email (stupid GMail default). Junio, your response to this
email accordingly went only to me.

Michael

---------- Forwarded message ----------
From: Michael Haggerty <mhagger@xxxxxxxxxxxx>
Date: Mon, Jul 10, 2017 at 7:52 AM
Subject: Re: Should "head" also work for "HEAD" on case-insensitive FS?
To: Junio C Hamano <gitster@xxxxxxxxx>


On Fri, Jul 7, 2017 at 12:34 AM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
> [...]
> The exact detail of the encoding used here is immaterial, but I just
> used "encode uppercase letters and % as % followed by two hex",
> which was simple enough.  Usual refs/heads/master and friends will
> not have to be touched when encoded this way.  Perhaps the decoding
> side should be tweaked so that uppercase letters it sees needs to be
> downcased to avoid "refs/heads/Foo" getting returned as "Foo" branch,
> as a "Foo" branch should have been encoded as "refs/heads/%46oo".
>
> Having said that, this patch alone does not quite work yet.
>
>  * In the repository discovery code, we have some logic that
>    hard-codes the path in the directory (which is a candidate for
>    being a repository) to check, like "refs/" and "HEAD".  In the
>    attached illustration patch, files_path_encode() special cases
>    "HEAD" so that it is not munged, which is a bit of ugly
>    workaround for this.
>
>  * I haven't figured out why, but what refs.c calls "pseudo refs"
>    bypasses the files backend layer for some but not all operations,
>    which causes t1405-main-ref-store to fail.  The test creates a
>    "pseudo ref" FOO and then tries to remove it.  Creation seems to
>    follow the files-backend.c and thusly goes through the escaping;
>    refs.::refs_delete_ref() however does not consult files-backend.c
>    and fails to find and delete .git/FOO, because the creation side
>    encoded it as ".git/%46%4F%4F".

I think the most natural thing would be to use different encoding
rules for pseudo-refs (references like "HEAD" and "FETCH_HEAD") and
for other references (those starting with "refs/").

Pseudo-refs (with the partial exception of "HEAD") are quite peculiar
beasts. They sometimes include other information besides the reference
value and IIRC the refs code doesn't have any idea how to write or
read those extra contents. I believe that "HEAD" is the only pseudo
ref for which reflogs are ever written. Pseudo-refs have to match
/[A-Z_]+/ (see https://github.com/git/git/blob/8b2efe2a0fd93b8721879f796d848a9ce785647f/refs.c#L169-L173),
so ISTM that there is no need to encode such references' filenames at
all. (It's possible that the pattern could be made even stricter, like
/[A-Z_]+HEAD/.) Moreover, IIRC, such references are never scanned for
(as in for-each-refs) but rather are always asked for by name. So
their names might never have to be *de*coded, either. On the other
hand, when trying to look them up, it would be a good idea to verify
that the requested name satisfies the above naming rule. Other than
that, I believe it would be preferable to leave pseudo-refs untouched
by your new encoding/decoding code.

Whereas other references are typically lower-case, so it makes sense
for lower-case letters to be the ones that are passed through
transparently in such references' filenames (as in your scheme).

But...since we are talking about introducing a new loose reference
filename encoding, I think it would be a good idea to address a couple
of related issues at the same time:

* Some filesystems natively use Unicode, and insist on a particular
Unicode normalization (NFC vs NFD), which might differ from the
"upstream" normalization. So such reference names get munged when
written as loose references. I'm not enough of an expert in Unicode to
know what the best solution is, except for the strong feeling that it
would require some cooperation from the rest of Git to ensure a good
user experience.

* Another bad effect of our current loose reference encoding is that
it prohibits references that D/F conflict with each other (like
"refs/heads/foo" and "refs/heads/foo/bar") because
"$GIT_DIR/refs/heads/foo" can't be a file and a directory at the same
time. Even if we don't want to support that, this problem also
prevents us from storing reflogs for deleted references, which is a
serious flaw. We could solve this problem by encoding "directory"
components of reference names differently than "leaf" components; for
example, the above references could be encoded as
"refs/heads.d/foo.ref" and "refs/heads.d/foo.d/bar.ref".

It'd be nice to solve all of these related problems at the same time,
because whatever encoding we choose now will have to be supported
forever.

Michael



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux