Dang, I just noticed that I hit "reply" rather than "reply-to-all" on the below email (stupid GMail default). Junio, your response to this email accordingly went only to me. Michael ---------- Forwarded message ---------- From: Michael Haggerty <mhagger@xxxxxxxxxxxx> Date: Mon, Jul 10, 2017 at 7:52 AM Subject: Re: Should "head" also work for "HEAD" on case-insensitive FS? To: Junio C Hamano <gitster@xxxxxxxxx> On Fri, Jul 7, 2017 at 12:34 AM, Junio C Hamano <gitster@xxxxxxxxx> wrote: > [...] > The exact detail of the encoding used here is immaterial, but I just > used "encode uppercase letters and % as % followed by two hex", > which was simple enough. Usual refs/heads/master and friends will > not have to be touched when encoded this way. Perhaps the decoding > side should be tweaked so that uppercase letters it sees needs to be > downcased to avoid "refs/heads/Foo" getting returned as "Foo" branch, > as a "Foo" branch should have been encoded as "refs/heads/%46oo". > > Having said that, this patch alone does not quite work yet. > > * In the repository discovery code, we have some logic that > hard-codes the path in the directory (which is a candidate for > being a repository) to check, like "refs/" and "HEAD". In the > attached illustration patch, files_path_encode() special cases > "HEAD" so that it is not munged, which is a bit of ugly > workaround for this. > > * I haven't figured out why, but what refs.c calls "pseudo refs" > bypasses the files backend layer for some but not all operations, > which causes t1405-main-ref-store to fail. The test creates a > "pseudo ref" FOO and then tries to remove it. Creation seems to > follow the files-backend.c and thusly goes through the escaping; > refs.::refs_delete_ref() however does not consult files-backend.c > and fails to find and delete .git/FOO, because the creation side > encoded it as ".git/%46%4F%4F". I think the most natural thing would be to use different encoding rules for pseudo-refs (references like "HEAD" and "FETCH_HEAD") and for other references (those starting with "refs/"). Pseudo-refs (with the partial exception of "HEAD") are quite peculiar beasts. They sometimes include other information besides the reference value and IIRC the refs code doesn't have any idea how to write or read those extra contents. I believe that "HEAD" is the only pseudo ref for which reflogs are ever written. Pseudo-refs have to match /[A-Z_]+/ (see https://github.com/git/git/blob/8b2efe2a0fd93b8721879f796d848a9ce785647f/refs.c#L169-L173), so ISTM that there is no need to encode such references' filenames at all. (It's possible that the pattern could be made even stricter, like /[A-Z_]+HEAD/.) Moreover, IIRC, such references are never scanned for (as in for-each-refs) but rather are always asked for by name. So their names might never have to be *de*coded, either. On the other hand, when trying to look them up, it would be a good idea to verify that the requested name satisfies the above naming rule. Other than that, I believe it would be preferable to leave pseudo-refs untouched by your new encoding/decoding code. Whereas other references are typically lower-case, so it makes sense for lower-case letters to be the ones that are passed through transparently in such references' filenames (as in your scheme). But...since we are talking about introducing a new loose reference filename encoding, I think it would be a good idea to address a couple of related issues at the same time: * Some filesystems natively use Unicode, and insist on a particular Unicode normalization (NFC vs NFD), which might differ from the "upstream" normalization. So such reference names get munged when written as loose references. I'm not enough of an expert in Unicode to know what the best solution is, except for the strong feeling that it would require some cooperation from the rest of Git to ensure a good user experience. * Another bad effect of our current loose reference encoding is that it prohibits references that D/F conflict with each other (like "refs/heads/foo" and "refs/heads/foo/bar") because "$GIT_DIR/refs/heads/foo" can't be a file and a directory at the same time. Even if we don't want to support that, this problem also prevents us from storing reflogs for deleted references, which is a serious flaw. We could solve this problem by encoding "directory" components of reference names differently than "leaf" components; for example, the above references could be encoded as "refs/heads.d/foo.ref" and "refs/heads.d/foo.d/bar.ref". It'd be nice to solve all of these related problems at the same time, because whatever encoding we choose now will have to be supported forever. Michael