Re: [PATCH v2 0/2] object-name: fix resolution of object names containing curly braces

Elijah Newren <newren@xxxxxxxxx> · Sat, 4 Jan 2025 10:55:42 -0800

On Sat, Jan 4, 2025 at 9:51 AM Junio C Hamano <gitster@xxxxxxxxx> wrote:
>
> Elijah Newren <newren@xxxxxxxxx> writes:
>
> >> In general what would we do if a string can be interpreted in
> >> multiple ways in _different_ parts of the object-name codepaths.  We
> >> all know that "affed" would trigger the "ambiguous object name"
> >> error if there are more than one object whose object name begins
> >> with "affed", but if "${garbage}-gaffed" can be interpreted as the
> >> name of an object whose object name begins with "affed" and also can
> >> be interpreted as the name of another object that sits at a path
> >> that ends with "-gaffed" in some tree object, regardless of how the
> >> leading part "${garbage}" looks like, it would be desirable if we
> >> declared such a string as "ambiguous" the same way.
> >
> > How would that be desirable?
>
> In "a:b/c-0-gabcde", *if* "a:b/c-0" *were* a valid way to spell a
> valid refname, then the whole thing is an ambiguous object name,
> i.e. it could be "something reachable from object 'a:b/c' whose
> object name begins with abcde", or it could be "object at the path
> b/c-0-gabcde in a tree-ish a", and in such a case our code should be
> set up to allow us to give a "that's ambiguous" error, instead of
> yielding the first possible interpretation (i.e. if we happen to
> have checked the describe name first and "$garbage-0-gabcde", we
> yield "abcde" before even checking if $garbage part gives a possible
> leading part of a tree-ish; but if a future refactoring of the code
> flips the order of checking, we may end up yielding 'an object at a
> path, which ends with -0-gabcde, sitting in a tree-ish', without
> checking if that could be a valid describe name).
>
> Of course we should make sure that the syntax cannot be ambiguous
> when we introduce a new syntax to represent a new feature ;-)
>
> Now, I think ":" has always been a byte that is invalid as a part of
> any refname, so "${garbage}-gabcde" with a colon in ${garbage}
> cannot be a describe name.  So in the above about "a:b/c-0" is an
> impossible example, but I was wondering more about the general
> principle we should follow.

Are you only interested in the general principle for the "possible
examples"?  What about the general principle for the "impossible
examples"?  Things like "master:path/to/who-gabbed" are unambiguously
a reference to a path within a revision that cannot be spelled any
alternate way, but the code currently gives the user a commit instead.
What's the right way to fix these "impossible examples"?  I've given
three proposals and implemented the first of them:
  - ${POSSIBLY_VALID_REFNAME}-${INTEGER}-g${HASH}
  - ${POSSIBLY_VALID_REFNAME}-g${HASH}
  - ${ANYTHING_WITHOUT_A_COLON}-g${HASH}

You said you don't like the first two because check_refname() rules
might change, and not commented on the third.

Also, as far as I can tell, the set of "possible examples" you are
focusing on is currently the empty set.  A change of syntax might in
the future expand that to a non-empty-set, and then bring us backward
compatibility headaches because we have been allowing
"${garbage}-g${hash}" to mean a reference to ${hash} and we'd then
have to deal with it becoming ambiguous (and potentially also having
no way to disambiguate those cases, similar to how if colon is allowed
in garbage then we have no way to disambiguate paths).  If we want to
allow future object naming extensions, it seems like we should lock
down and rule out as many existing forms of known ${garbage} as we
can, but that'd push us towards the
${POSSIBLY_VALID_REFNAME}-${INTEGER}-g${HASH} solution I implemented
that you don't seem to like.  Is there a middle ground that you do
like?