Re: [PATCH] ref-filter: handle nested tags in --points-at option

Junio C Hamano <gitster@xxxxxxxxx> · Tue, 04 Jul 2023 23:10:47 -0700

Jeff King <peff@xxxxxxxx> writes:

> That seems reasonable to me. It is changing the definition of
> --points-at slightly, but I think in a way that should be less
> surprising to users (i.e., we can consider the old behavior a bug).

OK.

> The
> existing documentation is sufficiently vague about "points" that I don't
> think it needs to be updated (though arguably we could improve that
> here, too).

True, too.

Having said that, as we lack a primitive exposed to the user script
to only peel one level of a tag, it makes me wonder how much of this
is a practical issue.  Is a tag that points at another tag mere
curiosity and subject of mental gymnastics, or is it a useful
construct with real world use case?  If the latter, isn't it more
like "a tag to a tag _could_ be made useful if we also supported X
and Y and Z" where X could be "There should be a syntax like
A^{commit} that lets us peel only one level of tag"?

> Note that most other tag-peeling in Git (like the peeled values returned
> by upload-pack) errs in the opposite direction: they peel completely,
> and don't show the intermediate values.

Exactly.  While allowing to feed the intermediate level as the
argument to "points at" does sound like a good idea on surface, I am
skeptical about the practical value it brings---if we cannot do this
without any additional overhead (of course if there are tags to tags
in the repository, I am perfectly fine to pay the cost of
dereferencing the chain, but I am more worried about regressing the
performance in repositories without tags to tags), I am not sure if
it is worth it.

> My biggest question would be whether this introduces any performance
> penalty for the more common cases (lightweight tags and single-level
> annotated tags). The answer is "no", I think; we are already paying the
> cost to parse every object to find out if it's a tag, and your new loop
> only does an extra parse if we see a tag-of-tag. Good.

OK.

>   Let me go off on a tangent here, since I'm looking at the performance
>   of this function. The current code is already rather pessimal here, as
>   we could probably avoid parsing non-tags entirely. Some strategies
>   there are:
>
>     1. We could check the object type via oid_object_info() before
>        parsing. This carries a small penalty (two lookups) for tags but
>        a big win (avoiding loading the object contents) for non-tags.
>
>        An easy way to do this is to replace the parse_object() with
>        parse_object_with_flags(PARSE_OBJECT_SKIP_HASH_CHECK), which
>        tries to avoid loading object contents (especially using the
>        commit-graph for commits, which presumably covers most non-tag
>        refs).
>
>     2. We could be using the peel_iterated_oid() interface (this is the
>        peel_ref() thing mentioned in the comment you touched, but it has
>        since been renamed). But it does the "peel all the way" thing
>        mentioned above (both because of its interface, but also because
>        that's what the packed-refs peel lines store).
>
>        So to do that we'd either have to enhance the packed-refs store
>        (which would not be too hard to do in a backwards-compatible
>        way), or switch --points-at to only match either the direct ref
>        value or the fully-peeled value.
>
>   I don't think either of those is something your patch needs to deal
>   with. It is not making these kinds of optimizations any harder (it is
>   the existing "peel only once" behavior that does so). I mostly wanted
>   to get it written down while we are all looking at this function.

My preference would be to see these optimization done first and then
add this new loop on top of it.  That way, we can measure more
easily what kind of additional overhead, if any, we are paying by
adding the loop.

Thanks.