Re: [PATCHv3 7/7] builtin/describe.c: describe a blob

Jonathan Tan <jonathantanmy@xxxxxxxxxx> · Tue, 14 Nov 2017 12:02:08 -0800

On Thu,  2 Nov 2017 12:41:48 -0700
Stefan Beller <sbeller@xxxxxxxxxx> wrote:

> Sometimes users are given a hash of an object and they want to
> identify it further (ex.: Use verify-pack to find the largest blobs,
> but what are these? or [1])
> 
> "This is an interesting endeavor, because describing things is hard."
>   -- me, upon writing this patch.
> 
> When describing commits, we try to anchor them to tags or refs, as these
> are conceptually on a higher level than the commit. And if there is no ref
> or tag that matches exactly, we're out of luck.  So we employ a heuristic
> to make up a name for the commit. These names are ambiguous: there might
> be different tags or refs to anchor to, and there might be different
> paths in the DAG to travel to arrive at the commit precisely.
> 
> When describing a blob, we want to describe the blob from a higher layer
> as well, which is a tuple of (commit, deep/path) as the tree objects
> involved are rather uninteresting.  The same blob can be referenced by
> multiple commits, so how do we decide which commit to use?  This patch
> implements a rather naive approach on this: As there are no back pointers
> from blobs to commits in which the blob occurs, we'll start walking from
> any tips available, listing the blobs in-order of the commit and once we
> found the blob, we'll take the first commit that listed the blob.  For
> source code this is likely not the first commit that introduced the blob,
> but rather the latest commit that contained the blob.  For example:
> 
>   git describe v0.99:Makefile
>   v0.99-5-gab6625e06a:Makefile
> 
> tells us the latest commit that contained the Makefile as it was in tag
> v0.99 is commit v0.99-5-gab6625e06a (and at the same path), as the next
> commit on top v0.99-6-gb1de9de2b9 ([PATCH] Bootstrap "make dist",
> 2005-07-11) touches the Makefile.
> 
> Let's see how this description turns out, if it is useful in day-to-day
> use as I have the intuition that we'd rather want to see the *first*
> commit that introduced this blob to the repository (which can be
> achieved easily by giving the `--reverse` flag in the describe_blob rev
> walk).

The method of your intuition indeed seems better - could we just have
this from the start?

Alternatively, to me, it seems that listing the commits that *introduce*
the blob (that is, commits that reference the blob while none of their
parents do) would be the best way. That would then be independent of
traversal order (and we would no longer need to find a tag etc. to tie
the blob to).
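
As a rough illustration of that definition, here is a sketch in Python
over a toy commit graph. The dict layout and the name
introducing_commits are made up for this example; none of this is git's
internal API, and a real implementation would walk trees rather than
hold a precomputed blob set per commit.

```python
# Toy model (an assumption for illustration): each commit id maps to
# (list of parent ids, set of blob ids reachable from the commit's tree).

def introducing_commits(commits, blob):
    """Return commits that contain `blob` while none of their parents do."""
    result = []
    for cid, (parents, blobs) in commits.items():
        if blob not in blobs:
            continue
        # all() over an empty parent list is True, so a root commit that
        # contains the blob counts as introducing it.
        if all(blob not in commits[p][1] for p in parents):
            result.append(cid)
    return sorted(result)


# Linear history A <- B <- C <- D <- E; blob "b2" is added in B,
# removed in D, and re-added in E.
history = {
    "A": ([], {"b1"}),
    "B": (["A"], {"b1", "b2"}),
    "C": (["B"], {"b1", "b2"}),
    "D": (["C"], {"b1"}),
    "E": (["D"], {"b1", "b2"}),
}

print(introducing_commits(history, "b2"))
```

Note that the result is a list: a blob that is removed and later
re-added is introduced more than once, and the answer does not depend on
the order in which commits are visited.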

If we do that, it seems to me that there is a future optimization that
could get the first commit to the user more quickly: once a commit
without the blob and a descendant commit with the blob are found, the
interval between them can be bisected, so that the first commit is found
in O(log(number of commits)) instead of O(number of commits). But this
can be done later.
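
A minimal sketch of the bisection idea in Python, assuming a linear
chain of commits between the two endpoints. The names
bisect_introduction and has_blob are hypothetical, and has_blob stands
in for whatever tree lookup a real implementation would use. Real
history is a DAG, so this only shows the shape of the search.

```python
def bisect_introduction(chain, has_blob):
    """Find a commit in `chain` that has the blob while its parent does not.

    chain: commit ids on a linear history, oldest first.
    Precondition: the oldest commit lacks the blob, the newest has it.
    """
    lo, hi = 0, len(chain) - 1
    if has_blob(chain[lo]) or not has_blob(chain[hi]):
        raise ValueError("need blob absent at oldest, present at newest")
    # Invariant: chain[lo] lacks the blob and chain[hi] has it.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if has_blob(chain[mid]):
            hi = mid
        else:
            lo = mid
    return chain[hi]


# Sixteen commits c0..c15; the blob first appears in c11.
chain = [f"c{i}" for i in range(16)]
present = set(chain[11:])
probes = []


def has_blob(cid):
    probes.append(cid)  # record each lookup to show how few are needed
    return cid in present


first = bisect_introduction(chain, has_blob)
print(first, len(probes))
```

If the blob was added and removed several times inside the interval,
the loop invariant still guarantees that the returned commit introduces
the blob, though not necessarily that it is the earliest such commit.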