Re: [BUG?] 'git describe seen'?

SZEDER Gábor <szeder.dev@xxxxxxxxx> · Sun, 18 Apr 2021 11:43:47 +0200

On Sun, Apr 18, 2021 at 10:21:04AM +0200, SZEDER Gábor wrote:
> On Sat, Apr 17, 2021 at 04:45:59PM -0700, Junio C Hamano wrote:
> > This does not seem a new problem at all, as v2.10.0 thru more recent
> > versions of "git describe" seem to give me the same answer.
> > 
> > Anyway, I am seeing a curious symptom.
> > 
> > $ git rev-list --count v2.31.0..seen
> > 716
> > $ git rev-list --count v2.31.1..seen
> > 687
> > 
> > The above means that 'seen' is closer to v2.31.1 than v2.31.0; there
> > are fewer commits that are not in v2.31.1 that are in 'seen', than
> > commits that are not in v2.31.0 that are in 'seen'.  
> > 
> > That is naturally expected.
> > 
> > $ git rev-list --count v2.31.0..v2.31.1
> > 29
> > 
> > And that difference of 29 matches the difference, which is 716 - 687.
> > 
> > But here is what is puzzling.
> > 
> > $ git describe seen
> > v2.31.0-716-g7b65b53281
> > 
> > $ git rev-list --first-parent master..seen |
> >   while read v
> >   do
> > 	d=$(git describe $v)
> > 	echo $v $d
> > 	case "$d" in v2.31.1-*) break ;; esac
> >   done
> > 7b65b53281ae06ee25dd47dead4062125eb54427 v2.31.0-716-g7b65b53281
> > eec14f0fec886c909a29d63a94537df5a62be618 v2.31.0-714-geec14f0fec
> > ...
> > 103835562c64abef2319995716230f92092f87af v2.31.0-569-g103835562c
> > d4324831d9152b16e091646e22a6e03423a59c93 v2.31.1-516-gd4324831d9
> > 
> > Is there something tricky about the topic merged at 10383556 (Merge
> > branch 'jh/rfc-builtin-fsmonitor' into seen, 2021-04-17) to confuse
> > the counting done in "git describe"?
> 
> Subsequent merges with identical timestamps can easily confuse 'git
> describe', I wonder whether this is another symptom of the issue
> reported at:
> 
>   https://public-inbox.org/git/20191008123156.GG11529@xxxxxxxxxx/

That is not the only issue playing a role here, though.

'git describe --debug ...' shows a handful of candidate tags sorted
(primarily) by their depth counted from the to-be-described commit:

  $ git describe --debug origin/seen
  describe origin/seen
  No exact match on refs or tags, searching to describe
   annotated        716 v2.31.0
   annotated        708 v2.31.1
   annotated        733 v2.31.0-rc2
   annotated        755 v2.31.0-rc1
   annotated        818 v2.31.0-rc0
   annotated        895 v2.30.2
   annotated        897 v2.29.3
   annotated        899 v2.28.1
   annotated        901 v2.27.1
   annotated        903 v2.26.3
  traversed 3768 commits
  more than 10 tags found; listed 10 most recent
  gave up search at 42ce4c7930ff494136256554cbed730857084c70
  v2.31.0-716-g7b65b53281

Notice two issues with this output:

  - v2.31.0's depth is ok, but v2.31.1's is wrong:

      $ git rev-list --count v2.31.0..origin/seen
      716
      $ git rev-list --count v2.31.1..origin/seen
      687

  - The order of v2.31.0 and v2.31.1 is clearly wrong.

    I think this is because of the following code snippet from
    builtin/describe.c:describe_commit():

        QSORT(all_matches, match_cnt, compare_pt);

        if (gave_up_on) {
                commit_list_insert_by_date(gave_up_on, &list);
                seen_commits--;
        }
        seen_commits += finish_depth_computation(&list, &all_matches[0]);
        free_commit_list(list);

    I.e. we sort based on depth before calling
    finish_depth_computation(), which doesn't look right...  Looking
    at the depths before sorting shows that v2.31.1's depth is the
    same (708), but v2.31.0's depth is only 696 instead of 716.  Now,
    while this explains why those two tags are listed in the wrong
    order, it is still troubling that v2.31.0's depth is lower that
    v2.31.1's.

    Switching the order of finish_depth_computation() and QSORT() does
    fix thier order based on their depth, but, alas, those depths are
    still wrong:

      $ ./bin-wrappers/git describe --debug origin/seen
      describe origin/seen
      No exact match on refs or tags, searching to describe
       annotated        696 v2.31.0
       annotated        729 v2.31.1
       annotated        733 v2.31.0-rc2
       annotated        755 v2.31.0-rc1
       annotated        818 v2.31.0-rc0
       annotated        895 v2.30.2
       annotated        897 v2.29.3
       annotated        899 v2.28.1
       annotated        901 v2.27.1
       annotated        903 v2.26.3
      traversed 3768 commits
      more than 10 tags found; listed 10 most recent
      gave up search at 42ce4c7930ff494136256554cbed730857084c70
      v2.31.0-696-g7b65b53281

    However, switching the order of finish_depth_computation() and
    QSORT() might be wrong in itself: finish_depth_computation() was
    introduced in 1b600e659a (Compute accurate distances in
    git-describe before output., 2007-01-27), which states [1] the
    following in its commit message:

      We can easily reduce the total number of commits that need to be
      walked at the end by stopping as soon as all of the commits in the
      commit queue are flagged as being merged into the already selected
      best possible tag.

    But which candidate is that "best possible tag"?  The one whose
    depth is the lowest, so we do have to sort before calling
    finish_depth_computation() (or, strictly speaking, we don't have
    to sort, we only have to find the candidate with the lowest
    depth).

At this point I call it quits, because I don't think we can't talk
about accurate depth at all without first fixing the issue I linked in
my previous message.

[1] 1b600e659a also states the following:

      This means we can simply finish
      counting out the revisions still in our commit queue to present
      the accurate distance at the end.  The number of commits remaining
      in the commit queue is probably less than the number of commits
      already traversed, so finishing out the count is not likely to take
      very long. 

   Looking at the 'seen_commits' variable shows that in this
   particular case we traverse 905 commits before calling
   finish_depth_computation(), which then traverses an additional
   3768 - 905 = 2863 commits, i.e. the assumption in 1b600e659a might
   not hold.