Re: Why is "git tag --contains" so slow?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Jul 01, 2010 at 08:17:11AM -0400, tytso@xxxxxxx wrote:

> Yeah, I'm not talking about the tag and branch names that show up in
> the top gitk box (and which you can also get via git log --annotate).
> I'm specifically talking about what you get with:
> 
> 	git tags --contains <commit-id>

I think there is room for improving the speed of "git tag --contains".
Consider these two similar operations:

  $ time git name-rev HEAD~200
  HEAD~200 tags/v1.7.1-rc2~2

  real    0m0.060s
  user    0m0.048s
  sys     0m0.012s

  $ time git tag --contains HEAD~200
  v1.7.1
  v1.7.1-rc2
  v1.7.1.1
  v1.7.2-rc0
  v1.7.2-rc1

  real    0m2.179s
  user    0m2.156s
  sys     0m0.020s

Obviously the latter is producing the full list of tags, but both should
have to do a traversal starting at each tag ref.

It looks like "git tag --contains" is implemented as something like:

  for ref in `git tag -l`; do
    if `git merge-base $ref $commit | grep $commit`; then
      echo $ref
    fi
  done

except in C. But you can see we'll do a full merge-base calculation for
each tag. And that merge-base calculation may potentially go quite far
back in history.

The name-rev code has a simple heuristic that it stops traversing when
the timestamp of the traversed commit gets more than a day behind that
of $commit. This doesn't do well with clock skew, but can save a lot of
time in practice. We should perhaps consider something like that here.

I suspect we could do even better by sharing information between
traversals. That is, walk the graph from each ref, but retain marks on
commits between each walk. For a commit c, if walking down each parent
gets us to a root without hitting $commit, then we can mark "c" as not
possibly containing the commit. We can cut short any walk when we hit a
"c" that is already marked. So by traversing from "v1.7.1", if we find
that we do not contain $commit, we will have already marked v1.7.0,
v1.6.*, etc, and will not need to traverse them again.

When you do find one that contains $commit, you can also mark it as
"definitely contains", and you can stop a traversal that definitely
contains. So if we see something is in v1.7.0, then the traversal from
v1.7.1 only needs to go as far back as v1.7.0.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]