Re: Why is "git tag --contains" so slow?

On Sat, Jul 03, 2010 at 08:55:43PM -0400, tytso@xxxxxxx wrote:

> > I noticed that my improved time for "git tag --contains" was similar to
> > the total time for "git rev-list --all >/dev/null". Can you try timing
> > that? My suspicion is that it is going to be about 2.9 seconds for you.
> 
> I'm at home, so getting access to my work machine is a bit of a pain,
> so I replicated the experiment at home.

Thanks. Those numbers confirm what I had been thinking.

> Yep, it does blow up in the face of the extreme clock skew in some of
> the ext4 commits in the Linux kernel tree.  (Sorry about that; mea
> culpa, I didn't realize at the time this was a problem, and it was my
> workflow using the guilt program which happened to introduce them.)

Yes, I was thinking specifically of those commits when I warned about
clock skew. :)

> In any case, because of the ext4 commits, I can show a test case which
> doesn't work well with your date cutoff patch:

Not surprising. I think you will find that "git name-rev" (or "git
describe --contains", which simply calls name-rev) will have similar
problems.

> (Or maybe we have git config options that can enable or disable
> optimizations that depend on the lack of clock skews; but I could
> understand people not wanting to maintain the extra code paths.)

I think the best thing we can do is provide a "how much clock skew to
tolerate" variable, and give it a sane default. Then people who know
they have skewed repositories can make the correctness-optimization
tradeoff as they see fit.

The extra code is very minor. It's really only a line or two of code
when calculating the cutoff date to convert "be thorough" into a cutoff
date of 0.
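
To make the "line or two" concrete, here is a rough sketch of the cutoff
calculation in Python. It is not the actual git code; the names
cutoff_date, worth_visiting, and allowed_skew are made up for
illustration, and I am assuming that "infinite" skew is represented as
None.

```python
from typing import Optional


def cutoff_date(target_date: int, allowed_skew: Optional[int]) -> int:
    """Return the oldest commit timestamp still worth traversing.

    target_date  -- commit timestamp (seconds) of the commit we are looking for
    allowed_skew -- tolerated clock skew in seconds; None means "be thorough"
    """
    if allowed_skew is None:
        # "Be thorough": a cutoff of 0 never prunes anything.
        return 0
    # Clamp at 0 so a large skew allowance cannot produce a negative date.
    return max(0, target_date - allowed_skew)


def worth_visiting(commit_date: int, cutoff: int) -> bool:
    """A commit older than the cutoff is assumed unable to contain the target."""
    return commit_date >= cutoff
```

With a skew allowance of 0, any commit older than the target is pruned
immediately; with None, every commit is visited, which is the current
slow behavior.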

The real question is what that default should be. Name-rev already uses
86400 seconds. The worst skew in git.git is 8 seconds. The worst skew in
linux-2.6.git is 8622098 (about 100 days). For reference, here are my
timings on "git tag --contains HEAD~200" for various allowable clock
skew values:

  0 (don't allow even a second of clock skew): .035s
  86400 (one day of clock skew allowed): .034s
  8622098 (the worst skew in linux-2.6): .252s
  infinite (never cut off for clock skew): 5.373s

So anything below a day is pointless and lost in the noise. Even 100
days yields quite a satisfactory speedup from the current code, but
obviously that number is totally arbitrary based on one repo.
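
For anyone who wants to check the arithmetic behind "about 100 days" and
the size of the speedup, a few lines of Python over the timings above
(the variable names are just for illustration):

```python
SECONDS_PER_DAY = 86400

# Allowed skew in seconds -> measured time in seconds, copied from the
# table above. None stands for "infinite" (never cut off).
timings = {0: 0.035, 86400: 0.034, 8622098: 0.252, None: 5.373}

# The worst skew in linux-2.6, expressed in days.
worst_skew_days = 8622098 / SECONDS_PER_DAY

# Speedup of each setting relative to the no-cutoff traversal.
baseline = timings[None]
speedups = {skew: baseline / t for skew, t in timings.items()}

print(f"worst skew: {worst_skew_days:.2f} days")
for skew, factor in speedups.items():
    print(f"allowed skew {skew}: {factor:.1f}x faster than no cutoff")
```

These are single measurements from one machine and one repository, so
the ratios are only indicative.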

As you probably guessed from the specificity of the number, I wrote a
short program to actually traverse and find the worst skew. It takes
about 5 seconds to run (unsurprisingly, since it is doing the same full
traversal that we end up doing in the above numbers). So we could
"autoskew" by setting up the configuration on clone, and then
periodically updating it as part of "git gc".
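
The skew-finding traversal could look roughly like the sketch below.
This is not the program I ran; it is a simplified Python model over a
toy in-memory history. I am assuming here that "skew" means a parent
whose timestamp is later than its child's, measured on each direct
parent edge, and the Graph type and worst_skew name are invented for
the example.

```python
from typing import Dict, List, Tuple

# Toy history model: commit id -> (timestamp in seconds, parent ids).
Graph = Dict[str, Tuple[int, List[str]]]


def worst_skew(graph: Graph, tips: List[str]) -> int:
    """Walk every commit reachable from tips and return the largest amount
    by which a parent's timestamp exceeds its child's (0 if none does)."""
    worst = 0
    seen = set()
    stack = list(tips)
    while stack:
        cid = stack.pop()
        if cid in seen:
            continue  # visit each commit once, even across merges
        seen.add(cid)
        date, parents = graph[cid]
        for pid in parents:
            parent_date = graph[pid][0]
            if parent_date > date:
                # Parent claims to be newer than its child: clock skew.
                worst = max(worst, parent_date - date)
            stack.append(pid)
    return worst


# Commit "c" is dated 50 seconds before its parent "b".
history: Graph = {
    "d": (400, ["c"]),
    "c": (300, ["b"]),
    "b": (350, ["a"]),
    "a": (100, []),
}
print(worst_skew(history, ["d"]))
```

Because every reachable commit is visited once, the cost is the same as
a full history walk, which matches the 5 seconds mentioned above.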

That is perhaps over-engineering (and would add a few seconds to a
clone), but I like that it would Just Work without the user doing
anything.

I'll follow this mail up with a series that implements a cleaned-up
version of the previous patches in this thread, and we'll see what
others think.

-Peff