Re: [RFC/PATCHv2 6/6] limit "contains" traversals based on commit generation

On Wed, Jul 13, 2011 at 03:06:44AM -0400, Jeff King wrote:

> This optimization can provide massive speedups. For example,
> doing "git tag --contains HEAD~1000" in the linux-2.6
> repository goes from:
> 
>   real    0m3.139s
>   user    0m3.044s
>   sys     0m0.092s
> 
> to:
> 
>   real    0m0.035s
>   user    0m0.028s
>   sys     0m0.004s

I pulled this commit message from the original "cutoff at timestamp"
patch, though I did update the timings for the new code. What it doesn't
mention is that the first run, which has to build the generation cache,
takes something like 3.7 seconds; subsequent runs are much faster. I had
mentioned that number elsewhere in the thread, but it should probably go
here. I'll put it in the next version.

One number I haven't mentioned elsewhere, though, is how expensive it is
to add new commits to the cache. So here's an interesting timing:

  $ cd linux-2.6

  : slow, cache-generating time
  $ rm .git/cache/generations
  $ time git tag --contains HEAD
  real    0m3.795s
  user    0m3.420s
  sys     0m0.372s

  : fast, cached time
  $ time git tag --contains HEAD
  real    0m0.022s
  user    0m0.008s
  sys     0m0.012s

  : now what if we add one more commit?
  $ echo foo >>Makefile && git commit -a -m foo
  $ time git tag --contains HEAD
  real    0m0.271s
  user    0m0.020s
  sys     0m0.252s

It takes barely any time to compute the generation of the new commit,
but we spend 0.25 seconds writing out the whole new cache file. This
could be improved with a cleverer on-disk format that kept a journal of
unsorted, newly written entries. You'd still write out the full cache
once in a while, but that cost would be amortized over many additions.

I'm not sure the complexity is worth it, though. Yes, the write-out time
is way slower than the super-fast everything-is-cached case. But it
doesn't happen that often (only when you have new commits, _and_ your
traversal actually looks at them). And it's still an order of magnitude
faster than running without the cache at all. I doubt I would even notice
a quarter-second delay; if I did, I would probably just chalk it up to a
few objects needing to be pulled from disk.
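Working the ratios from the "real" and "sys" times in the session above:

```latex
\frac{3.795\,\mathrm{s}}{0.271\,\mathrm{s}} \approx 14.0,
\qquad
\frac{0.271\,\mathrm{s}}{0.022\,\mathrm{s}} \approx 12.3,
\qquad
\frac{0.252\,\mathrm{s}\ (\text{sys})}{0.271\,\mathrm{s}\ (\text{real})} \approx 0.93
```

So the one-new-commit case is roughly 14 times faster than the uncached run and roughly 12 times slower than the fully cached run. System time makes up about 93% of its wall-clock time, which is consistent with the cost being the file write-out rather than the generation computation.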

So I'm inclined to leave it as-is, at least for now. If somebody wants
to revisit the topic later and speed up cache writing, they can. But I
don't want a complex solution to hold up this series, which is already a
big improvement.

-Peff