Re: pack-object poor performance (with large number of objects?)

Jeff King <peff@xxxxxxxx> · Tue, 4 Oct 2011 14:08:29 -0400

On Tue, Oct 04, 2011 at 03:21:24PM +0200, Piotr Krukowiecki wrote:

> I have 4GB ram + 4GB swap. Is it possible the RAM is the problem if I
> always have free RAM left and my swap is almost not used?
> For example at the moment repack finished counting objects ("Counting
> objects: 1742200, done."):
> 
> $ free -m
>              total       used       free     shared    buffers     cached
> Mem:          3960       3814        146          0        441        215
> -/+ buffers/cache:       3157        803
> Swap:         6143        694       5449

I am not the best person to comment on Linux's disk caching strategies,
but in general, it should prefer dropping disk cache over pushing
program memory into swap. So no, you're not swapping, but you are
working with only 800M or so to do your disk caching.
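
For reference, that 800M figure is just the "free" column of the
"-/+ buffers/cache" line in your output. As a rough check, assuming the
usual procps meaning of that line (free + buffers + cached), the numbers
you pasted add up like this:

```shell
# Figures in MB, copied from the quoted "free -m" output.
free_mb=146
buffers_mb=441
cached_mb=215

# Memory the kernel could devote to disk cache without swapping
# any program memory out.
cache_budget_mb=$((free_mb + buffers_mb + cached_mb))
echo "$cache_budget_mb"
```

That agrees with the 803 that free reports, give or take rounding.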

So depending on how big pack-objects' working set of objects is, we
might be overflowing that, and constantly evicting and re-reading
objects. I don't recall offhand what kind of locality there is to
pack-objects' accesses.

One thing you could try to reduce the working set is to incrementally
pack some smaller chunks, and then combine them all at the end. That
ends up being more work overall, but at any given time, your working set
of objects will be smaller.

You'd have to do something like this (this is very untested):

  # find out how many revisions we have. Let's pretend it's about
  # 25,000.
  git rev-list HEAD | wc -l

  # now split them into chunks of whatever size you feel like trying.
  # 1000, maybe, or a few thousand. Bearing in mind that this is a gross
  # approximation, since the history is not linear.
  #
  # Start with HEAD~24K (25K total, minus 1K we want to pack)
  echo HEAD~24000 | git pack-objects --revs .git/objects/pack/pack
  # And then prune the loose objects that we just packed.
  git prune-packed
  # And repeat for the next chunk
  echo HEAD~24000..HEAD~23000 | git pack-objects --revs .git/objects/pack/pack
  git prune-packed
  # And so forth...

And then at the end, you would probably want to run "git repack -ad" to
put it all in one big pack. That should hopefully be less disk-intensive,
because by then you'll have a much smaller disk footprint: most of your
objects will already be delta'd against the others in their own pack.
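
For what it's worth, the chunked approach could be scripted roughly like
this. It is just as untested as the commands above. The function names
(chunk_specs, pack_in_chunks) and the default chunk size are made up for
illustration, and it assumes a git new enough to have "rev-list --count".
It counts commits with --first-parent because HEAD~N only follows first
parents:

```shell
#!/bin/sh

# Print one revision spec per chunk, oldest chunk first, for a history
# of $1 first-parent commits split into chunks of $2 commits.
chunk_specs() {
  total=$1
  size=$2
  prev=
  n=$((total - size))
  while [ "$n" -gt 0 ]; do
    if [ -z "$prev" ]; then
      # Oldest chunk: everything reachable from HEAD~n.
      echo "HEAD~$n"
    else
      echo "HEAD~$prev..HEAD~$n"
    fi
    prev=$n
    n=$((n - size))
  done
  # The final chunk runs up to HEAD itself.
  if [ -z "$prev" ]; then
    echo "HEAD"
  else
    echo "HEAD~$prev..HEAD"
  fi
}

# Pack each chunk in turn, pruning the loose objects it covered, then
# combine everything into a single pack at the end.
pack_in_chunks() {
  size=${1:-1000}
  total=$(git rev-list --count --first-parent HEAD)
  chunk_specs "$total" "$size" | while read -r spec; do
    echo "$spec" | git pack-objects --revs .git/objects/pack/pack
    git prune-packed
  done
  git repack -a -d
}
```

Run "pack_in_chunks 1000" from the top of a non-bare repository.
chunk_specs only prints the revision specs, so you can inspect them
first with something like "chunk_specs 25000 1000".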

I have no idea if this will actually go faster for you. But it might be
worth trying, instead of just redoing the svn import with auto-gc turned
on.

-Peff