Re: Performance issue: initial git clone causes massive repack

"Shawn O. Pearce" <spearce@xxxxxxxxxxx> · Mon, 6 Apr 2009 07:48:29 -0700

Jon Smirl <jonsmirl@xxxxxxxxx> wrote:
> 
> How do you deal with dense history packs? These packs take many hours
> to make (on a server class machine) and can be half the size of a
> regular pack. Shouldn't there be a way to copy these packs intact on
> an initial clone? It's ok if these packs are specially marked as being
> ok to copy.

These should be copied as-is.

Basically, object enumeration lists every reachable object, which
should include every object in this pack if it's a "dense history
pack".  We then start to write out each object.  As each object
is written we look to see if it already exists in a pack.  It does
(in your dense history pack), so we then look to see if its delta
base is also in the output list (it is), so we send the data as-is.
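
A rough sketch of that per-object check, in Python rather than
git-core's C.  PackedEntry and can_reuse_as_is are made-up names for
illustration, not git-core or JGit functions, and treating a
non-delta object as always reusable is my assumption here:

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass(frozen=True)
class PackedEntry:
    """Hypothetical record of how one object is stored in an existing pack."""
    object_id: str
    delta_base: Optional[str]  # None when stored whole (not as a delta)


def can_reuse_as_is(object_id: str,
                    existing_pack: Dict[str, PackedEntry],
                    output_ids: Set[str]) -> bool:
    """Return True if the packed bytes for object_id can be copied unchanged."""
    entry = existing_pack.get(object_id)
    if entry is None:
        # Not in any existing pack: it has to be compressed from scratch.
        return False
    if entry.delta_base is None:
        # Stored whole, so nothing else is needed to decode it (assumption).
        return True
    # Stored as a delta: only safe to copy if the receiver also gets the base.
    return entry.delta_base in output_ids
```

For a dense history pack every base is in the output list, so every
object takes one of the two "copy" branches and nothing is
recompressed.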

One of the bigger costs with such clones is building that huge list
of objects needed to send.  The primary cost appears to be unpacking
the trees from the "dense history pack", where delta chains are
usually quite long.  The GSoC 2009 pack caching project idea is
based on the theory that we should be able to save a list of objects
that are reachable from some fixed point (e.g. a very well known,
stable tag), and avoid needing to read these ancient trees.
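
To make the idea concrete, here is a toy Python sketch.  It is not
the GSoC design and not how git-core or JGit enumerate objects;
enumerate_objects, parents_of, objects_of, cache_point and cached_ids
are all names invented for this example.  The walk stops at the
cached fixed point and splices in the saved list instead of reading
the trees behind it:

```python
from typing import Callable, FrozenSet, Iterable, Optional, Set


def enumerate_objects(tips: Iterable[str],
                      parents_of: Callable[[str], Iterable[str]],
                      objects_of: Callable[[str], Set[str]],
                      cache_point: Optional[str] = None,
                      cached_ids: FrozenSet[str] = frozenset()) -> Set[str]:
    """List every object reachable from tips.

    objects_of(commit) stands in for the expensive step: unpacking the
    trees of that commit.  cached_ids must hold everything reachable
    from cache_point, including cache_point itself.
    """
    seen_commits: Set[str] = set()
    result: Set[str] = set()
    stack = list(tips)
    while stack:
        commit = stack.pop()
        if commit in seen_commits:
            continue
        seen_commits.add(commit)
        if commit == cache_point:
            # Reuse the saved list; do not walk or unpack anything older.
            result |= cached_ids
            continue
        result.add(commit)
        result |= objects_of(commit)
        stack.extend(parents_of(commit))
    return result
```

If a merge reaches old history without passing through the cached
commit, this sketch still returns the right list; it just walks that
part the slow way.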

But it's just a theory.  Caching always costs you management
overheads.  And it may not save us that much time.  And most of
the theory here is based on JGit's performance during packing,
*not* git-core.

I came up with the object list caching idea because JGit's object
enumeration is just pitiful.  (It's Java, what do you want?  If you
wanted fast, you'd use portable assembler... like git-core does.)
Whether or not it's worth applying to git-core is another story
entirely.

-- 
Shawn.