Re: Performance issue: initial git clone causes massive repack

On Mon, Apr 6, 2009 at 10:19 AM, Nicolas Pitre <nico@xxxxxxx> wrote:
> On Mon, 6 Apr 2009, Jon Smirl wrote:
>
>> On Mon, Apr 6, 2009 at 1:15 AM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
>> > Nicolas Pitre <nico@xxxxxxx> writes:
>> >
>> >> What git-pack-objects does in this case is not a full repack.  It
>> >> instead _reuses_ as much of the existing packs as possible, and only does
>> >> the heavy packing processing for loose objects and/or inter-pack
>> >> boundaries when gluing everything together for streaming over the net.
>> >> If for example you have a single pack because your repo is already fully
>> >> packed, then the "packing operation" involved during a clone should
>> >> merely copy the existing pack over with no further attempt at delta
>> >> compression.
>> >
>> > One possible scenario where you still need to spend memory and cycles is if
>> > the cloned repository was packed to an excessive depth, causing many of
>> > its objects to be in deltified form on insanely deep chains, while the
>> > send-pack used for cloning uses a more reasonable depth.  Then pack-objects
>> > invoked by send-pack is not allowed to reuse most of those objects and
>> > would end up redoing the deltas on them.
>>
>> That seems broken. You went through all of the trouble to make the
>> pack file smaller to reduce transmission time, and then clone undoes
>> the work.
>
> And as I already explained, this is indeed not what happens.
>
>> What about making a very simple special case for an initial clone?
>
> There should not be any need for initial clone hacks.
>
>> First thing an initial clone does is copy all of the pack files from
>> the server to the client without even looking at them.
>
> This is a no-go for reasons already stated many times.  There are
> security implications (those packs might contain stuff that you didn't
> intend to be publicly accessible) and there might be efficiency
> reasons as well (you might have a shared object store with lots of stuff
> unrelated to the particular clone).
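
(For reference, the "already fully packed" case described above can be
checked and reached with ordinary git commands; the depth/window values
shown are just git's stock defaults, not anything prescribed in this
thread:

    $ git count-objects -v        # the "packs:" line shows how many packs exist
    $ git repack -a -d            # collapse everything into a single pack
    $ git config pack.depth 50    # delta chain depth used when repacking (50 is the default)
    $ git config pack.window 10   # delta search window (10 is the default)

As I read Junio's point, if the served repo was packed with a much
larger --depth than pack-objects is configured to produce, those deep
deltas can't be reused and get recomputed.)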

How do you deal with dense history packs? These packs take many hours
to build (on a server-class machine) and can be half the size of a
regular pack. Shouldn't there be a way to copy such a pack intact on
an initial clone? It would be fine if these packs had to be specially
marked as safe to copy.
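
(For concreteness, I'm assuming the dense packs in question are built
with an aggressive repack along the lines of:

    $ git repack -a -d -f --window=250 --depth=250
    $ touch .git/objects/pack/pack-<sha1>.keep   # keep later repacks from rewriting it

The .keep marker only protects the pack from being rewritten locally;
as far as I know nothing currently tells clone "this pack is safe to
copy verbatim", which is the missing piece I'm asking about.)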

>
> The biggest cost right now when cloning a big packed repo is object
> enumeration.  Any other issue involving memory costs in the GB range
> simply has no justification, and is mostly due to misconfiguration or
> bugs that have to be fixed.  Trying to work around the issue with all
> sorts of hacks is simply counterproductive.
>
> In the case that started this very thread, I suspect that a small
> misfeature of some delta caching might be the culprit.  I asked Robin H.
> Johnson to make a really simple config addition to his repo and
> retest, but we haven't seen any results from that yet.
>
>
> Nicolas
>
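
(If it helps to quantify the enumeration cost mentioned above,
something like

    $ time git rev-list --objects --all > /dev/null

should give a rough feel for the object enumeration work the server
repeats for every full clone, independent of any delta reuse.)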



-- 
Jon Smirl
jonsmirl@xxxxxxxxx
