On 12/11/07, Nicolas Pitre <nico@xxxxxxx> wrote:
> On Tue, 11 Dec 2007, Nicolas Pitre wrote:
>
> > And yet, this is still missing the actual issue.  The issue being that
> > the 2.1GB pack as a _source_ doesn't cause as much memory to be
> > allocated even if the _result_ pack ends up being the same.
> >
> > I was able to repack the 2.1GB pack on my machine which has 1GB of RAM.
> > Now that it has been repacked, I can't repack it anymore, even when
> > single threaded, as it starts crawling into swap fairly quickly.  It is
> > really non-intuitive and actually senseless that Git would require twice
> > as much RAM to deal with a pack that is 7 times smaller.
>
> OK, here's something else for you to try:
>
>         core.deltabasecachelimit=0
>         pack.threads=2
>         pack.deltacachesize=1
>
> With that I'm able to repack the small gcc pack on my machine with 1GB
> of RAM using:
>
>         git repack -a -f -d --window=250 --depth=250
>
> and top reports a ~700m virt and ~500m res without hitting swap at all.
> It is only at 25% so far, but I was unable to get that far before.
>
> Would be curious to know what you get with 4 threads on your machine.

Changing those parameters really slowed down counting the objects. I
used to be able to count in 45 seconds; now it took 130 seconds. I
still have the Google allocator linked in.

4 threads, cumulative clock time:
  25%  200 seconds,  820/627M
  55%  510 seconds, 1240/1000M - little late recording
  75%  15 minutes,  1658/1500M
  90%  22 minutes,  1974/1800M
It's still running but there is no significant change.

Are two types of allocations being mixed?
1) long-term, global objects kept until the end of everything
2) volatile, private objects allocated only while the object is being
   compressed and then freed

Separating these would make a big difference to the fragmentation
problem. Single threading probably wouldn't see a fragmentation
problem from mixing the allocation types.

When a thread is created it could allocate a private 20MB (or
whatever) pool. The volatile, private objects would come from that
pool. Long-term objects would stay in the global pool. Since they are
long term they will just get laid down sequentially in memory.
Separating these allocation types makes things way easier for malloc.
(A rough sketch of what I mean is appended after my sig.)

CPU time would be helped by removing some of the locking if possible.

--
Jon Smirl
jonsmirl@xxxxxxxxx
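
P.S. Here is roughly what I have in mind for the per-thread pool. None
of these functions or names exist in git; it's only a sketch of the
two-pool idea, assuming the per-object scratch allocations can be
bump-allocated and then thrown away in one go when the thread finishes
with an object:

	/*
	 * Hypothetical per-thread arena: each delta-search thread grabs a
	 * private 20MB block up front and carves its short-lived scratch
	 * allocations out of it, so they never interleave with the
	 * long-lived object entries sitting in the global heap.
	 */
	#include <stdlib.h>

	#define ARENA_SIZE (20 * 1024 * 1024)	/* 20MB per thread */

	struct arena {
		char *base;
		size_t used;
		size_t size;
	};

	static struct arena *arena_create(size_t size)
	{
		struct arena *a = malloc(sizeof(*a));
		if (!a)
			return NULL;
		a->base = malloc(size);
		if (!a->base) {
			free(a);
			return NULL;
		}
		a->used = 0;
		a->size = size;
		return a;
	}

	/* volatile, private allocations: bump a pointer, no locking needed */
	static void *arena_alloc(struct arena *a, size_t n)
	{
		void *p;
		n = (n + 15) & ~(size_t)15;	/* keep 16-byte alignment */
		if (a->used + n > a->size)
			return NULL;		/* caller falls back to malloc() */
		p = a->base + a->used;
		a->used += n;
		return p;
	}

	/* drop everything once the thread is done with the current object */
	static void arena_reset(struct arena *a)
	{
		a->used = 0;
	}

	static void arena_destroy(struct arena *a)
	{
		free(a->base);
		free(a);
	}

Long-term objects would keep going through plain malloc() in the global
heap; only the scratch memory used during delta compression would come
out of the arena and get reset between objects.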