On 2009.04.07 13:48:02 -0400, Nicolas Pitre wrote:
> On Tue, 7 Apr 2009, Björn Steinbrink wrote:
> > And in the upload-pack case, there's also pack-objects running
> > concurrently, already going up to 950M RSS/100M shared _while_ the
> > rev-list is still running. So that's 3G of memory usage (2G if you
> > ignore the shared stuff) before the "Compressing objects" part even
> > starts. And of course, pack-objects will apparently start to mmap the
> > pack files only after the rev-list finished, so a "smart" OS might have
> > removed a lot of the mmapped stuff from memory again, causing it to be
> > re-read. :-/
>
> The first low hanging fruit to help this case is to make upload-pack use
> the --revs argument with pack-objects to let it do the object enumeration
> itself directly, instead of relying on the rev-list output through a
> pipe. This is what 'git repack' does already. pack-objects has to
> access the pack anyway, so this would eliminate an extra access from a
> different process.

Hm, for an initial clone that would end up as:

    git pack-objects --stdout --all

right? If so, that doesn't look like it's going to work out as easily as
one would hope. Robin said that both processes, git-upload-pack (which
does the rev-list) and pack-objects, peaked at ~2GB of RSS (which
probably includes the mmapped packs). But the above pack-objects with
--all peaks at 3.1G here, so it basically seems to keep in memory all
the stuff that the individual processes had. But this way, it's all at
once, not 2G first and then 2G in a second process after the first one
exited.

Björn
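
P.S.: For anyone who wants to reproduce the comparison, here's a rough,
untested sketch of the two enumeration strategies, approximated from the
shell (the real upload-pack talks to pack-objects over a pipe, so this
only mimics the shape of the data flow, not the actual code path):

    # today: rev-list enumerates, pack-objects packs (two processes,
    # each with its own view of the packs)
    git rev-list --objects --all |
    git pack-objects --stdout >/dev/null

    # proposed: pack-objects enumerates internally via --revs; ref tips
    # go in on stdin, one rev per line
    git for-each-ref --format='%(objectname)' |
    git pack-objects --revs --stdout >/dev/null

    # for a fetch rather than a full clone, the client's "have"s would
    # follow a --not line on stdin, roughly:
    #   <want sha1>
    #   --not
    #   <have sha1>

Measuring the peak RSS of each variant, e.g. with GNU time
(/usr/bin/time -v) or by watching /proc/<pid>/status, should show
whether the single-process variant really pays for both halves at once,
as my 3.1G number above suggests.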