Re: Memory issue with fast-import, why track branches?

"Shawn O. Pearce" <spearce@xxxxxxxxxxx> · Sun, 21 Dec 2008 14:17:02 -0800

Felipe Contreras <felipe.contreras@xxxxxxxxx> wrote:
> I tracked down an issue I have when importing a big repository. For
> some reason memory usage keeps increasing until there is no more
> memory.
> 
> After looking at the code my guess is that I have a humongous amount
> of branches.
> 
> Actually they are not really branches, but refs. For each git commit
> there's an original mtn ref that I store in 'refs/mtn/sha1', but since
> I'm using 'commit refs/mtn/sha1' to store it, a branch is created for
> every commit.
> 
> I guess there are many ways to fix the issue, but for starters I
> wonder why is fast-import keeping track of all the branches? In my
> case I would like fast-import to work exactly the same if I specify
> branches or not (I'll update them later).

Because fast-import has to buffer them until the pack file is done.
The objects aren't available to the repository until after a
checkpoint is sent or until the stream ends.  Either way until
then fast-import has to buffer the refs so they don't get exposed
to other git processes reading that same repository, because they
would point to objects that the process cannot find.

I guess it could release the brnach memory after it dumps the
branches in a checkpoint, but its memory allocators work under an
assumption that strings (like branch and file names) will be reused
heavily by the frontend and thus they are poooled inside of a string
pool.  The branch objects are also pooled inside of a common alloc
pool, to ammortize the cost of malloc's block headers out over the
data used.

IOW, fast-import was designed for ~5k branches, not ~1 million
unique branches.

-- 
Shawn.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html