Re: Memory issue with fast-import, why track branches?

"Felipe Contreras" <felipe.contreras@xxxxxxxxx> · Mon, 22 Dec 2008 04:36:11 +0200

On Mon, Dec 22, 2008 at 12:17 AM, Shawn O. Pearce <spearce@xxxxxxxxxxx> wrote:
> Felipe Contreras <felipe.contreras@xxxxxxxxx> wrote:
>> I tracked down an issue I have when importing a big repository. For
>> some reason memory usage keeps increasing until there is no more
>> memory.
>>
>> After looking at the code my guess is that I have a humongous amount
>> of branches.
>>
>> Actually they are not really branches, but refs. For each git commit
>> there's an original mtn ref that I store in 'refs/mtn/sha1', but since
>> I'm using 'commit refs/mtn/sha1' to store it, a branch is created for
>> every commit.
>>
>> I guess there are many ways to fix the issue, but for starters I
>> wonder why is fast-import keeping track of all the branches? In my
>> case I would like fast-import to work exactly the same if I specify
>> branches or not (I'll update them later).
>
> Because fast-import has to buffer them until the pack file is done.
> The objects aren't available to the repository until after a
> checkpoint is sent or until the stream ends.  Either way until
> then fast-import has to buffer the refs so they don't get exposed
> to other git processes reading that same repository, because they
> would point to objects that the process cannot find.
>
> I guess it could release the brnach memory after it dumps the
> branches in a checkpoint, but its memory allocators work under an
> assumption that strings (like branch and file names) will be reused
> heavily by the frontend and thus they are poooled inside of a string
> pool.  The branch objects are also pooled inside of a common alloc
> pool, to ammortize the cost of malloc's block headers out over the
> data used.
>
> IOW, fast-import was designed for ~5k branches, not ~1 million
> unique branches.

My point is: why is it not designed for 0 branches? In many places in
the code there's the assumption that the tree = branch, but that's not
always the case. You can specify a 'from sha1' and then the branch
becomes irrelevant.

In fact in monotone some commits are not part of any branch, and many
are part of multiple branches. Those cases can't be handled by
fast-import right now. Not to mention random refs like 'ref/mtn/foo'
which would come in handy for my script.

Now my question is: would it be possible to get rid of the notion of
branches on fast-import and go for refs instead?

On the other hand if branch memory is freed after a checkpoint then
there's no limit to how many 'branches' can be handled.

-- 
Felipe Contreras
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html