Felipe Contreras <felipe.contreras@xxxxxxxxx> wrote:
> I tracked down an issue I have when importing a big repository. For
> some reason memory usage keeps increasing until there is no more
> memory.
>
> After looking at the code my guess is that I have a humongous amount
> of branches.
>
> Actually they are not really branches, but refs. For each git commit
> there's an original mtn ref that I store in 'refs/mtn/sha1', but since
> I'm using 'commit refs/mtn/sha1' to store it, a branch is created for
> every commit.
>
> I guess there are many ways to fix the issue, but for starters I
> wonder why is fast-import keeping track of all the branches? In my
> case I would like fast-import to work exactly the same if I specify
> branches or not (I'll update them later).

Because fast-import has to buffer them until the pack file is done.
The objects aren't available to the repository until after a
checkpoint is sent or until the stream ends. Either way, until then
fast-import has to buffer the refs so they don't get exposed to other
git processes reading that same repository, because the refs would
point to objects that those processes cannot find.

I guess it could release the branch memory after it dumps the
branches in a checkpoint, but its memory allocators work under the
assumption that strings (like branch and file names) will be reused
heavily by the frontend, and thus they are pooled inside of a string
pool. The branch objects are also pooled inside of a common alloc
pool, to amortize the cost of malloc's block headers over the data
used.

IOW, fast-import was designed for ~5k branches, not ~1 million unique
branches.

-- 
Shawn.
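
For illustration only, the buffering and pooling described in the reply can be modelled with a toy Python sketch. This is not fast-import's actual code (which is C); the names StringPool, ToyImporter and simulate are hypothetical, and the model assumes, as the reply suggests, that pooled strings are never released, even after a checkpoint publishes the buffered refs.

```python
class StringPool:
    """Toy intern pool: each distinct string is stored once and never freed."""

    def __init__(self):
        self._strings = {}

    def intern(self, s):
        # Return the pooled copy, adding the string on first sight.
        return self._strings.setdefault(s, s)

    def __len__(self):
        return len(self._strings)


class ToyImporter:
    """Hypothetical stand-in for the importer's ref bookkeeping."""

    def __init__(self):
        self.pool = StringPool()
        self.pending_refs = {}    # ref name -> commit id, hidden until checkpoint
        self.published_refs = {}  # refs visible to other readers

    def commit(self, ref, commit_id):
        # Ref names go through the pool on the assumption they are reused a lot.
        name = self.pool.intern(ref)
        self.pending_refs[name] = commit_id

    def checkpoint(self):
        # The objects are readable now, so the buffered refs can be exposed.
        self.published_refs.update(self.pending_refs)
        self.pending_refs.clear()
        # The pool is left untouched: in this model pooled strings stay allocated.


def simulate(ref_names):
    """Feed one commit per ref name, checkpoint, and report (pool size, ref count)."""
    importer = ToyImporter()
    for i, ref in enumerate(ref_names):
        importer.commit(ref, "commit-%d" % i)
    importer.checkpoint()
    return len(importer.pool), len(importer.published_refs)
```

With a single reused branch, simulate(["refs/heads/master"] * 1000) leaves one pooled string. With one ref per commit, as in the 'refs/mtn/sha1' scheme, simulate(["refs/mtn/%d" % i for i in range(1000)]) leaves a thousand, and the pool keeps growing with the number of commits.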