On Wed, 12 Dec 2007, Nicolas Pitre wrote:

> Add memory fragmentation to that and you have a clogged system.
>
> Solution:
>
> 	pack.deltacachesize=1
> 	pack.windowmemory=16M
>
> Limiting the window memory to 16MB will automatically shrink the window
> size when big objects are encountered, therefore keeping much fewer of
> those objects at the same time in memory, which in turn means they will
> be processed much more quickly.  And somehow that must help with memory
> fragmentation as well.

OK scrap that.

When I returned to the computer this morning, the repack was
completed... with a 1.3GB pack instead.

So... The gcc repo apparently really needs a large window to
efficiently compress those large objects.  But when those large objects
are already well deltified and you repack again with a large window,
somehow the memory allocator is way more involved, probably even more so
when there are several threads in parallel amplifying the issue, and
things probably get to a point of no return with regard to memory
fragmentation after a while.

So... my conclusion is that the glibc allocator has fragmentation
issues with this workload, given the notable difference with the Google
allocator, which itself might not be completely immune to fragmentation
issues of its own.  And because the gcc repo requires a large window of
big objects to get good compression, you're better off not using 4
threads to repack it with -a -f.  The fact that the size of the source
pack has such an influence is probably only because the increased usage
of the delta base object cache is playing a role in the global memory
allocation pattern, allowing the bad fragmentation issue to occur.

If you could run one last test with the mallinfo patch I posted, without
the pack.windowmemory setting, and add the reported values along with
those from top, then we could formally conclude that memory
fragmentation is the issue.

So I don't think Git itself is actually bad.
The gcc repo most certainly constitutes a nasty use case for memory
allocators, but I don't think there is much we can do about it besides
possibly implementing our own memory allocator with active
defragmentation where possible (read memcpy) at some point to give
glibc's allocator some chance to breathe a bit more.

In the meantime you might have to use only one thread and lots of
memory to repack the gcc repo, or find the perfect memory allocator to
be used with Git.  After all, packing the whole gcc history to around
230MB is quite a stunt but it requires sufficient resources to
achieve it.  Fortunately, like Linus said, such a wholesale repack is not
something that most users have to do anyway.


Nicolas
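
PS: As a rough sketch only, the single-threaded setup suggested above
could be expressed in the repository's config file roughly as follows.
This assumes a Git build where the pack.threads setting is available;
pack.windowmemory is deliberately left unset so that the window is not
shrunk when big objects are encountered:

	[pack]
		; assumption: one delta-search thread, to keep several
		; threads from amplifying allocator fragmentation
		threads = 1

The window size itself would still have to be large enough for the gcc
repo; no particular value is implied here.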