On Mon, Nov 20, 2017 at 09:01:45AM -0500, Ben Peart wrote:

> Further testing has revealed that switching from the regular heap to a
> refactored version of the mem_pool in fast-import.c produces similar gains
> as parallelizing do_index_load(). This appears to be a much simpler patch
> for similar gains so we will be pursuing that path.

That sounds like a pretty easy win for index entries, which tend to
stick around in big clumps.

Out of curiosity, have you tried experimenting with any high-performance
3rd-party allocator libraries? I've often wondered if we could get a
performance improvement from dropping in a new allocator, but was never
able to measure any real benefit over glibc's ptmalloc2. The situation
might be different on Windows, though (i.e., if the libc allocator isn't
that great).

Most of the high-performance allocators are focused on concurrency,
which usually isn't a big deal for git. But tcmalloc, at least, claims
to be about 6x faster than glibc.

The reason I ask is that we could possibly get the same wins without
writing a single line of code. And it could apply across the whole
code-base, not just the index code.

I don't know how close a general purpose allocator could come to a
pooled implementation, though. You're inherently making a tradeoff with
a pool in not being able to free individual entries.

-Peff
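
To make the pool tradeoff mentioned above concrete, here is a minimal
sketch of a bump-pointer pool in C. It is an illustration only: the
names (struct pool, pool_init, pool_alloc, pool_discard) are
hypothetical, it is not the mem_pool from fast-import.c, and it assumes
a C11 compiler and ignores size overflow. Note that there is no
per-entry free; memory comes back only when the whole pool is
discarded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical sketch; NOT the mem_pool from fast-import.c. */
struct pool_block {
	struct pool_block *next;
	size_t used;        /* bytes already handed out from data[] */
	size_t cap;         /* usable bytes in data[] */
	max_align_t data[]; /* flexible array member, suitably aligned */
};

struct pool {
	struct pool_block *head; /* most recently allocated block */
	size_t block_size;       /* default capacity of a new block */
};

static void pool_init(struct pool *p, size_t block_size)
{
	p->head = NULL;
	p->block_size = block_size;
}

/* Hand out len bytes by bumping an offset; one malloc() per block. */
static void *pool_alloc(struct pool *p, size_t len)
{
	const size_t align = _Alignof(max_align_t);
	struct pool_block *b = p->head;
	void *ret;

	assert(len > 0);
	/* Round up so every returned pointer stays aligned. */
	len = (len + align - 1) & ~(align - 1);

	if (!b || b->cap - b->used < len) {
		/* Current block is full (or absent): start a new one. */
		size_t cap = len > p->block_size ? len : p->block_size;
		b = malloc(sizeof(*b) + cap);
		if (!b)
			return NULL;
		b->next = p->head;
		b->used = 0;
		b->cap = cap;
		p->head = b;
	}
	ret = (char *)b->data + b->used;
	b->used += len;
	return ret;
}

/* The only way to release memory: drop every block at once. */
static void pool_discard(struct pool *p)
{
	struct pool_block *b = p->head;

	while (b) {
		struct pool_block *next = b->next;
		free(b);
		b = next;
	}
	p->head = NULL;
}
```

A caller would pool_init() once, pool_alloc() each entry as it is
loaded, and pool_discard() when the whole set is no longer needed. The
speed comes from replacing many malloc() calls with a few large ones;
the cost is that an entry dropped early keeps its space until the pool
itself goes away.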