On Wed, Jun 01, 2016 at 03:50:29PM -0400, David Turner wrote: > On Wed, 2016-06-01 at 03:42 -0400, Jeff King wrote: > > 2. Do caching tricks for strbufs used in tight loops. For example, > > have strbuf_release() throw its buffer into a last-used cache, > > and > > let the next strbuf_grow() use that cache entry. This cuts > > malloc() > > out of the loop. > > > > You'd probably want to protect the cache with a mutex, though. > > > ... or make the last-used cache be thread-local. Good idea. I almost didn't mention threading at all, because I'd be surprised if malloc lock contention is a serious bottleneck for us anywhere. It's hard to really say much concrete because it's not clear to me where people think strbuf allocation _is_ a bottleneck (or again, _would be_ if we were to use it more). If we imagine a loop like this: for (i = 0; i < n; i++) { struct strbuf buf = STRBUF_INIT; do_something_with(&buf); strbuf_release(&buf); } then yes, we'll call malloc() and free() a lot of times. But unless your libc malloc is terrible, those calls are all probably just taking a mutex and reusing the same block from a free list. The strbuf case is actually really friendly to allocators because our blocks tend to be predictable sizes (they're sized by ALLOC_GROW, which has a deterministic set of block sizes). In practice, I'd imagine loops get more complicated than that. They call sub-functions which allocate a strbuf, and sometimes don't release it until other strbufs have been allocated, etc. The proposal in this thread is basically using the call stack to mirror the allocation patterns. So when the pattern of allocations matches an actual stack, that's good, but I'd expect a reasonably smart allocator to perform similarly. And when it's not a true stack, the stack-based strbufs are going to end up with wasted stack space hanging around even after we've free()'d the memory and could reuse it. And I'd still expect an allocator based off a linked list or something to handle such cases pretty well. There are also other non-big-O factors at play in modern systems, like CPU cache footprints. Is heap memory cheaper or more expensive to access than stack? I can imagine stack is kept in the cache better, but if you're reusing the same heap block over and over, it probably stays in cache, too. But if you have unused stack buffers that _could_ be reused (but won't be because you'd have to manually feed them to a new strbuf), that hurts your footprint. And on top of that, the stack proposals I've seen generally involve over-sizing the stack buffers out of a fear of actually calling malloc. And then on top of that, there is the question of whether anything involving a strbuf allocation is actually a tight enough loop to even _care_ about cache dynamics. So again, I'm slightly negative on anything that makes the code even slightly more complex, especially the call sites, if we can't show actual measurable improvement numbers. Sorry, this turned into a mini-rant, and it's not really directed at you, David. Your email was just what spurred me to put some of these thoughts into words. :) -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html