Re: [PATCH 0/2] strbuf: improve API

On Wed, Jun 01, 2016 at 03:50:29PM -0400, David Turner wrote:

> On Wed, 2016-06-01 at 03:42 -0400, Jeff King wrote:
> >   2. Do caching tricks for strbufs used in tight loops. For example,
> >      have strbuf_release() throw its buffer into a last-used cache,
> >      and let the next strbuf_grow() use that cache entry. This cuts
> >      malloc() out of the loop.
> > 
> >      You'd probably want to protect the cache with a mutex, though.
> 
> 
> ... or make the last-used cache be thread-local.

Good idea.
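
Something like this, roughly (just a sketch; the cache_take()/cache_put()
names are made up rather than a proposed API, and it assumes a gcc-style
__thread slot):

  /*
   * One-slot, per-thread buffer cache. The idea is that strbuf_release()
   * would hand its buffer to cache_put() instead of free(), and
   * strbuf_grow() would try cache_take() before calling malloc().
   */
  #include <stdlib.h>

  static __thread char *cached_buf;
  static __thread size_t cached_alloc;

  static char *cache_take(size_t hint, size_t *alloc)
  {
          if (cached_buf && cached_alloc >= hint) {
                  char *buf = cached_buf;
                  *alloc = cached_alloc;
                  cached_buf = NULL;
                  cached_alloc = 0;
                  return buf;
          }
          *alloc = hint;
          return malloc(hint);
  }

  static void cache_put(char *buf, size_t alloc)
  {
          free(cached_buf);  /* keep only the most recently released buffer */
          cached_buf = buf;
          cached_alloc = alloc;
  }

With __thread there's no mutex at all; the cost is one idle buffer held per
thread.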

I almost didn't mention threading at all, because I'd be surprised if
malloc lock contention is a serious bottleneck for us anywhere.

It's hard to say anything very concrete, because it's not clear to me where
people think strbuf allocation _is_ a bottleneck (or, again, _would be_
if we were to use it more).

If we imagine a loop like this:

  for (i = 0; i < n; i++) {
	struct strbuf buf = STRBUF_INIT;
	do_something_with(&buf);
	strbuf_release(&buf);
  }

then yes, we'll call malloc() and free() a lot of times. But unless your
libc malloc is terrible, those calls are probably all just taking a
mutex and reusing the same block from a free list. The strbuf case is
actually really friendly to allocators, because our blocks tend to come in
predictable sizes (they're sized by ALLOC_GROW, which grows through a
deterministic sequence of block sizes).
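
To be concrete about the "deterministic sizes" point: if I remember
alloc_nr() right, a strbuf that grows a little at a time only ever asks
malloc for sizes from one short sequence, so a free() followed by another
strbuf's malloc() nearly always hits the same free-list bucket:

  #include <stdio.h>

  /* same growth rule as git's alloc_nr(), from memory */
  static size_t alloc_nr(size_t x)
  {
          return (x + 16) * 3 / 2;
  }

  int main(void)
  {
          size_t alloc = 0;
          int i;

          for (i = 0; i < 8; i++) {
                  alloc = alloc_nr(alloc);
                  printf("%zu\n", alloc);  /* 24, 60, 114, 195, ... */
          }
          return 0;
  }

(Of course ALLOC_GROW will jump straight to the requested size if that is
bigger than the next step, but incremental growth stays on this sequence.)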

In practice, I'd imagine loops get more complicated than that. They call
sub-functions which allocate a strbuf, and sometimes don't release it
until other strbufs have been allocated, etc. The proposal in this
thread is basically using the call stack to mirror the allocation
patterns. So when the pattern of allocations matches an actual stack,
that's good, but I'd expect a reasonably smart allocator to perform
similarly. And when it's not a true stack, the stack-based strbufs end
up with wasted stack space hanging around even after we've released the
buffer and could have reused that memory. And I'd still expect an
allocator based on a linked list or something to handle such cases
pretty well.
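
By "not a true stack" I mean a shape like this (fill() and
grow_some_more() are just stand-in helpers, not real functions):

  #include "strbuf.h"

  static void not_a_stack(void)
  {
          struct strbuf a = STRBUF_INIT;
          struct strbuf b = STRBUF_INIT;

          fill(&a);                /* hypothetical helper */
          strbuf_addbuf(&b, &a);
          strbuf_release(&a);      /* a dies first, out of LIFO order... */
          grow_some_more(&b);      /* ...while b keeps growing */
          strbuf_release(&b);
  }

With malloc/free, a's block goes back on the free list and b's next growth
can reuse that memory. With per-variable stack slots, a's space just sits
there while b grows on the heap anyway.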

There are also other non-big-O factors at play in modern systems, like
CPU cache footprints. Is heap memory cheaper or more expensive to access
than the stack? I can imagine the stack staying in cache better, but if
you're reusing the same heap block over and over, it probably stays in
cache, too. But if you have unused stack buffers that _could_ be reused
(but won't be, because you'd have to manually feed them to a new strbuf),
that hurts your cache footprint. And on top of that, the stack proposals I've
seen generally involve over-sizing the stack buffers out of a fear of
actually calling malloc.

And then on top of that, there is the question of whether anything
involving a strbuf allocation is actually a tight enough loop to even
_care_ about cache dynamics.

So again, I'm slightly negative on anything that makes the code even
slightly more complex, especially the call sites, if we can't show
actual measurable improvement numbers.

Sorry, this turned into a mini-rant, and it's not really directed at
you, David. Your email was just what spurred me to put some of these
thoughts into words. :)

-Peff