On Wed, 2016-06-01 at 16:09 -0400, Jeff King wrote:
> On Wed, Jun 01, 2016 at 03:50:29PM -0400, David Turner wrote:
> 
> > On Wed, 2016-06-01 at 03:42 -0400, Jeff King wrote:
> > >   2. Do caching tricks for strbufs used in tight loops. For example,
> > >      have strbuf_release() throw its buffer into a last-used cache, and
> > >      let the next strbuf_grow() use that cache entry. This cuts malloc()
> > >      out of the loop.
> > > 
> > >      You'd probably want to protect the cache with a mutex, though.
> > 
> > ... or make the last-used cache be thread-local.
> 
> Good idea.
> 
> I almost didn't mention threading at all, because I'd be surprised if
> malloc lock contention is a serious bottleneck for us anywhere.
> 
> It's hard to really say much concrete because it's not clear to me where
> people think strbuf allocation _is_ a bottleneck (or again, _would be_
> if we were to use it more).
> 
> If we imagine a loop like this:
> 
>     for (i = 0; i < n; i++) {
>         struct strbuf buf = STRBUF_INIT;
>         do_something_with(&buf);
>         strbuf_release(&buf);
>     }
> 
> then yes, we'll call malloc() and free() a lot of times. But unless your
> libc malloc is terrible, those calls are all probably just taking a
> mutex and reusing the same block from a free list. The strbuf case is
> actually really friendly to allocators because our blocks tend to be
> predictable sizes (they're sized by ALLOC_GROW, which has a
> deterministic set of block sizes).
> 
> In practice, I'd imagine loops get more complicated than that. They call
> sub-functions which allocate a strbuf, and sometimes don't release it
> until other strbufs have been allocated, etc. The proposal in this
> thread is basically using the call stack to mirror the allocation
> patterns. So when the pattern of allocations matches an actual stack,
> that's good, but I'd expect a reasonably smart allocator to perform
> similarly. And when it's not a true stack, the stack-based strbufs are
> going to end up with wasted stack space hanging around even after we've
> free()'d the memory and could reuse it. And I'd still expect an
> allocator based off a linked list or something to handle such cases
> pretty well.
> 
> There are also other non-big-O factors at play in modern systems, like
> CPU cache footprints. Is heap memory cheaper or more expensive to access
> than stack? I can imagine stack is kept in the cache better, but if
> you're reusing the same heap block over and over, it probably stays in
> cache, too. But if you have unused stack buffers that _could_ be reused
> (but won't be because you'd have to manually feed them to a new strbuf),
> that hurts your footprint. And on top of that, the stack proposals I've
> seen generally involve over-sizing the stack buffers out of a fear of
> actually calling malloc.
> 
> And then on top of that, there is the question of whether anything
> involving a strbuf allocation is actually a tight enough loop to even
> _care_ about cache dynamics.
> 
> So again, I'm slightly negative on anything that makes the code even
> slightly more complex, especially the call sites, if we can't show
> actual measurable improvement numbers.
> 
> Sorry, this turned into a mini-rant, and it's not really directed at
> you, David. Your email was just what spurred me to put some of these
> thoughts into words. :)

I agree with all this stuff anyway.
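
For anyone who wants to see the shape of the "thread-local last-used
cache" idea being discussed above, here is a minimal sketch. The sbuf
struct, the helper names, and the __thread storage class (a GCC/Clang
extension; C11 spells it _Thread_local) are simplified assumptions for
illustration only; this is not the actual strbuf code in git:

    #include <stdlib.h>

    /* Simplified stand-in for a growable string buffer. */
    struct sbuf {
        size_t alloc;   /* bytes allocated in buf */
        size_t len;     /* bytes currently used */
        char *buf;
    };

    /* One cached block per thread, so reuse needs no mutex. */
    static __thread char *cached_buf;
    static __thread size_t cached_alloc;

    static void sbuf_grow(struct sbuf *sb, size_t extra)
    {
        size_t want = sb->len + extra + 1;
        char *p;

        if (want <= sb->alloc)
            return;

        /* Reuse this thread's cached block instead of calling malloc(). */
        if (!sb->buf && cached_buf && cached_alloc >= want) {
            sb->buf = cached_buf;
            sb->alloc = cached_alloc;
            cached_buf = NULL;
            cached_alloc = 0;
            return;
        }

        p = realloc(sb->buf, want);
        if (!p)
            abort();    /* a real implementation would report the error */
        sb->buf = p;
        sb->alloc = want;
    }

    static void sbuf_release(struct sbuf *sb)
    {
        if (sb->buf) {
            if (!cached_buf) {
                /* Stash the block for the next sbuf on this thread. */
                cached_buf = sb->buf;
                cached_alloc = sb->alloc;
            } else {
                free(sb->buf);
            }
        }
        sb->buf = NULL;
        sb->alloc = sb->len = 0;
    }

Because the cache lives in thread-local storage, the release/grow path
never takes a lock; the tradeoff is that a block released on one thread
cannot be picked up by another, and each thread keeps at most one idle
block pinned between uses.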