Re: [PATCH V2 3/3] strbuf: allow to use preallocated memory

Jeff King <peff@xxxxxxxx> · Wed, 8 Jun 2016 15:19:18 -0400

On Wed, Jun 08, 2016 at 06:20:41PM +0200, Michael Haggerty wrote:

> Instead, one could write
> 
> > static int feed_object(const unsigned char *sha1, int fd, int negative)
> > {
> > 	char buf[GIT_SHA1_HEXSZ + 2];
> > 	struct strbuf line = WRAPPED_FIXED_STRBUF(buf);
> > 
> > 	if (negative && !has_sha1_file(sha1))
> > 		return 1;
> > 
> > 	if (negative)
> > 		strbuf_addch(&line, '^');
> > 	strbuf_add(&line, sha1_to_hex(sha1), GIT_SHA1_HEXSZ);
> > 	strbuf_addch(&line, '\n');
> > 	return write_or_whine(fd, line.buf, line.len, "send-pack: send refs");
> > }

Hmm. I'm not sure that just replacing that with a regular heap-allocated
strbuf is so bad. It additionally gets rid of the SHA1_HEXSZ math in the
allocation.

So from your list of advantages:

> * It's a little less manual bookkeeping, and thus less error-prone,
>   than the current code.

We have this, but better.

> * If somebody decides to add another character to the line but
>   forgets to increase the allocation size, the code dies in testing
>   rather than (a) overflowing the buffer, like the current
>   code, or (b) silently becoming less performant, as if it used a
>   preallocated but non-fixed strbuf.

Instead of overflowing, it just silently works.

> * There's no need to strbuf_release() (which can be convenient in
>   a function with multiple exit paths).

Same.

The downside, obviously, is the cost of malloc/free. It may even be
noticeable here here because this really is a tight loop of strbuf
allocation (OTOH, we immediately make a syscall; how expensive is
write() compared to malloc()?).

We can hack around that by reusing the same strbuf.

Unfortunately the usual trick of:

  struct strbuf buf = STRBUF_INIT;
  for (...) {
	strbuf_reset(&buf);
	...
  }
  strbuf_release(&buf);

does not work, because we are in a sub-function. We can pass in the
buffer as scratch space, but that makes the function interface a little
uglier than it needs to be.

Likewise, we could make the strbuf static inside feed_object().  It's
not so bad here, where we know there aren't re-entrancy issues, but it's
not a very safe pattern in general (and it leaks the memory when we're
done with the function).

That made me wonder if we could repeatedly reuse a buffer attached to
the file descriptor. And indeed, isn't that what stdio is? The whole
reason this buffer exists is because we are using a direct descriptor
write. If we switched this function to use fprintf(), we'd avoid the
whole buffer question, have a fixed cap on our memory use (since we just
flush anytime the buffer is full) _and_ we'd reduce the number of
write syscalls we're making by almost a factor of 100.

> I don't know whether this particular function should be rewritten; I'm
> just giving an example of the type of scenario where I think it could be
> useful.
>
> In a world without fixed strbufs, what would one use in this situation?

I know I've done the exact opposite of what you wanted here and talked
about this specific function. But I _do_ think this is a pattern I've
seen several times, where we format into a buffer only to write() it
out. I think they may comprise a reasonable number of our buffer-using
loops.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html