Re: [PATCH v2 04/21] strbuf: fix leak when `appendwholeline()` fails with EOF

Jeff King <peff@xxxxxxxx> · Wed, 29 May 2024 05:16:33 -0400

On Mon, May 27, 2024 at 08:44:46AM +0200, Patrick Steinhardt wrote:

> > diff --git a/strbuf.c b/strbuf.c
> > diff --git a/strbuf.c b/strbuf.c
> > index e1076c9891..aed699c6bf 100644
> > --- a/strbuf.c
> > +++ b/strbuf.c
> > @@ -656,10 +656,8 @@ int strbuf_getwholeline(struct strbuf *sb, FILE *fp, int term)
> >  	 * we can just re-init, but otherwise we should make sure that our
> >  	 * length is empty, and that the result is NUL-terminated.
> >  	 */
> > -	if (!sb->buf)
> > -		strbuf_init(sb, 0);
> > -	else
> > -		strbuf_reset(sb);
> > +	FREE_AND_NULL(sb->buf);
> > +	strbuf_init(sb, 0);
> >  	return EOF;
> >  }
> >  #else
> > 
> > But I think either of those would solve your leak, _and_ would help with
> > similar leaks of strbuf_getwholeline() and friends.
> 
> I'm not quite convinced that `strbuf_getwholeline()` should deallocate
> the buffer for the caller, I think that makes for quite a confusing
> calling convention. The caller may want to reuse the buffer for other
> operations, and it feels hostile to release the buffer under their feet.
> 
> The only edge case where I think it would make sense to free allocated
> data is when being passed a not-yet-allocated strbuf. But I wonder
> whether the added complexity would be worth it.

I'm not sure what they'd reuse it for. We necessarily have to reset it
before reading, so the contents are now garbage. The allocated buffer
could be reused, but since everybody has to call strbuf_grow() before
assuming they can write, it's not a correctness issue, but only an
optimization. But that optimization is pretty unlikely to matter. Since
we hit this code only on EOF or error, it's generally going to happen
once in a program, and not in a tight loop.

If we really cared, though, I think you could check sb->alloc before the
call to getdelim(), and then we'd know whether the original held an
allocation or not (and we could restore its state). That's what other
syscall-ish strbuf functions like strbuf_readlink() and strbuf_getcwd()
do.

That said, I agree that leaks here are not going to be common. Most
callers are going to call it in a loop and unconditionally release at
the end, whether they get multiple lines or not. The "append" function
is the odd man out by reading a single line into a new buffer[1].

Looking through the results of:

  git grep -P '(?<!while) \(!?strbuf_get(whole)?line'

I saw only one questionable case. builtin/difftool.c does:

  if (strbuf_getline_nul(&lpath, fp))
	break;

without freeing lpath. But then...it does not free it in the case that
we got a value, either! So I think it is leaking either way, and the
solution, to strbuf_release(&lpath) outside of the loop, would fix both
cases.

> I've been going through all callsites and couldn't spot any that doesn't
> free the buffer on EOF. So I'd propose to leave this as-is and revisit
> if we eventually see that this is causing more memory leaks.

OK. I don't feel too strongly about it, but mostly thought it seemed
inconsistent with the philosophy of those other strbuf functions.

-Peff

[1] I was surprised at first that strbuf_appendwholeline() uses that
    extra copy, as the obvious implementation is just to skip the
    strbuf_reset() call in getwholeline(), and then implement "get" as a
    wrapper around "append" to add back in the reset call.

    But that only works for the slower getc() implementation. For the
    getdelim()-based one, there is no notion of append (the interface
    would have to take an offset into the buffer).  We could probably
    optimize this at the cost of repeating ourselves, but given that
    there's only one probably-not-very-hot call to appendwholeline, it's
    likely not worth it.