Re: [PATCH v2 04/21] strbuf: fix leak when `appendwholeline()` fails with EOF

Patrick Steinhardt <ps@xxxxxx> · Wed, 29 May 2024 13:25:02 +0200

On Wed, May 29, 2024 at 05:16:33AM -0400, Jeff King wrote:
> On Mon, May 27, 2024 at 08:44:46AM +0200, Patrick Steinhardt wrote:
> > > diff --git a/strbuf.c b/strbuf.c
> > > diff --git a/strbuf.c b/strbuf.c
> > > index e1076c9891..aed699c6bf 100644
> > > --- a/strbuf.c
> > > +++ b/strbuf.c
> > > @@ -656,10 +656,8 @@ int strbuf_getwholeline(struct strbuf *sb, FILE *fp, int term)
> > >  	 * we can just re-init, but otherwise we should make sure that our
> > >  	 * length is empty, and that the result is NUL-terminated.
> > >  	 */
> > > -	if (!sb->buf)
> > > -		strbuf_init(sb, 0);
> > > -	else
> > > -		strbuf_reset(sb);
> > > +	FREE_AND_NULL(sb->buf);
> > > +	strbuf_init(sb, 0);
> > >  	return EOF;
> > >  }
> > >  #else
> > > 
> > > But I think either of those would solve your leak, _and_ would help with
> > > similar leaks of strbuf_getwholeline() and friends.
> > 
> > I'm not quite convinced that `strbuf_getwholeline()` should deallocate
> > the buffer for the caller, I think that makes for quite a confusing
> > calling convention. The caller may want to reuse the buffer for other
> > operations, and it feels hostile to release the buffer under their feet.
> > 
> > The only edge case where I think it would make sense to free allocated
> > data is when being passed a not-yet-allocated strbuf. But I wonder
> > whether the added complexity would be worth it.
> 
> I'm not sure what they'd reuse it for. We necessarily have to reset it
> before reading, so the contents are now garbage. The allocated buffer
> could be reused, but since everybody has to call strbuf_grow() before
> assuming they can write, it's not a correctness issue, but only an
> optimization. But that optimization is pretty unlikely to matter. Since
> we hit this code only on EOF or error, it's generally going to happen
> once in a program, and not in a tight loop.
> 
> If we really cared, though, I think you could check sb->alloc before the
> call to getdelim(), and then we'd know whether the original held an
> allocation or not (and we could restore its state). That's what other
> syscall-ish strbuf functions like strbuf_readlink() and strbuf_getcwd()
> do.

Ah, I didn't know that we did similar things in other strbuf functions.
With that precedence I think it's less ugly to do this dance.

> That said, I agree that leaks here are not going to be common. Most
> callers are going to call it in a loop and unconditionally release at
> the end, whether they get multiple lines or not. The "append" function
> is the odd man out by reading a single line into a new buffer[1].
> 
> Looking through the results of:
> 
>   git grep -P '(?<!while) \(!?strbuf_get(whole)?line'
> 
> I saw only one questionable case. builtin/difftool.c does:
> 
>   if (strbuf_getline_nul(&lpath, fp))
> 	break;
> 
> without freeing lpath. But then...it does not free it in the case that
> we got a value, either! So I think it is leaking either way, and the
> solution, to strbuf_release(&lpath) outside of the loop, would fix both
> cases.

Indeed. We also didn't free `rpath` and `info`. I do have a follow up to
this series already, so let me add those leak fixes to it.

> > I've been going through all callsites and couldn't spot any that doesn't
> > free the buffer on EOF. So I'd propose to leave this as-is and revisit
> > if we eventually see that this is causing more memory leaks.
> 
> OK. I don't feel too strongly about it, but mostly thought it seemed
> inconsistent with the philosophy of those other strbuf functions.

I get where you're coming from now with the additional info that other
syscall-ish functions do a similar dance. I'll refrain from rerolling
this series just to fix this in a different way, also because neither of
us did spot any additional leaks caused by this.

Patrick
Attachment:
signature.asc

Description: PGP signature