On Mon, May 27, 2024 at 08:44:46AM +0200, Patrick Steinhardt wrote: > > diff --git a/strbuf.c b/strbuf.c > > diff --git a/strbuf.c b/strbuf.c > > index e1076c9891..aed699c6bf 100644 > > --- a/strbuf.c > > +++ b/strbuf.c > > @@ -656,10 +656,8 @@ int strbuf_getwholeline(struct strbuf *sb, FILE *fp, int term) > > * we can just re-init, but otherwise we should make sure that our > > * length is empty, and that the result is NUL-terminated. > > */ > > - if (!sb->buf) > > - strbuf_init(sb, 0); > > - else > > - strbuf_reset(sb); > > + FREE_AND_NULL(sb->buf); > > + strbuf_init(sb, 0); > > return EOF; > > } > > #else > > > > But I think either of those would solve your leak, _and_ would help with > > similar leaks of strbuf_getwholeline() and friends. > > I'm not quite convinced that `strbuf_getwholeline()` should deallocate > the buffer for the caller, I think that makes for quite a confusing > calling convention. The caller may want to reuse the buffer for other > operations, and it feels hostile to release the buffer under their feet. > > The only edge case where I think it would make sense to free allocated > data is when being passed a not-yet-allocated strbuf. But I wonder > whether the added complexity would be worth it. I'm not sure what they'd reuse it for. We necessarily have to reset it before reading, so the contents are now garbage. The allocated buffer could be reused, but since everybody has to call strbuf_grow() before assuming they can write, it's not a correctness issue, but only an optimization. But that optimization is pretty unlikely to matter. Since we hit this code only on EOF or error, it's generally going to happen once in a program, and not in a tight loop. If we really cared, though, I think you could check sb->alloc before the call to getdelim(), and then we'd know whether the original held an allocation or not (and we could restore its state). That's what other syscall-ish strbuf functions like strbuf_readlink() and strbuf_getcwd() do. That said, I agree that leaks here are not going to be common. Most callers are going to call it in a loop and unconditionally release at the end, whether they get multiple lines or not. The "append" function is the odd man out by reading a single line into a new buffer[1]. Looking through the results of: git grep -P '(?<!while) \(!?strbuf_get(whole)?line' I saw only one questionable case. builtin/difftool.c does: if (strbuf_getline_nul(&lpath, fp)) break; without freeing lpath. But then...it does not free it in the case that we got a value, either! So I think it is leaking either way, and the solution, to strbuf_release(&lpath) outside of the loop, would fix both cases. > I've been going through all callsites and couldn't spot any that doesn't > free the buffer on EOF. So I'd propose to leave this as-is and revisit > if we eventually see that this is causing more memory leaks. OK. I don't feel too strongly about it, but mostly thought it seemed inconsistent with the philosophy of those other strbuf functions. -Peff [1] I was surprised at first that strbuf_appendwholeline() uses that extra copy, as the obvious implementation is just to skip the strbuf_reset() call in getwholeline(), and then implement "get" as a wrapper around "append" to add back in the reset call. But that only works for the slower getc() implementation. For the getdelim()-based one, there is no notion of append (the interface would have to take an offset into the buffer). We could probably optimize this at the cost of repeating ourselves, but given that there's only one probably-not-very-hot call to appendwholeline, it's likely not worth it.