Re: [PATCH 1/4] reftable: avoid leaks on realloc error

Patrick Steinhardt <ps@xxxxxx> · Mon, 30 Dec 2024 07:29:07 +0100

On Fri, Dec 27, 2024 at 09:16:27PM +0100, René Scharfe wrote:
> Am 27.12.24 um 11:33 schrieb Patrick Steinhardt:
> > On Wed, Dec 25, 2024 at 07:38:29PM +0100, René Scharfe wrote:
> >> When realloc(3) fails, it returns NULL and keeps the original allocation
> >> intact.  REFTABLE_ALLOC_GROW overwrites both the original pointer and
> >> the allocation count variable in that case, simultaneously leaking the
> >> original allocation and misrepresenting the number of storable items.
> >>
> >> parse_names() and reftable_buf_add() avoid leaking by restoring the
> >> original pointer value on failure, but all other callers seem to be OK
> >> with losing the old allocation.  Add a new variant of the macro,
> >> REFTABLE_ALLOC_GROW_OR_NULL, which plugs the leak and zeros the
> >> allocation counter.  Use it for those callers.
> >
> > Hm, okay. I find it a bit curious to discern those two macros from each
> > other as all callers need to handle OOM errors anyway, so doing the safe
> > thing should likely be our default here and all callsites that don't
> > should be adapted, shouldn't they?
> 
> I agree, and I my first version only had REFTABLE_ALLOC_GROW.  Keeping
> stuff unchanged if we cannot grow should be safer, right?  But it would
> introduce a leak if the caller exits without cleaning up, so each of
> them needs to be audited.  I was too lazy for that.  And it's work that
> can be parallelized..

Fair enough.

> > In the case of `reftable_buf_add()` I kind of doubt the usefulness of
> > handling the error just to keep the old pointer intact, as all callsites
> > will ultimately error out anyway.
> 
> I can imagine use cases where an object is built piece by piece, one
> part is too large and then you still want to keep all the rest and just
> replace the huge thing with a placeholder or entirely ignore it.  Could
> be a case of YAGNI, though.

Probably.

> > But in the case of `parse_names()` we
> > do in fact want to handle the case specially so that we can free any
> > names we have already parsed, so that case makes sense indeed.
> 
> Yes.  But that leads me on a tangent: Is it really a good idea to load
> a file into lots of individual string objects instead of loading into
> a single big buffer and pointing directly into it?  Do those strings
> need to have individual lifetimes?

Good question indeed. I don't think we ever need individual lifetimes
here. On the other hand it's probably okayish, too, given that the
number of table names should be limited due to automatic compaction.

> > So there is merit in having two separate wrappers, but it would be nice
> > if `REFTABLE_ALLOC_GROW()` would be doing the "right thing" for most
> > cases while the above two callsites would be adapted to use a wrapper
> > that requires a bit more thought to use correctly. For example something
> > like `REFTABLE_TRY_ALLOC_GROW()` or similar.
> 
> So this is about naming?  And with "right thing" you mean failing to
> grow should lead to destruction?

Yup, exactly. I just want to give callers a better indicator which of
these functions does what, and having sane defaults helps in my opinion.

I guess this also depends on whether or not we want to eventually adapt
all callsites to handle allocation errors themselves.

Patrick