Re: [PATCH 1/4] reftable: avoid leaks on realloc error

René Scharfe <l.s.r@xxxxxx> · Fri, 27 Dec 2024 21:16:27 +0100

Am 27.12.24 um 11:33 schrieb Patrick Steinhardt:
> On Wed, Dec 25, 2024 at 07:38:29PM +0100, René Scharfe wrote:
>> When realloc(3) fails, it returns NULL and keeps the original allocation
>> intact.  REFTABLE_ALLOC_GROW overwrites both the original pointer and
>> the allocation count variable in that case, simultaneously leaking the
>> original allocation and misrepresenting the number of storable items.
>>
>> parse_names() and reftable_buf_add() avoid leaking by restoring the
>> original pointer value on failure, but all other callers seem to be OK
>> with losing the old allocation.  Add a new variant of the macro,
>> REFTABLE_ALLOC_GROW_OR_NULL, which plugs the leak and zeros the
>> allocation counter.  Use it for those callers.
>
> Hm, okay. I find it a bit curious to discern those two macros from each
> other as all callers need to handle OOM errors anyway, so doing the safe
> thing should likely be our default here and all callsites that don't
> should be adapted, shouldn't they?

I agree, and I my first version only had REFTABLE_ALLOC_GROW.  Keeping
stuff unchanged if we cannot grow should be safer, right?  But it would
introduce a leak if the caller exits without cleaning up, so each of
them needs to be audited.  I was too lazy for that.  And it's work that
can be parallelized..

> In the case of `reftable_buf_add()` I kind of doubt the usefulness of
> handling the error just to keep the old pointer intact, as all callsites
> will ultimately error out anyway.

I can imagine use cases where an object is built piece by piece, one
part is too large and then you still want to keep all the rest and just
replace the huge thing with a placeholder or entirely ignore it.  Could
be a case of YAGNI, though.

> But in the case of `parse_names()` we
> do in fact want to handle the case specially so that we can free any
> names we have already parsed, so that case makes sense indeed.

Yes.  But that leads me on a tangent: Is it really a good idea to load
a file into lots of individual string objects instead of loading into
a single big buffer and pointing directly into it?  Do those strings
need to have individual lifetimes?

> So there is merit in having two separate wrappers, but it would be nice
> if `REFTABLE_ALLOC_GROW()` would be doing the "right thing" for most
> cases while the above two callsites would be adapted to use a wrapper
> that requires a bit more thought to use correctly. For example something
> like `REFTABLE_TRY_ALLOC_GROW()` or similar.

So this is about naming?  And with "right thing" you mean failing to
grow should lead to destruction?

René