Re: [PATCH] khash: clarify that allocations never fail

René Scharfe <l.s.r@xxxxxx> · Sat, 3 Jul 2021 14:57:00 +0200

On 03.07.21 at 12:38, Jeff King wrote:
> On Sat, Jul 03, 2021 at 12:05:46PM +0200, René Scharfe wrote:
>
>> We use our standard allocation functions and macros (xcalloc,
>> ALLOC_ARRAY, REALLOC_ARRAY) in our version of khash.h.  They terminate
>> the program on error, so code that's using them doesn't have to handle
>> allocation failures.  Make this behavior explicit by replacing the code
>> that handles allocation errors in kh_resize_ and kh_put_ with BUG calls.
>
> Seems like a good idea.
>
> We're very sloppy about checking the "ret" field from kh_put_* for
> errors (it's a tri-state for "already existed", "newly added", or
> "error"). I think that's not a problem because as you show here, we
> can't actually hit the error case. This makes that much more obvious.
>
> Two nits if we wanted to go further:
>
>> diff --git a/khash.h b/khash.h
>> index 21c2095216..84ff7230b6 100644
>> --- a/khash.h
>> +++ b/khash.h
>> @@ -126,7 +126,7 @@ static const double __ac_HASH_UPPER = 0.77;
>>  			if (h->size >= (khint_t)(new_n_buckets * __ac_HASH_UPPER + 0.5)) j = 0;	/* requested size is too small */ \
>>  			else { /* hash table size to be changed (shrink or expand); rehash */ \
>>  				ALLOC_ARRAY(new_flags, __ac_fsize(new_n_buckets)); \
>> -				if (!new_flags) return -1;								\
>> +				if (!new_flags) BUG("ALLOC_ARRAY failed");				\
>
> I converted this in b32fa95fd8 (convert trivial cases to ALLOC_ARRAY,
> 2016-02-22), but left the now-obsolete error-check.
>
> But a few lines below...
>
>>  				memset(new_flags, 0xaa, __ac_fsize(new_n_buckets) * sizeof(khint32_t)); \
>>  				if (h->n_buckets < new_n_buckets) {	/* expand */		\
>>  					REALLOC_ARRAY(h->keys, new_n_buckets); \
>
> These REALLOC_ARRAY() calls are in the same boat. You dropped the error
> check in 2756ca4347 (use REALLOC_ARRAY for changing the allocation size
> of arrays, 2014-09-16).
>
> Should we make the two match? I'd probably do so by making the former
> match the latter, and just drop the conditional and BUG entirely.

Yeah, makes sense, thank you.
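
To spell out why the check is dead code, here is a minimal standalone
sketch, not the actual khash.h change.  sketch_xmalloc and
SKETCH_ALLOC_ARRAY are hypothetical stand-ins for xmalloc and
ALLOC_ARRAY; the real macro also guards against size overflow, which
this one omits.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for xmalloc(): terminates instead of returning NULL. */
static void *sketch_xmalloc(size_t size)
{
	void *p = malloc(size ? size : 1);
	if (!p) {
		fprintf(stderr, "fatal: out of memory\n");
		exit(EXIT_FAILURE);
	}
	return p;
}

/* Hypothetical stand-in for ALLOC_ARRAY (no overflow check in this sketch). */
#define SKETCH_ALLOC_ARRAY(x, n) ((x) = sketch_xmalloc(sizeof(*(x)) * (n)))

/* Mirrors the flag setup in kh_resize_: allocate, then fill with 0xaa. */
static uint32_t *sketch_new_flags(size_t nr)
{
	uint32_t *new_flags;

	SKETCH_ALLOC_ARRAY(new_flags, nr);
	/* No "if (!new_flags)" needed: the allocator never returns NULL. */
	memset(new_flags, 0xaa, nr * sizeof(*new_flags));
	return new_flags;
}
```

With an allocator like this, a NULL check after the macro can never
fire, so dropping it loses nothing.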

>
>> @@ -181,10 +181,10 @@ static const double __ac_HASH_UPPER = 0.77;
>>  		if (h->n_occupied >= h->upper_bound) { /* update the hash table */ \
>>  			if (h->n_buckets > (h->size<<1)) {							\
>>  				if (kh_resize_##name(h, h->n_buckets - 1) < 0) { /* clear "deleted" elements */ \
>> -					*ret = -1; return h->n_buckets;						\
>> +					BUG("kh_resize_" #name " failed");					\
>>  				}														\
>>  			} else if (kh_resize_##name(h, h->n_buckets + 1) < 0) { /* expand the hash table */ \
>> -				*ret = -1; return h->n_buckets;							\
>> +				BUG("kh_resize_" #name " failed");						\
>
> After the first hunk, does kh_resize_*() ever return anything but 0? If
> not, can we drop its return entirely, making it more clear that it's not
> expected to fail? Both for human readers, but also for the compiler
> (which could then alert us at compile-time if we missed any error
> cases).

Good idea.  Both types of changes make syncing with upstream a bit
harder, but even though the return type change bleeds into the caller,
the overall change affects only a small area.
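
Roughly, the shape I have in mind is sketched below.  This is a toy
linear-scan set, not khash; struct sketch_set, sketch_xrealloc,
sketch_resize and sketch_put are all hypothetical names used only to
show how a void resize removes the error branch from its caller.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical toy container; stands in for the khash table struct. */
struct sketch_set {
	size_t nr, alloc;
	int *keys;
};

/* Hypothetical stand-in for xrealloc(): terminates instead of returning NULL. */
static void *sketch_xrealloc(void *p, size_t size)
{
	void *q = realloc(p, size ? size : 1);
	if (!q) {
		fprintf(stderr, "fatal: out of memory\n");
		exit(EXIT_FAILURE);
	}
	return q;
}

/* Returns void: there is no failure left to report. */
static void sketch_resize(struct sketch_set *s, size_t new_alloc)
{
	s->keys = sketch_xrealloc(s->keys, new_alloc * sizeof(*s->keys));
	s->alloc = new_alloc;
}

/* *ret is 0 if the key already existed, 1 if newly added; never an error. */
static size_t sketch_put(struct sketch_set *s, int key, int *ret)
{
	size_t i;

	for (i = 0; i < s->nr; i++) {
		if (s->keys[i] == key) {
			*ret = 0;
			return i;
		}
	}
	if (s->nr == s->alloc)
		sketch_resize(s, s->alloc ? 2 * s->alloc : 4); /* nothing to check */
	s->keys[s->nr] = key;
	*ret = 1;
	return s->nr++;
}
```

If someone later wrote "if (sketch_resize(...) < 0)", the compiler
would reject it, which is the compile-time alert you mention.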

René