Re: [PATCH] khash: clarify that allocations never fail

Jeff King <peff@xxxxxxxx> · Sat, 3 Jul 2021 06:38:26 -0400

On Sat, Jul 03, 2021 at 12:05:46PM +0200, René Scharfe wrote:

> We use our standard allocation functions and macros (xcalloc,
> ALLOC_ARRAY, REALLOC_ARRAY) in our version of khash.h.  They terminate
> the program on error, so code that's using them doesn't have to handle
> allocation failures.  Make this behavior explicit by replacing the code
> that handles allocation errors in kh_resize_ and kh_put_ with BUG calls.

Seems like a good idea.

We're very sloppy about checking the "ret" field from kh_put_* for
errors (it's a tri-state for "already existed", "newly added", or
"error"). I think that's not a problem because as you show here, we
can't actually hit the error case. This makes that much more obvious.

Two nits if we wanted to go further:

> diff --git a/khash.h b/khash.h
> index 21c2095216..84ff7230b6 100644
> --- a/khash.h
> +++ b/khash.h
> @@ -126,7 +126,7 @@ static const double __ac_HASH_UPPER = 0.77;
>  			if (h->size >= (khint_t)(new_n_buckets * __ac_HASH_UPPER + 0.5)) j = 0;	/* requested size is too small */ \
>  			else { /* hash table size to be changed (shrink or expand); rehash */ \
>  				ALLOC_ARRAY(new_flags, __ac_fsize(new_n_buckets)); \
> -				if (!new_flags) return -1;								\
> +				if (!new_flags) BUG("ALLOC_ARRAY failed");				\

I converted this in b32fa95fd8 (convert trivial cases to ALLOC_ARRAY,
2016-02-22), but left the now-obsolete error-check.

But a few lines below...

>  				memset(new_flags, 0xaa, __ac_fsize(new_n_buckets) * sizeof(khint32_t)); \
>  				if (h->n_buckets < new_n_buckets) {	/* expand */		\
>  					REALLOC_ARRAY(h->keys, new_n_buckets); \

These REALLOC_ARRAY() calls are in the same boat. You dropped the error
check in 2756ca4347 (use REALLOC_ARRAY for changing the allocation size
of arrays, 2014-09-16).

Should we make the two match? I'd probably do so by making the former
match the latter, and just drop the conditional and BUG entirely.

> @@ -181,10 +181,10 @@ static const double __ac_HASH_UPPER = 0.77;
>  		if (h->n_occupied >= h->upper_bound) { /* update the hash table */ \
>  			if (h->n_buckets > (h->size<<1)) {							\
>  				if (kh_resize_##name(h, h->n_buckets - 1) < 0) { /* clear "deleted" elements */ \
> -					*ret = -1; return h->n_buckets;						\
> +					BUG("kh_resize_" #name " failed");					\
>  				}														\
>  			} else if (kh_resize_##name(h, h->n_buckets + 1) < 0) { /* expand the hash table */ \
> -				*ret = -1; return h->n_buckets;							\
> +				BUG("kh_resize_" #name " failed");						\

After the first hunk, does kh_resize_*() ever return anything but 0? If
not, can we drop its return entirely, making it more clear that it's not
expected to fail? Both for human readers, but also for the compiler
(which could then alert us at compile-time if we missed any error
cases).

-Peff