Re: [PATCH 06/10] global: fix unsigned integer promotions in ternary statements

Jeff King <peff@xxxxxxxx> · Sun, 1 Dec 2024 16:59:11 -0500

On Fri, Nov 29, 2024 at 02:13:27PM +0100, Patrick Steinhardt wrote:

> diff --git a/builtin/blame.c b/builtin/blame.c
> index f0d209791e44025b1965cd447cf4fc1e2ca5f009..6c6b0c7ef1a4d992064c7664bbf1229ef0286b97 100644
> --- a/builtin/blame.c
> +++ b/builtin/blame.c
> @@ -470,7 +470,8 @@ static void emit_other(struct blame_scoreboard *sb, struct blame_entry *ent, int
>  
>  	for (cnt = 0; cnt < ent->num_lines; cnt++) {
>  		char ch;
> -		int length = (opt & OUTPUT_LONG_OBJECT_NAME) ? the_hash_algo->hexsz : abbrev;
> +		int length = (opt & OUTPUT_LONG_OBJECT_NAME) ?
> +			cast_size_t_to_int(the_hash_algo->hexsz) : abbrev;

Hmm. I'm surprised that -Wsign-compare would trigger here. We are not
comparing, but assigning. I'd have thought the actual error is the
truncation from the size_t the_hash_algo->hexsz down to "length".

But the actual error from gcc is:

  builtin/blame.c:472:87: error: operand of ‘?:’ changes signedness from ‘int’ to ‘size_t’ {aka ‘long unsigned int’} due to unsignedness of other operand [-Werror=sign-compare]
    472 |                 int length = (opt & OUTPUT_LONG_OBJECT_NAME) ? the_hash_algo->hexsz : abbrev;
        |                                                                                       ^~~~~~

So that makes sense that "abbrev" is promoted to unsigned to match the
other side, though I still think it's a little weird this comes via
-Wsign-compare.

Another solution would be to change "abbrev" into a size_t. But then
we'd still have truncation assigning to "length", unless we also make
that a size_t. But wouldn't that be the more natural type? We pass it to
memset() later.

We also subtract from it (without checking that it doesn't become
negative!), and use it with a printf("%.*s"). The latter does want an
int because of the lousy historical interface. IMHO we are probably
better off using fwrite() or strbuf_add() instead of "%.*s" specifiers.
In this case, I think it's just:

  fwrite(hex, 1, length, stdout);

(that assumes "length" is clamped to the hex size; I think it is here
but I also would not be opposed to a helper function that checks it
against the string length).

So I don't think what you've written above is _wrong_. But I think that
ultimately the right type here probably is size_t, and I worry that
sprinkling casts around makes it harder to see that. It converts what
would be a compile-time complaint (the truncation and sign conversion)
into a run-time one (that in this case I suspect can't be triggered, but
as a general rule may be something that _can_ be a problem but which our
tests are unlikely to actually poke at). I dunno.

I didn't dig carefully into the other ones, but I suspect they may be
similar. E.g.:

> diff --git a/builtin/patch-id.c b/builtin/patch-id.c
> index 56585757477911c96bbb9ef2cf3710691b8e744e..87fa586c4d552ba61cd2ac2cf079d68241eb3275 100644
> --- a/builtin/patch-id.c
> +++ b/builtin/patch-id.c
> @@ -163,7 +163,7 @@ static int get_one_patchid(struct object_id *next_oid, struct object_id *result,
>  			after--;
>  
>  		/* Add line to hash algo (possibly removing whitespace) */
> -		len = verbatim ? strlen(line) : remove_space(line);
> +		len = verbatim ? cast_size_t_to_int(strlen(line)) : remove_space(line);
>  		patchlen += len;
>  		the_hash_algo->update_fn(&ctx, line, len);
>  	}

Here we are assigning the result of strlen() to an int, but I think it's
unbounded (it points to a strbuf we read from stdin). The cast helper
does at least mean we'll bail rather than continue on bogus input, but
consistently using size_t seems better still to me.

-Peff