Re: [PATCH 06/10] global: fix unsigned integer promotions in ternary statements

Patrick Steinhardt <ps@xxxxxx> · Mon, 2 Dec 2024 08:54:13 +0100

On Sun, Dec 01, 2024 at 04:59:11PM -0500, Jeff King wrote:
> On Fri, Nov 29, 2024 at 02:13:27PM +0100, Patrick Steinhardt wrote:
> 
> > diff --git a/builtin/blame.c b/builtin/blame.c
> > index f0d209791e44025b1965cd447cf4fc1e2ca5f009..6c6b0c7ef1a4d992064c7664bbf1229ef0286b97 100644
> > --- a/builtin/blame.c
> > +++ b/builtin/blame.c
> > @@ -470,7 +470,8 @@ static void emit_other(struct blame_scoreboard *sb, struct blame_entry *ent, int
> >  
> >  	for (cnt = 0; cnt < ent->num_lines; cnt++) {
> >  		char ch;
> > -		int length = (opt & OUTPUT_LONG_OBJECT_NAME) ? the_hash_algo->hexsz : abbrev;
> > +		int length = (opt & OUTPUT_LONG_OBJECT_NAME) ?
> > +			cast_size_t_to_int(the_hash_algo->hexsz) : abbrev;
> 
> Hmm. I'm surprised that -Wsign-compare would trigger here. We are not
> comparing, but assigning. I'd have thought the actual error is the
> truncation from the size_t the_hash_algo->hexsz down to "length".
> 
> But the actual error from gcc is:
> 
>   builtin/blame.c:472:87: error: operand of ‘?:’ changes signedness from ‘int’ to ‘size_t’ {aka ‘long unsigned int’} due to unsignedness of other operand [-Werror=sign-compare]
>     472 |                 int length = (opt & OUTPUT_LONG_OBJECT_NAME) ? the_hash_algo->hexsz : abbrev;
>         |                                                                                       ^~~~~~
> 
> So that makes sense that "abbrev" is promoted to unsigned to match the
> other side, though I still think it's a little weird this comes via
> -Wsign-compare.

Agreed, I was caught by surprise, as well. Doubly so because Clang does
not throw these into the same bag.

> Another solution would be to change "abbrev" into a size_t. But then
> we'd still have truncation assigning to "length", unless we also make
> that a size_t. But wouldn't that be the more natural type? We pass it to
> memset() later.
> 
> We also subtract from it (without checking that it doesn't become
> negative!), and use it with a printf("%.*s").

This is fine in practice because `abbrev` will never be smaller than
`MINIMUM_ABBREV` here, which is 4. So given that we only subtract at
most 3 from the value the end result would be a positive integer.

But you're right, this feels fragile overall.

> The latter does want an
> int because of the lousy historical interface. IMHO we are probably
> better off using fwrite() or strbuf_add() instead of "%.*s" specifiers.
> In this case, I think it's just:
> 
>   fwrite(hex, 1, length, stdout);
> 
> (that assumes "length" is clamped to the hex size; I think it is here
> but I also would not be opposed to a helper function that checks it
> against the string length).
> 
> 
> So I don't think what you've written above is _wrong_. But I think that
> ultimately the right type here probably is size_t, and I worry that
> sprinkling casts around makes it harder to see that. It converts what
> would be a compile-time complaint (the truncation and sign conversion)
> into a run-time one (that in this case I suspect can't be triggered, but
> as a general rule may be something that _can_ be a problem but which our
> tests are unlikely to actually poke at). I dunno.
> 
> I didn't dig carefully into the other ones, but I suspect they may be
> similar. E.g.:

Will adapt. For the first iteration I was admittedly a bit lazy for some
of the cases because I first wanted to check whether this will get
acceptance in the first place. I'll explode these out into separate
commits.

Patrick