Re: [GSOC][PATCH] apply: address -Wsign-comparison warnings

Zejun Zhao <jelly.zhao.42@xxxxxxxxx> · Wed, 5 Feb 2025 16:49:47 +0000

Thank you for reviewing this patch.

On Wed, Feb 05 2025 04:58:57 -0800, Junio C Hamano wrote,
> > @@ -1087,11 +1086,11 @@ static int gitdiff_index(struct gitdiff_data *state,
> >  	 * and optional space with octal mode.
> >  	 */
> >  	const char *ptr, *eol;
> > -	int len;
> > -	const unsigned hexsz = the_hash_algo->hexsz;
> > +	size_t len;
> > +	const size_t hexsz = the_hash_algo->hexsz;
> 
> I thought that I already saw this discussed in another thread.
> 
> The .hexsz of any hash algorithm would never be larger than what a
> platform natural "unsigned" integer type can hold, so using size_t
> for the member _is_ the wrong thing to do and the fix may be the
> other way around, no?

Sorry I didn't saw the comment at [1]. You are right about this. I 
prefer changing types of the `git_hash_algo::*sz` family to `unsigned`.

[1] https://lore.kernel.org/git/xmqqttaqw2eb.fsf@gitster.g/

> >  	ptr = strchr(line, '.');
> > -	if (!ptr || ptr[1] != '.' || hexsz < ptr - line)
> > +	if (!ptr || ptr[1] != '.' || hexsz < (size_t) (ptr - line))
> 
> Is this about -Wsign-compare complaining about size_t vs ptrdiff_t?
> 
> > @@ -1207,7 +1206,7 @@ static char *git_header_name(int p_value,
> >  		cp = skip_tree_prefix(p_value, second, line + llen - second);
> >  		if (!cp)
> >  			goto free_and_fail1;
> > -		if (line + llen - cp != first.len ||
> > +		if ((size_t) (line + llen - cp) != first.len ||
> 
> Ditto.
> 
> > -			if (len < second - name &&
> > +			if (len < (size_t) (second - name) &&
> 
> Ditto.

Yes it's comparing unsigned (`size_t`) with signed (`ptrdiff_t`).

> I said this before, but I am not sure if being strict about
> "-Wsign-compare" is really buying us much.  If we are getting so
> many false positives that need to be squelched by churning the code
> with so many typecasts in order to find a new and real problem,
> is it really worth it?

Well, I contributed most before to a Rust OS kernel project so 
honestly I'm used to the idea of explicit type casting when necessary 
to avoid possible bugs. However, this is non-trivial. People should 
understand each relevant variable's semantics to decide what its type 
should be and do typecast on top of that. Even if it's already made a 
gradual process, it may still cost much from both the contributors and 
the reviewers but brings benefits that do not match up. Maybe a clear 
commit message that states why each type cast/change makes sense can 
help with the iteration? I'm not quite sure about that.