Re: [PATCH v3 4/4] commit.c: Fix type conversation warnings from msvc

Phillip Wood <phillip.wood123@xxxxxxxxx> · Wed, 29 Jan 2025 16:53:08 +0000




Hi Sören

On 26/01/2025 13:00, Sören Krecker wrote:

-	const int tree_entry_len = the_hash_algo->hexsz + 5;
-	const int parent_entry_len = the_hash_algo->hexsz + 7;
+	const size_t tree_entry_len = the_hash_algo->hexsz + 5;
+	const size_t parent_entry_len = the_hash_algo->hexsz + 7;

As Junio has previously pointed out, it might make more sense to change
the type of the_hash_algo->hexsz. What is the advantage of using size_t
here?

  int add_header_signature(struct strbuf *buf, struct strbuf *sig, const struct git_hash_algo *algo)
  {
-	int inspos, copypos;
+	ssize_t inspos, copypos;

Note that POSIX allows "ssize_t" to be narrower than "int". We have at 
least one platform where that is the case [see c14e5a1a501 
(transport-helper: use xread instead of read, 2019-01-03)] so I'm not 
sure this change is a good idea.

  	const char *eoh;
  	const char *gpg_sig_header = gpg_sig_headers[hash_algo_by_ptr(algo)];
-	int gpg_sig_header_len = strlen(gpg_sig_header);
+	size_t gpg_sig_header_len = strlen(gpg_sig_header);

It is really unfortunate that the compiler cannot deduce that there is
no truncation here, as the longest string is "gpgsig-sha256". I wonder
if we should add a helper for cases like this, something like

	static inline int strlen_int(const char *s)
	{
		return cast_size_t_to_int(strlen(s));
	}

rather than having to deal with the fallout from changing "int" to
"size_t".
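A minimal, self-contained sketch of such a helper, assuming it would sit next to git's existing cast_size_t_to_int(). Because git's version reports overflow via die(), which is not available outside git, the sketch uses a hypothetical stand-in, cast_size_t_to_int_sketch(), that prints to stderr and exits. The name strlen_int() is only the suggestion above, not an existing git function.

```c
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for git's cast_size_t_to_int(). git's version
 * calls die(), which exits with status 128; this sketch does the same
 * by hand so it can be compiled on its own.
 */
static inline int cast_size_t_to_int_sketch(size_t a)
{
	if (a > INT_MAX) {
		fprintf(stderr, "number too large to represent as int: %zu\n", a);
		exit(128);
	}
	return (int)a;
}

/*
 * Hypothetical helper: strlen() for strings that are known to be short,
 * such as the fixed gpg signature header names. Overflow is checked at
 * run time instead of being silenced with a bare (int) cast.
 */
static inline int strlen_int(const char *s)
{
	return cast_size_t_to_int_sketch(strlen(s));
}
```

With a helper like this, the hunk above could keep its original type, e.g. `int gpg_sig_header_len = strlen_int(gpg_sig_header);`, and the conversion warning goes away without changing the types of the surrounding variables.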

-static int find_invalid_utf8(const char *buf, int len)
+static int find_invalid_utf8(const char *buf, size_t len)

I think changing the type of "len" is an improvement (even if it is hard
to see anyone creating such a long commit message) as it matches
buf->len in verify_utf8(), which is the only caller of this function.
However, the conversion is incomplete. If we are to accommodate buffers
longer than INT_MAX, we need to change the return type as well. As it
stands, bad_offset is changed to size_t but truncated to int when the
function returns. Patrick has already pointed out that the type of
"offset" needs to match "len", as the loop does

	len--;
	offset++;

and

	bad_offset = offset - 1;

verify_utf8() uses a variable of type long to track the position, so
that should be changed as well if we're really going to support
size_t-length buffers (with MSVC, "long" is only 32 bits even on 64-bit
Windows).
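A rough sketch of what a complete conversion might look like, for illustration only: a hypothetical find_invalid_utf8_sz() in which "len", "offset" and "bytes" are all size_t, and the position of the first bad byte is returned through a size_t out-parameter instead of being squeezed into an int (or ssize_t) return value. The validity checks are simplified (truncated sequences, stray continuation bytes, overlong forms, surrogates and code points above U+10FFFF) and are not identical to git's find_invalid_utf8().

```c
#include <stddef.h>

/*
 * Hypothetical size_t-clean UTF-8 scanner. Returns 0 if buf[0..len) is
 * valid; otherwise returns 1 and stores the offset of the lead byte of
 * the first invalid sequence in *bad_offset. No value is ever narrowed
 * to int, so buffers longer than INT_MAX are handled correctly.
 */
static int find_invalid_utf8_sz(const char *buf, size_t len, size_t *bad_offset)
{
	size_t offset = 0;

	while (len) {
		unsigned char c = (unsigned char)buf[offset];
		size_t start = offset;	/* lead byte of this sequence */
		size_t bytes;		/* continuation bytes expected */
		unsigned int codepoint, min_val;

		if (c < 0x80) {		/* ASCII */
			offset++;
			len--;
			continue;
		}
		if ((c & 0xe0) == 0xc0) {
			bytes = 1; min_val = 0x80; codepoint = c & 0x1f;
		} else if ((c & 0xf0) == 0xe0) {
			bytes = 2; min_val = 0x800; codepoint = c & 0x0f;
		} else if ((c & 0xf8) == 0xf0) {
			bytes = 3; min_val = 0x10000; codepoint = c & 0x07;
		} else {		/* stray continuation or invalid lead byte */
			*bad_offset = start;
			return 1;
		}
		if (len - 1 < bytes) {	/* sequence truncated by end of buffer */
			*bad_offset = start;
			return 1;
		}
		/* offset and len always move together, both as size_t */
		offset++;
		len--;
		while (bytes--) {
			unsigned char cc = (unsigned char)buf[offset];
			if ((cc & 0xc0) != 0x80) {
				*bad_offset = start;
				return 1;
			}
			codepoint = (codepoint << 6) | (cc & 0x3f);
			offset++;
			len--;
		}
		/* reject overlong forms, out-of-range values and surrogates */
		if (codepoint < min_val || codepoint > 0x10ffff ||
		    (codepoint >= 0xd800 && codepoint <= 0xdfff)) {
			*bad_offset = start;
			return 1;
		}
	}
	return 0;
}
```

Reporting the offset through a pointer also sidesteps the concern above about "ssize_t" possibly being narrower than "int"; the caller's position variable would then need to be size_t as well.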

I think that a good approach to fixing these warnings would be to ask
"does it make sense to use size_t here?". If the answer is "yes", then
we should focus on converting the function to work correctly with
size_t rather than on fixing the compiler warnings. That way we are more
likely to avoid subtle bugs like this, as our focus is on the conversion
rather than on the compiler warnings, which will be fixed as a
by-product of the conversion. If the answer is "no", then we should
look to change the type of the other variable in the assignment or use
something like cast_size_t_to_int() or cast_size_t_to_ulong() as
appropriate. There is a danger that we adopt an approach of "change the
type to size_t to silence the warnings", which leads to code that looks
like it handles size_t correctly but in fact contains subtle bugs.

Best Wishes

Phillip


  {
  	int offset = 0;
  	static const unsigned int max_codepoint[] = {
@@ -1539,7 +1539,7 @@ static int find_invalid_utf8(const char *buf, int len)
  
  	while (len) {
  		unsigned char c = *buf++;
-		int bytes, bad_offset;
+		size_t bytes, bad_offset;
  		unsigned int codepoint;
  		unsigned int min_val, max_val;