Hi Sören
On 26/01/2025 13:00, Sören Krecker wrote:
- const int tree_entry_len = the_hash_algo->hexsz + 5;
- const int parent_entry_len = the_hash_algo->hexsz + 7;
+ const size_t tree_entry_len = the_hash_algo->hexsz + 5;
+ const size_t parent_entry_len = the_hash_algo->hexsz + 7;
As Junio has previously pointed out, it might make more sense to change
the type of the_hash_algo->hexsz. What is the advantage of using size_t
here?
int add_header_signature(struct strbuf *buf, struct strbuf *sig, const struct git_hash_algo *algo)
{
- int inspos, copypos;
+ ssize_t inspos, copypos;
Note that POSIX allows "ssize_t" to be narrower than "int". We have at
least one platform where that is the case [see c14e5a1a501
(transport-helper: use xread instead of read, 2019-01-03)] so I'm not
sure this change is a good idea.
const char *eoh;
const char *gpg_sig_header = gpg_sig_headers[hash_algo_by_ptr(algo)];
- int gpg_sig_header_len = strlen(gpg_sig_header);
+ size_t gpg_sig_header_len = strlen(gpg_sig_header);
It is really unfortunate that the compiler cannot deduce that there is
not any truncation here as the longest string is "gpgsig-sha256". I
wonder if we should add a helper for cases like this, something like
	int strlen_int(const char *s) {
		return cast_size_t_to_int(strlen(s));
	}
rather than having to deal with the fallout from changing "int" to "size_t".
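To make the idea concrete, here is a rough, self-contained sketch of such a helper. strlen_int() is only the hypothetical name suggested above, and the cast_size_t_to_int() below is a stand-in that calls abort() so the snippet compiles outside of git; the real helper reports the error with die().

```c
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Stand-in for git's cast_size_t_to_int(). The real helper calls die();
 * this sketch aborts instead so that it builds on its own.
 */
static int cast_size_t_to_int(size_t a)
{
	if (a > INT_MAX) {
		fprintf(stderr, "number too large to represent as int: %zu\n", a);
		abort();
	}
	return (int)a;
}

/* Hypothetical helper: strlen() for callers that keep lengths in an int. */
static int strlen_int(const char *s)
{
	return cast_size_t_to_int(strlen(s));
}
```

With something like this the caller could keep "int gpg_sig_header_len = strlen_int(gpg_sig_header);" and any truncation would be caught at run time rather than silently ignored.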
-static int find_invalid_utf8(const char *buf, int len)
+static int find_invalid_utf8(const char *buf, size_t len)
I think changing the type of "len" is an improvement (even if it is hard to
see anyone creating such a long commit message) as it matches the
buf->len in verify_utf8() which is the only caller of this function.
However the conversion is incomplete. If we are to accommodate buffers
longer than INT_MAX we need to change the return type as well. As it
stands bad_offset is changed to size_t but truncated to int when the
function returns. Patrick has already pointed out that the type of
"offset" needs to match "len" as the loop does
len--;
offset++;
and
bad_offset = offset - 1;
verify_utf8() uses a variable of type long to track the position so that
should be changed as well if we're really going to support size_t length
buffers.
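As an illustration only, one possible shape for a complete conversion is to report "found or not" through the return value and pass the position back through a size_t out-parameter, so nothing is squeezed through an int. The function name and signature below are made up for this sketch, and the validity test is a placeholder (the bytes 0xFE and 0xFF never occur in UTF-8), not the real check in commit.c.

```c
#include <stddef.h>

/*
 * Hypothetical variant of find_invalid_utf8(): returns 1 and stores the
 * position of the first bad byte in *bad_offset, or returns 0 if none is
 * found. The byte test is a placeholder, not full UTF-8 validation.
 */
static int find_invalid_byte(const char *buf, size_t len, size_t *bad_offset)
{
	size_t offset = 0; /* same type as len, so it can index the whole buffer */

	while (len) {
		unsigned char c = (unsigned char)*buf++;

		len--;
		offset++;

		if (c >= 0xfe) {
			*bad_offset = offset - 1;
			return 1;
		}
	}
	return 0;
}
```

The caller would then track its own position with a size_t as well, instead of the long that verify_utf8() uses today.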
I think that a good approach to fixing these warnings would be to ask
"does it make sense to use size_t here?". If the answer is "yes" then we
should focus on converting the function to work correctly with size_t
rather than on fixing the compiler warnings. That way we are more likely
to avoid subtle bugs like this as our focus is on the conversion rather
than the compiler warnings which will be fixed as a by-product of the
conversion. If the answer to the question is "no" then we should look to
change the type of the other variable in the assignment or use
something like cast_size_t_to_int() or cast_size_t_to_ulong() as
appropriate. There is a danger that we adopt an approach of "change the
type to size_t to silence the warnings" which leads to code that looks
like it handles size_t correctly but in fact contains subtle bugs.
Best Wishes
Phillip
{
int offset = 0;
static const unsigned int max_codepoint[] = {
@@ -1539,7 +1539,7 @@ static int find_invalid_utf8(const char *buf, int len)
while (len) {
unsigned char c = *buf++;
- int bytes, bad_offset;
+ size_t bytes, bad_offset;
unsigned int codepoint;
unsigned int min_val, max_val;