Re: [PATCHv2 1/4] add-patch: Fix type missmatch rom msvc

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 2025-01-06 at 19:08:52, Sören Krecker wrote:
> Fix some compiler warings from msvw in add-patch.c for value truncation
> form 64 bit to 32 bit integers.Change unsigned long to size_t for
> correct variable size on linux and windows.
> Add macro strtos for converting a string to size_t.
> Test if convertion fails with over or underflow.

A few minor nits here.  We want to say "from" both here and in the title
and "conversion" (and in the title, "mismatch"), and put a space after
the period in a sentence.  I think you meant "MSVC" instead of "msvw",
but if not, please do explain what that is, since I'm not familiar with
it and I'm curious.  The commit message is a good place to explain lots
in detail.

> Signed-off-by: Sören Krecker <soekkle@xxxxxxxxxx>
> 
> Uses strtouq

I don't see that we're using this function.

> impove linux support
> 
> Change Macro name

We don't typically put comments about the revisions we've made to a
patch in the commit message.  We may put them below the --- so that
they're visible to readers and reviewers, which is helpful, but we
pretend that our patches were perfect to begin with in terms of the
commit message, since the future reader of the history only cares about
the actual end result and not what changes we made along the way.

> ---
>  add-patch.c       | 53 +++++++++++++++++++++++++++--------------------
>  gettext.h         |  2 +-
>  git-compat-util.h |  6 ++++++
>  3 files changed, 38 insertions(+), 23 deletions(-)
> 
> diff --git a/add-patch.c b/add-patch.c
> index 7b598e14df..67a7f68d23 100644
> --- a/add-patch.c
> +++ b/add-patch.c
> @@ -242,7 +242,7 @@ static struct patch_mode patch_mode_worktree_nothead = {
>  };
>  
>  struct hunk_header {
> -	unsigned long old_offset, old_count, new_offset, new_count;
> +	size_t old_offset, old_count, new_offset, new_count;
>  	/*
>  	 * Start/end offsets to the extra text after the second `@@` in the
>  	 * hunk header, e.g. the function signature. This is expected to
> @@ -322,11 +322,12 @@ static void setup_child_process(struct add_p_state *s,
>  }
>  
>  static int parse_range(const char **p,
> -		       unsigned long *offset, unsigned long *count)
> +		       size_t *offset, size_t *count)
>  {
>  	char *pend;
> -
> -	*offset = strtoul(*p, &pend, 10);
> +	*offset = strtos(*p, &pend, 10);

I see you've defined this below.

> +	if (errno == ERANGE)
> +		return error("Number dose not fit datatype");

I think the word you wanted was "does".  However, perhaps we should
provide a better, more meaningful error message so the user knows what
data they provided that was invalid.  Maybe "absurdly large value in
diff header range"?  It would be quite bizarre to get a value even as
large as the maximum value of a 32-bit integer, and I don't think our
diff code can even handle values larger than INT_MAX.

In that context, it might not even be necessary to handle values larger
than unsigned long, since we can't generate them.  However, in the
interests of compatibility with other implementations which might not
have that limitation, size_t seems reasonable as a choice to handle more
generally.

Assuming we keep this, we probably also want to mark this for
translation by wrapping it in `_(` and `)`.

I also don't think this order is correct.  In general, errno is not
reset implicitly, so unless we know that an error occurred, errno is
meaningless, since another function could have set it to ERANGE.  We'd
probably need to save errno, set it to 0, and restore to verify that we
got the right value, since we can't distinguish here between a truncated
value for range reasons and for other reasons.

>  	if (pend == *p)
>  		return -1;
>
>  	if (*pend != ',') {
> @@ -334,7 +335,9 @@ static int parse_range(const char **p,
>  		*p = pend;
>  		return 0;
>  	}
> -	*count = strtoul(pend + 1, (char **)p, 10);
> +	*count = strtos(pend + 1, (char **)p, 10);
> +	if (errno == ERANGE)
> +		return error("Number dose not fit datatype");

Same comment here.

>  	return *p == pend + 1 ? -1 : 0;
>  }
>  
> @@ -673,8 +676,8 @@ static void render_hunk(struct add_p_state *s, struct hunk *hunk,
>  		 */
>  		const char *p;
>  		size_t len;
> -		unsigned long old_offset = header->old_offset;
> -		unsigned long new_offset = header->new_offset;
> +		size_t old_offset = header->old_offset;
> +		size_t new_offset = header->new_offset;
>  
>  		if (!colored) {
>  			p = s->plain.buf + header->extra_start;
> @@ -700,12 +703,14 @@ static void render_hunk(struct add_p_state *s, struct hunk *hunk,
>  		else
>  			new_offset += delta;
>  
> -		strbuf_addf(out, "@@ -%lu", old_offset);
> +		strbuf_addf(out, "@@ -%" PRIuMAX, (uintmax_t)old_offset);
>  		if (header->old_count != 1)
> -			strbuf_addf(out, ",%lu", header->old_count);
> -		strbuf_addf(out, " +%lu", new_offset);
> +			strbuf_addf(out, ",%" PRIuMAX,
> +				    (uintmax_t)header->old_count);
> +		strbuf_addf(out, " +%" PRIuMAX, (uintmax_t)new_offset);

If we're using size_t, we can use %zu.  That's specified in C99 as the
appropriate formatting type for size_t, and we require C99 or C11 for
all systems.  We don't need to cast to uintmax_t.

> diff --git a/gettext.h b/gettext.h
> index 484cafa562..d36f5a7ade 100644
> --- a/gettext.h
> +++ b/gettext.h
> @@ -53,7 +53,7 @@ static inline FORMAT_PRESERVING(1) const char *_(const char *msgid)
>  }
>  
>  static inline FORMAT_PRESERVING(1) FORMAT_PRESERVING(2)
> -const char *Q_(const char *msgid, const char *plu, unsigned long n)
> +const char *Q_(const char *msgid, const char *plu, size_t n)
>  {
>  	if (!git_gettext_enabled)
>  		return n == 1 ? msgid : plu;
> diff --git a/git-compat-util.h b/git-compat-util.h
> index e283c46c6f..4c33990a05 100644
> --- a/git-compat-util.h
> +++ b/git-compat-util.h
> @@ -291,6 +291,12 @@ static inline int _have_unix_sockets(void)
>  #ifdef HAVE_BSD_SYSCTL
>  #include <sys/sysctl.h>
>  #endif
> +#if defined _WIN64
> +# define strtos strtoull
> +#else
> +#define strtos strtoul
> +#endif

This is not a great name for the function.  First of all, it resembles
the standard functions a lot, so it's something that POSIX could
standardize or an OS could add, and then we'll have some fun compilation
errors when we redefine things.

Second, it's a lot less future-proof.  While I do agree that only
Windows 64-bit systems are likely to fall into this case, since we
already include <limits.h>, we probably should do this:

  #if SIZE_MAX == ULONG_MAX
  #define str_to_size_t strtoul
  #else
  #define str_to_size_t strtoull
  #endif

(or whatever you want to call the function).

That expresses what we care about—that the type is suitable for the
value we want to parse—and doesn't use the OS as a proxy for that data.
Otherwise, the Unix developer who doesn't use Windows may not
understand _why_ Windows is special and the reason we've chosen this
change.

On that note, it would be helpful if you explained in the commit message
why that is for people who don't know.  Maybe something like this:

  On 64-bit systems, size_t is a 64-bit type.  On most Unix systems,
  unsigned long is also 64 bits in size, so we can use functions for
  that type to parse values of size_t.  However, on Windows, unsigned
  long is always 32 bits, and if we want a 64-bit type, we must use
  unsigned long long.  To future-proof our changes against other
  platforms that might be added in the future, we first check if
  unsigned long is sufficient, and otherwise, use unsigned long long,
  which will work in both cases.

Of course, please feel free to edit as you see fit.
-- 
brian m. carlson (they/them or he/him)
Toronto, Ontario, CA

Attachment: signature.asc
Description: PGP signature


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux