Re: [RFC PATCH 6/6] utf8.c: avoid char overflow

Johannes Schindelin <Johannes.Schindelin@xxxxxx> · Mon, 9 Jul 2018 15:14:43 +0200 (DST)

Hi Beat,

On Sun, 8 Jul 2018, Beat Bolli wrote:

> In ISO C, char constants must be in the range -128..127. Change the BOM
> constants to unsigned char to avoid overflow.
> 
> Signed-off-by: Beat Bolli <dev+git@xxxxxxxxx>
> ---
>  utf8.c | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/utf8.c b/utf8.c
> index d55e20c641..833ce00617 100644
> --- a/utf8.c
> +++ b/utf8.c
> @@ -561,15 +561,15 @@ char *reencode_string_len(const char *in, int insz,
>  #endif
>  
>  static int has_bom_prefix(const char *data, size_t len,
> -			  const char *bom, size_t bom_len)
> +			  const unsigned char *bom, size_t bom_len)
>  {
>  	return data && bom && (len >= bom_len) && !memcmp(data, bom, bom_len);
>  }
>  
> -static const char utf16_be_bom[] = {0xFE, 0xFF};
> -static const char utf16_le_bom[] = {0xFF, 0xFE};
> -static const char utf32_be_bom[] = {0x00, 0x00, 0xFE, 0xFF};
> -static const char utf32_le_bom[] = {0xFF, 0xFE, 0x00, 0x00};
> +static const unsigned char utf16_be_bom[] = {0xFE, 0xFF};
> +static const unsigned char utf16_le_bom[] = {0xFF, 0xFE};
> +static const unsigned char utf32_be_bom[] = {0x00, 0x00, 0xFE, 0xFF};
> +static const unsigned char utf32_le_bom[] = {0xFF, 0xFE, 0x00, 0x00};

An alternative approach that might be easier to read (and avoids the
confusion arising from our use of (signed) chars for strings pretty much
everywhere):

#define FE ((char)0xfe)
#define FF ((char)0xff)

...
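
To spell out what I mean, here is a minimal sketch, assuming the arrays
and has_bom_prefix() keep their current `const char` types and only the
initializers change. check_bom_macros() is a hypothetical helper that
exists only for illustration; it is not part of utf8.c. The sketch also
assumes the usual two's-complement behavior when 0xfe and 0xff are
converted to a signed char.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * The explicit casts turn the out-of-range constants into char values,
 * so the initializers below no longer rely on an implicit narrowing
 * conversion.
 */
#define FE ((char)0xfe)
#define FF ((char)0xff)

static const char utf16_be_bom[] = {FE, FF};
static const char utf16_le_bom[] = {FF, FE};
static const char utf32_be_bom[] = {0, 0, FE, FF};
static const char utf32_le_bom[] = {FF, FE, 0, 0};

/* Unchanged from the current code: the signature keeps plain char. */
static int has_bom_prefix(const char *data, size_t len,
			  const char *bom, size_t bom_len)
{
	return data && bom && (len >= bom_len) && !memcmp(data, bom, bom_len);
}

/* Hypothetical self-check, for illustration only. */
static void check_bom_macros(void)
{
	const char be16[] = {FE, FF, 'h', 'i'};
	const char le32[] = {FF, FE, 0, 0, 'h', 'i'};
	const char be32[] = {0, 0, FE, FF};

	/* The byte pattern survives the round trip through char. */
	assert((unsigned char)FE == 0xfe);

	assert(has_bom_prefix(be16, sizeof(be16),
			      utf16_be_bom, sizeof(utf16_be_bom)));
	assert(has_bom_prefix(le32, sizeof(le32),
			      utf16_le_bom, sizeof(utf16_le_bom)));
	assert(has_bom_prefix(le32, sizeof(le32),
			      utf32_le_bom, sizeof(utf32_le_bom)));
	assert(has_bom_prefix(be32, sizeof(be32),
			      utf32_be_bom, sizeof(utf32_be_bom)));

	/* Too short to hold a UTF-32 BOM. */
	assert(!has_bom_prefix(be16, 2,
			       utf32_be_bom, sizeof(utf32_be_bom)));
}
```

The point of this variant is that callers passing `const char *` buffers
would not need any casts, at the price of two macros with rather short
names.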

Ciao,
Dscho