Re: [RFC PATCH 6/6] utf8.c: avoid char overflow

Johannes Schindelin <Johannes.Schindelin@xxxxxx> · Mon, 9 Jul 2018 22:04:00 +0200 (DST)

Hi Beat,

On Mon, 9 Jul 2018, Beat Bolli wrote:

> On 09.07.2018 15:14, Johannes Schindelin wrote:
> > 
> > On Sun, 8 Jul 2018, Beat Bolli wrote:
> > 
> > > In ISO C, char constants must be in the range -128..127. Change the BOM
> > > constants to unsigned char to avoid overflow.
> > > 
> > > Signed-off-by: Beat Bolli <dev+git@xxxxxxxxx>
> > > ---
> > >  utf8.c | 10 +++++-----
> > >  1 file changed, 5 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/utf8.c b/utf8.c
> > > index d55e20c641..833ce00617 100644
> > > --- a/utf8.c
> > > +++ b/utf8.c
> > > @@ -561,15 +561,15 @@ char *reencode_string_len(const char *in, int insz,
> > >  #endif
> > > 
> > >  static int has_bom_prefix(const char *data, size_t len,
> > > -			  const char *bom, size_t bom_len)
> > > +			  const unsigned char *bom, size_t bom_len)
> > >  {
> > >  	return data && bom && (len >= bom_len) && !memcmp(data, bom, bom_len);
> > >  }
> > > 
> > > -static const char utf16_be_bom[] = {0xFE, 0xFF};
> > > -static const char utf16_le_bom[] = {0xFF, 0xFE};
> > > -static const char utf32_be_bom[] = {0x00, 0x00, 0xFE, 0xFF};
> > > -static const char utf32_le_bom[] = {0xFF, 0xFE, 0x00, 0x00};
> > > +static const unsigned char utf16_be_bom[] = {0xFE, 0xFF};
> > > +static const unsigned char utf16_le_bom[] = {0xFF, 0xFE};
> > > +static const unsigned char utf32_be_bom[] = {0x00, 0x00, 0xFE, 0xFF};
> > > +static const unsigned char utf32_le_bom[] = {0xFF, 0xFE, 0x00, 0x00};
> > 
> > An alternative approach that might be easier to read (and avoids the
> > confusion arising from our use of (signed) chars for strings pretty much
> > everywhere):
> > 
> > #define FE ((char)0xfe)
> > #define FF ((char)0xff)
> > 
> > ...
> 
> I have tried this first (without the macros, though), and thought it looked
> really ugly.

Yep, I would totally agree that it would be very ugly without the macros.

Which is why I suggested the macros instead, in which case it looks
relatively elegant to my eyes.
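
To make the comparison concrete, here is a rough sketch of how the macro
variant might look. Only the two macros come from the suggestion quoted
above; the four array definitions, the unchanged has_bom_prefix()
signature, and the check_boms() helper are assumptions for illustration,
not a tested patch against utf8.c. Note that where char is signed,
the result of (char)0xfe is implementation-defined, although common
compilers yield the expected bit pattern:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Macros as suggested: the explicit cast keeps each constant typed as char. */
#define FE ((char)0xfe)
#define FF ((char)0xff)

/* Signature left as it was before the patch (plain char pointers). */
static int has_bom_prefix(const char *data, size_t len,
			  const char *bom, size_t bom_len)
{
	return data && bom && (len >= bom_len) && !memcmp(data, bom, bom_len);
}

/* Assumed spelling of the BOM tables when written with the macros. */
static const char utf16_be_bom[] = {FE, FF};
static const char utf16_le_bom[] = {FF, FE};
static const char utf32_be_bom[] = {0x00, 0x00, FE, FF};
static const char utf32_le_bom[] = {FF, FE, 0x00, 0x00};

/* Hypothetical self-check, not part of utf8.c. */
static void check_boms(void)
{
	const char s16[] = {FE, FF, 'h', 'i'};
	const char s32[] = {0x00, 0x00, FE, FF, 'h'};

	assert(has_bom_prefix(s16, sizeof(s16), utf16_be_bom, sizeof(utf16_be_bom)));
	assert(!has_bom_prefix(s16, sizeof(s16), utf16_le_bom, sizeof(utf16_le_bom)));
	assert(has_bom_prefix(s32, sizeof(s32), utf32_be_bom, sizeof(utf32_be_bom)));
	assert(!has_bom_prefix(s32, sizeof(s32), utf32_le_bom, sizeof(utf32_le_bom)));
}
```

Since memcmp() compares bytes as unsigned char, the signedness of the
array element type should not change the outcome of the prefix check in
either variant.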

Ciao,
Dscho