Re: [RFC PATCH 6/6] utf8.c: avoid char overflow

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Am 09.07.2018 16:48, schrieb Beat Bolli:
Hi Dscho

Am 09.07.2018 15:14, schrieb Johannes Schindelin:
Hi Beat,

On Sun, 8 Jul 2018, Beat Bolli wrote:

In ISO C, char constants must be in the range -128..127. Change the BOM
constants to unsigned char to avoid overflow.

Signed-off-by: Beat Bolli <dev+git@xxxxxxxxx>
---
 utf8.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/utf8.c b/utf8.c
index d55e20c641..833ce00617 100644
--- a/utf8.c
+++ b/utf8.c
@@ -561,15 +561,15 @@ char *reencode_string_len(const char *in, int insz,
 #endif

 static int has_bom_prefix(const char *data, size_t len,
-			  const char *bom, size_t bom_len)
+			  const unsigned char *bom, size_t bom_len)
 {
return data && bom && (len >= bom_len) && !memcmp(data, bom, bom_len);
 }

-static const char utf16_be_bom[] = {0xFE, 0xFF};
-static const char utf16_le_bom[] = {0xFF, 0xFE};
-static const char utf32_be_bom[] = {0x00, 0x00, 0xFE, 0xFF};
-static const char utf32_le_bom[] = {0xFF, 0xFE, 0x00, 0x00};
+static const unsigned char utf16_be_bom[] = {0xFE, 0xFF};
+static const unsigned char utf16_le_bom[] = {0xFF, 0xFE};
+static const unsigned char utf32_be_bom[] = {0x00, 0x00, 0xFE, 0xFF}; +static const unsigned char utf32_le_bom[] = {0xFF, 0xFE, 0x00, 0x00};

An alternative approach that might be easier to read (and avoids the
confusion arising from our use of (signed) chars for strings pretty much
everywhere):

#define FE ((char)0xfe)
#define FF ((char)0xff)

...

I have tried this first (without the macros, though), and thought it looked really ugly. That's why I chose this solution. The usage is pretty local and
close to function has_bom_prefix().

Would an explaining comment help?

I have found an even simpler solution. Use proper char literals.

I will put this into v2.

Regards,
Beat


diff --git a/utf8.c b/utf8.c
index d55e20c641..982217eec9 100644
--- a/utf8.c
+++ b/utf8.c
@@ -566,10 +566,10 @@ static int has_bom_prefix(const char *data, size_t len, return data && bom && (len >= bom_len) && !memcmp(data, bom, bom_len);
 }

-static const char utf16_be_bom[] = {0xFE, 0xFF};
-static const char utf16_le_bom[] = {0xFF, 0xFE};
-static const char utf32_be_bom[] = {0x00, 0x00, 0xFE, 0xFF};
-static const char utf32_le_bom[] = {0xFF, 0xFE, 0x00, 0x00};
+static const char utf16_be_bom[] = {'\xFE', '\xFF'};
+static const char utf16_le_bom[] = {'\xFF', '\xFE'};
+static const char utf32_be_bom[] = {'\0', '\0', '\xFE', '\xFF'};
+static const char utf32_le_bom[] = {'\xFF', '\xFE', '\0', '\0'};

int has_prohibited_utf_bom(const char *enc, const char *data, size_t len)
 {



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux