On 09.07.2018 16:48, Beat Bolli wrote:
Hi Dscho
On 09.07.2018 15:14, Johannes Schindelin wrote:
Hi Beat,
On Sun, 8 Jul 2018, Beat Bolli wrote:
In ISO C, char constants must be in the range -128..127. Change the BOM
constants to unsigned char to avoid overflow.
Signed-off-by: Beat Bolli <dev+git@xxxxxxxxx>
---
utf8.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/utf8.c b/utf8.c
index d55e20c641..833ce00617 100644
--- a/utf8.c
+++ b/utf8.c
@@ -561,15 +561,15 @@ char *reencode_string_len(const char *in, int insz,
#endif
static int has_bom_prefix(const char *data, size_t len,
- const char *bom, size_t bom_len)
+ const unsigned char *bom, size_t bom_len)
{
return data && bom && (len >= bom_len) && !memcmp(data, bom, bom_len);
}
-static const char utf16_be_bom[] = {0xFE, 0xFF};
-static const char utf16_le_bom[] = {0xFF, 0xFE};
-static const char utf32_be_bom[] = {0x00, 0x00, 0xFE, 0xFF};
-static const char utf32_le_bom[] = {0xFF, 0xFE, 0x00, 0x00};
+static const unsigned char utf16_be_bom[] = {0xFE, 0xFF};
+static const unsigned char utf16_le_bom[] = {0xFF, 0xFE};
+static const unsigned char utf32_be_bom[] = {0x00, 0x00, 0xFE, 0xFF};
+static const unsigned char utf32_le_bom[] = {0xFF, 0xFE, 0x00, 0x00};
An alternative approach that might be easier to read (and avoids the
confusion arising from our use of (signed) chars for strings pretty much
everywhere):
#define FE ((char)0xfe)
#define FF ((char)0xff)
...
I tried this first (without the macros, though), and thought it looked
really ugly. That's why I chose this solution. The usage is pretty local
and close to the function has_bom_prefix().
Would an explanatory comment help?
I have found an even simpler solution: use proper char literals.
I will put this into v2.
Regards,
Beat
diff --git a/utf8.c b/utf8.c
index d55e20c641..982217eec9 100644
--- a/utf8.c
+++ b/utf8.c
@@ -566,10 +566,10 @@ static int has_bom_prefix(const char *data, size_t len,
return data && bom && (len >= bom_len) && !memcmp(data, bom, bom_len);
}
-static const char utf16_be_bom[] = {0xFE, 0xFF};
-static const char utf16_le_bom[] = {0xFF, 0xFE};
-static const char utf32_be_bom[] = {0x00, 0x00, 0xFE, 0xFF};
-static const char utf32_le_bom[] = {0xFF, 0xFE, 0x00, 0x00};
+static const char utf16_be_bom[] = {'\xFE', '\xFF'};
+static const char utf16_le_bom[] = {'\xFF', '\xFE'};
+static const char utf32_be_bom[] = {'\0', '\0', '\xFE', '\xFF'};
+static const char utf32_le_bom[] = {'\xFF', '\xFE', '\0', '\0'};
int has_prohibited_utf_bom(const char *enc, const char *data, size_t len)
{