Re: [PATCH v8 3/7] utf8: add function to detect prohibited UTF-16/32 BOM

On Sat, Feb 24, 2018 at 11:27 AM,  <lars.schneider@xxxxxxxxxxxx> wrote:
> Whenever a data stream is declared to be UTF-16BE, UTF-16LE, UTF-32BE
> or UTF-32LE a BOM must not be used [1]. The function returns true if
> this is the case.
>
> [1] http://unicode.org/faq/utf_bom.html#bom10
>
> Signed-off-by: Lars Schneider <larsxschneider@xxxxxxxxx>
> ---
> diff --git a/utf8.c b/utf8.c
> @@ -538,6 +538,30 @@ char *reencode_string_len(const char *in, int insz,
> +int has_prohibited_utf_bom(const char *enc, const char *data, size_t len)
> +{
> +       return (
> +         (!strcmp(enc, "UTF-16BE") || !strcmp(enc, "UTF-16LE")) &&
> +         (has_bom_prefix(data, len, utf16_be_bom, sizeof(utf16_be_bom)) ||
> +          has_bom_prefix(data, len, utf16_le_bom, sizeof(utf16_le_bom)))
> +       ) || (
> +         (!strcmp(enc, "UTF-32BE") || !strcmp(enc, "UTF-32LE")) &&
> +         (has_bom_prefix(data, len, utf32_be_bom, sizeof(utf32_be_bom)) ||
> +          has_bom_prefix(data, len, utf32_le_bom, sizeof(utf32_le_bom)))
> +       );
> +}

Is this interpretation correct? When I read [1], I interpret it as
saying that no BOM _of any sort_ should be present when the encoding
is declared as one of UTF-16BE, UTF-16LE, UTF-32BE, or UTF-32LE. This
code, on the other hand, only checks for BOMs whose code-unit width
matches the declared encoding (16 or 32 bits).

I suppose the intention of [1] is to detect a mismatch between the
declared encoding and how the stream is actually encoded. The check
implemented here will fail to detect a mismatch between, say, declared
encoding UTF-16BE and actual encoding UTF-32BE.
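
To make the stricter reading concrete, here is a rough sketch of a
check that rejects any of the four BOMs once the encoding is declared
with an explicit byte order. The name has_prohibited_utf_bom_any() is
made up for illustration, and the BOM tables and has_bom_prefix() are
redefined locally only so the snippet stands alone; I am assuming they
look roughly like this elsewhere in the series. It has not been tried
against the actual tree.

```c
#include <stddef.h>
#include <string.h>

/* BOM byte sequences as defined by Unicode; local copies for this sketch. */
static const char utf16_be_bom[] = { '\xFE', '\xFF' };
static const char utf16_le_bom[] = { '\xFF', '\xFE' };
static const char utf32_be_bom[] = { '\0', '\0', '\xFE', '\xFF' };
static const char utf32_le_bom[] = { '\xFF', '\xFE', '\0', '\0' };

/* Assumed shape of the helper used by the patch. */
static int has_bom_prefix(const char *data, size_t len,
			  const char *bom, size_t bom_len)
{
	return data && bom && len >= bom_len && !memcmp(data, bom, bom_len);
}

/* Hypothetical variant: any BOM is prohibited for BE/LE declarations. */
int has_prohibited_utf_bom_any(const char *enc, const char *data, size_t len)
{
	int explicit_order =
		!strcmp(enc, "UTF-16BE") || !strcmp(enc, "UTF-16LE") ||
		!strcmp(enc, "UTF-32BE") || !strcmp(enc, "UTF-32LE");

	return explicit_order && (
		has_bom_prefix(data, len, utf16_be_bom, sizeof(utf16_be_bom)) ||
		has_bom_prefix(data, len, utf16_le_bom, sizeof(utf16_le_bom)) ||
		has_bom_prefix(data, len, utf32_be_bom, sizeof(utf32_be_bom)) ||
		has_bom_prefix(data, len, utf32_le_bom, sizeof(utf32_le_bom)));
}
```

With this variant, data declared as UTF-16BE but starting with the
UTF-32BE BOM (00 00 FE FF) is flagged, which the posted version lets
through. One caveat: those same four bytes are also valid UTF-16BE for
U+0000 followed by U+FEFF, so the stricter check could in principle
reject a legitimate (if unusual) stream.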
