Re: [PATCH v8 4/7] utf8: add function to detect a missing UTF-16/32 BOM

> On 25 Feb 2018, at 04:52, Eric Sunshine <sunshine@xxxxxxxxxxxxxx> wrote:
> 
> On Sat, Feb 24, 2018 at 11:27 AM,  <lars.schneider@xxxxxxxxxxxx> wrote:
>> If the endianness is not defined in the encoding name, then let's
>> be strict and require a BOM to avoid any encoding confusion. The
>> is_missing_required_utf_bom() function returns true if a required BOM
>> is missing.
>> 
>> The Unicode standard instructs to assume big-endian if there is no BOM
>> for UTF-16/32 [1][2]. However, the W3C/WHATWG encoding standard used
>> in HTML5 recommends to assume little-endian to "deal with deployed
>> content" [3]. Strictly requiring a BOM seems to be the safest option
>> for content in Git.
>> 
>> Signed-off-by: Lars Schneider <larsxschneider@xxxxxxxxx>
>> ---
>> diff --git a/utf8.h b/utf8.h
>> @@ -79,4 +79,20 @@ void strbuf_utf8_align(struct strbuf *buf, align_type position, unsigned int wid
>> +/*
>> + * If the endianness is not defined in the encoding name, then we
>> + * require a BOM. The function returns true if a required BOM is missing.
>> + *
>> + * The Unicode standard instructs to assume big-endian if there
>> + * is no BOM for UTF-16/32 [1][2]. However, the W3C/WHATWG
>> + * encoding standard used in HTML5 recommends to assume
>> + * little-endian to "deal with deployed content" [3].
> 
> Perhaps you could tack on to the comment here the final bit of
> explanation from the commit message which ties these conflicting
> recommendations together. In particular:
> 
>    Therefore, strictly requiring a BOM seems to be the
>    safest option for content in Git.

Agreed. I'll change it.
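
For anyone following the thread, the rule described in the comment can
be sketched roughly as below. This is a standalone illustration under
my own assumptions, not the code from the patch: the names
sketch_missing_required_bom(), same_name() and has_prefix() are
hypothetical, and the encoding name is matched with a plain ASCII
case-insensitive comparison. Only the bare names "UTF-16" and "UTF-32"
require a BOM; names that spell out the endianness (e.g. "UTF-16LE")
do not.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: ASCII case-insensitive equality of two names. */
static int same_name(const char *a, const char *b)
{
	for (; *a && *b; a++, b++)
		if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
			return 0;
	return *a == *b;
}

/* Hypothetical helper: does data start with the given BOM bytes? */
static int has_prefix(const char *data, size_t len,
		      const char *bom, size_t bom_len)
{
	return data && len >= bom_len && !memcmp(data, bom, bom_len);
}

/*
 * Sketch only: returns 1 if the encoding name leaves the endianness
 * open ("UTF-16" or "UTF-32") and the data carries no matching BOM,
 * 0 otherwise.
 */
int sketch_missing_required_bom(const char *enc, const char *data, size_t len)
{
	static const char utf16_be[] = { '\xFE', '\xFF' };
	static const char utf16_le[] = { '\xFF', '\xFE' };
	static const char utf32_be[] = { '\0', '\0', '\xFE', '\xFF' };
	static const char utf32_le[] = { '\xFF', '\xFE', '\0', '\0' };

	if (same_name(enc, "UTF-16"))
		return !(has_prefix(data, len, utf16_be, sizeof(utf16_be)) ||
			 has_prefix(data, len, utf16_le, sizeof(utf16_le)));
	if (same_name(enc, "UTF-32"))
		return !(has_prefix(data, len, utf32_be, sizeof(utf32_be)) ||
			 has_prefix(data, len, utf32_le, sizeof(utf32_le)));
	return 0; /* endianness is explicit, or the encoding is not UTF-16/32 */
}
```

With this sketch, "UTF-16" content without a BOM is flagged, while the
same bytes declared as "UTF-16LE" are not, because the name already
states the byte order.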

Thanks,
Lars


