> On 25 Feb 2018, at 04:52, Eric Sunshine <sunshine@xxxxxxxxxxxxxx> wrote:
>
> On Sat, Feb 24, 2018 at 11:27 AM, <lars.schneider@xxxxxxxxxxxx> wrote:
>> If the endianness is not defined in the encoding name, then let's
>> be strict and require a BOM to avoid any encoding confusion. The
>> is_missing_required_utf_bom() function returns true if a required BOM
>> is missing.
>>
>> The Unicode standard instructs to assume big-endian if there in no BOM
>> for UTF-16/32 [1][2]. However, the W3C/WHATWG encoding standard used
>> in HTML5 recommends to assume little-endian to "deal with deployed
>> content" [3]. Strictly requiring a BOM seems to be the safest option
>> for content in Git.
>>
>> Signed-off-by: Lars Schneider <larsxschneider@xxxxxxxxx>
>> ---
>> diff --git a/utf8.h b/utf8.h
>> @@ -79,4 +79,20 @@ void strbuf_utf8_align(struct strbuf *buf, align_type position, unsigned int wid
>> +/*
>> + * If the endianness is not defined in the encoding name, then we
>> + * require a BOM. The function returns true if a required BOM is missing.
>> + *
>> + * The Unicode standard instructs to assume big-endian if there
>> + * in no BOM for UTF-16/32 [1][2]. However, the W3C/WHATWG
>> + * encoding standard used in HTML5 recommends to assume
>> + * little-endian to "deal with deployed content" [3].
>
> Perhaps you could tack on to the comment here the final bit of
> explanation from the commit message which ties these conflicting
> recommendations together. In particular:
>
>     Therefore, strictly requiring a BOM seems to be the
>     safest option for content in Git.

Agreed. I'll change it.

Thanks,
Lars
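
For anyone following the thread, the rule described in the quoted comment could be sketched in C roughly as below. This is an illustration only, not the code from the patch: the thread shows just the header comment, so the function names (has_prefix, sketch_is_missing_required_utf_bom), the signature, and the exact-match comparison of the encoding name are all assumptions.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: does data start with the given byte sequence? */
int has_prefix(const char *data, size_t len, const char *bom, size_t bom_len)
{
	return len >= bom_len && !memcmp(data, bom, bom_len);
}

/*
 * Sketch only; names and signature are assumptions, not the patch's code.
 * Returns 1 if enc leaves the byte order open ("UTF-16" or "UTF-32")
 * and data does not start with a BOM for that encoding, 0 otherwise.
 */
int sketch_is_missing_required_utf_bom(const char *enc,
				       const char *data, size_t len)
{
	if (!strcmp(enc, "UTF-16"))
		return !(has_prefix(data, len, "\xFE\xFF", 2) ||  /* big-endian BOM */
			 has_prefix(data, len, "\xFF\xFE", 2));   /* little-endian BOM */
	if (!strcmp(enc, "UTF-32"))
		return !(has_prefix(data, len, "\x00\x00\xFE\xFF", 4) ||  /* big-endian BOM */
			 has_prefix(data, len, "\xFF\xFE\x00\x00", 4));   /* little-endian BOM */
	/* Byte order is part of the name (e.g. "UTF-16LE"): no BOM required. */
	return 0;
}
```

With this sketch, content declared as "UTF-16" or "UTF-32" is flagged unless it starts with a BOM, while a name that already states the byte order, such as "UTF-16LE", is never flagged. Neither the big-endian default of the Unicode standard nor the little-endian default of the W3C/WHATWG standard is applied.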