> On 30 Jan 2018, at 20:15, Junio C Hamano <gitster@xxxxxxxxx> wrote:
>
> tboegi@xxxxxx writes:
>
>> From: Lars Schneider <larsxschneider@xxxxxxxxx>
>>
>> If the endianness is not defined in the encoding name, then let's
>> be strict and require a BOM to avoid any encoding confusion. The
>> has_missing_utf_bom() function returns true if a required BOM is
>> missing.
>>
>> The Unicode standard instructs to assume big-endian if there is no BOM
>> for UTF-16/32 [1][2]. However, the W3C/WHATWG encoding standard used
>> in HTML5 recommends to assume little-endian to "deal with deployed
>> content" [3]. Strictly requiring a BOM seems to be the safest option
>> for content in Git.
>
> I do not have strong opinion on encoding such policy-ish behaviour
> as our default, but am I alone to find that "has missing X" is a
> confusing name for a helper function? "is missing X" (or "lacks
> X") is a bit more understandable, I guess.

That might be a German/English translation thingy, but I think I get
your point. "has" implies there is something and "missing" implies
there is nothing :)

"is_missing_utf_bom()" might even be a bit unspecific, as UTF-8
usually has no BOM but the function would still return "false" for it.
Therefore, "is_missing_required_utf_bom()" might be lengthy but should
fit. OK for you?

- Lars

>
>> +int has_missing_utf_bom(const char *enc, const char *data, size_t len)
>> +{
>> +    return (
>> +        !strcmp(enc, "UTF-16") &&
>> +        !(has_bom_prefix(data, len, utf16_be_bom, sizeof(utf16_be_bom)) ||
>> +          has_bom_prefix(data, len, utf16_le_bom, sizeof(utf16_le_bom)))
>> +    ) || (
>> +        !strcmp(enc, "UTF-32") &&
>> +        !(has_bom_prefix(data, len, utf32_be_bom, sizeof(utf32_be_bom)) ||
>> +          has_bom_prefix(data, len, utf32_le_bom, sizeof(utf32_le_bom)))
>> +    );
>> +}
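
To make the proposed rename concrete, here is a rough standalone sketch of how the helper might look under the name "is_missing_required_utf_bom()". It is not the actual patch: the BOM byte arrays and the body of has_bom_prefix() are not shown in the quoted hunk, so the versions below are assumptions written only to make the example compile on its own. The logic is meant to match the quoted function, just split into two early returns.

```c
#include <stddef.h>
#include <string.h>

/* Assumed BOM definitions; the quoted hunk uses these names but does not show them. */
static const char utf16_be_bom[] = { '\xFE', '\xFF' };
static const char utf16_le_bom[] = { '\xFF', '\xFE' };
static const char utf32_be_bom[] = { '\0', '\0', '\xFE', '\xFF' };
static const char utf32_le_bom[] = { '\xFF', '\xFE', '\0', '\0' };

/* Assumed helper: true if data starts with the given BOM bytes. */
static int has_bom_prefix(const char *data, size_t len,
                          const char *bom, size_t bom_len)
{
    return data && bom && len >= bom_len && !memcmp(data, bom, bom_len);
}

/*
 * Returns 1 only when the encoding name leaves the endianness open
 * ("UTF-16" or "UTF-32") and the data starts with neither the BE nor
 * the LE BOM of that encoding. Every other encoding name, including
 * "UTF-8" and "UTF-16LE", yields 0 because no BOM is required there.
 */
int is_missing_required_utf_bom(const char *enc, const char *data, size_t len)
{
    if (!strcmp(enc, "UTF-16"))
        return !(has_bom_prefix(data, len, utf16_be_bom, sizeof(utf16_be_bom)) ||
                 has_bom_prefix(data, len, utf16_le_bom, sizeof(utf16_le_bom)));
    if (!strcmp(enc, "UTF-32"))
        return !(has_bom_prefix(data, len, utf32_be_bom, sizeof(utf32_be_bom)) ||
                 has_bom_prefix(data, len, utf32_le_bom, sizeof(utf32_le_bom)));
    return 0;
}
```

With this naming, a call such as is_missing_required_utf_bom("UTF-8", buf, len) returning 0 reads naturally: a BOM may well be absent, but none was required.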