On Tue, Feb 27, 2018 at 6:16 AM, Lars Schneider <larsxschneider@xxxxxxxxx> wrote: >> On 25 Feb 2018, at 08:15, Eric Sunshine <sunshine@xxxxxxxxxxxxxx> wrote: >> On Sat, Feb 24, 2018 at 11:27 AM, <lars.schneider@xxxxxxxxxxxx> wrote: >> The above paragraph is giving an example of the scenario described in >> the paragraph above it. It might, therefore, be clearer to start this >> paragraph with "For example,...". Also, as an aid to non-Windows >> developers, it might be helpful to introduce ".proj" files as >> "Microsoft Visual Studio `*.proj` files...". > > Agreed. How about this? > > For example, Microsoft Visual Studio resources files (`*.rc`) or > PowerShell script files (`*.ps1`) are sometimes encoded in UTF-16. > If you declare `*.ps1` as files as UTF-16 and you add `foo.ps1` with > a `working-tree-encoding` enabled Git client, then `foo.ps1` will be > stored as UTF-8 internally. A client without `working-tree-encoding` > support will checkout `foo.ps1` as UTF-8 encoded file. This will > typically cause trouble for the users of this file. Since "*.proj" is mentioned a number of times in paragraphs below this one, perhaps just stick with it instead of introducing "*.rc" and "*.ps1", which don't necessarily add value. Alternately, if you use .rc and .ps1, then change the remaining .proj references to follow suit. >>> diff --git a/convert.c b/convert.c >>> + } else if (is_missing_required_utf_bom(enc->name, src, src_len)) { >>> + const char *error_msg = _( >>> + "BOM is required in '%s' if encoded as %s"); >>> + const char *advise_msg = _( >>> + "The file '%s' is missing a byte order mark (BOM). " >>> + "Please use %sBE or %sLE (depending on the byte order) " >>> + "as working-tree-encoding."); >>> + advise(advise_msg, path, enc->name, enc->name); >>> + if (conv_flags & CONV_WRITE_OBJECT) >>> + die(error_msg, path, enc->name); >>> + else >>> + error(error_msg, path, enc->name); >>> + } >> >> For a function which presumably is agnostic about the working-tree >> encoding (supporting whatever iconv does), it nevertheless has >> unusually intimate knowledge about UTF BOMs, in general, and the >> internals of has_prohibited_utf_bom() and >> is_missing_required_utf_bom(), in particular. It might be cleaner, and >> would simplify this function, if all this UTF BOM-specific gunk was >> moved elsewhere and generalized into a "validate conversion" function. > > Agreed! How about this? > > /* > * If we convert to an UTF encoding, then check for common BOM errors. > */ > if (!memcmp("UTF-", enc, 4)) > if (has_utf_bom_error(path, enc, src, src_len, die_on_error)) > return 0; Better. The comment merely repeats the code, which is clear in its intent, thus doesn't really add value (but that's a minor point). I'd probably generalize it further to a "validate conversion" function (which itself would have this UTF-specific knowledge), but that's a matter of taste and can be done later when/if other types of validation requirements arise.