We've recently fielded several reports from unhappy Windows users about
our handling of UTF-16, UTF-16LE, and UTF-16BE, none of which seems to
be suitable for certain Windows programs.

In an effort to communicate the reasons for our behavior more
effectively, explain in the documentation that the UTF-16 variant
people have been asking for hasn't been standardized, and therefore
hasn't been implemented in iconv(3). Mention what each of the variants
does, so that people can decide which one best meets their needs.

In addition, add a comment in the code about why we must, for
correctness reasons, reject a UTF-16LE or UTF-16BE sequence that begins
with U+FEFF: such a codepoint semantically represents a ZWNBSP, not a
BOM, but at the beginning of a UTF-8 sequence (as encoded in the object
store) it would be misinterpreted as a BOM instead. This comment is in
the code because I think it needs to be somewhere, but I'm not sure the
documentation is the right place for it. If desired, I can add it to
the documentation as well, although I feel the lurid details are not
interesting to most users. If the wording is confusing, I'm very open
to suggestions for how to improve it.

I don't use Windows, so I don't know what MSVCRT does. If it requires a
BOM but doesn't accept big-endian encoding, then perhaps we should
report that as a bug to Microsoft so it can be fixed in a future
version. That would probably make a lot more programs work right out of
the box and dramatically improve the user experience.

As a note, I'm currently on vacation through the 2nd, so my responses
may be slightly delayed.

For the curious, a byte-level illustration of the variants and a rough
sketch of the rejection check follow the diffstat below.

brian m. carlson (2):
  Documentation: document UTF-16-related behavior
  utf8: add comment explaining why BOMs are rejected

 Documentation/gitattributes.txt | 5 +++++
 utf8.c                          | 7 +++++++
 2 files changed, 12 insertions(+)
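
For reference, here's the letter "A" (U+0041) under each variant, per
the Unicode standard (whether a plain "UTF-16" encoder emits a BOM, and
in which byte order, is implementation-defined, so all forms are
shown):

  UTF-16BE:  00 41        (no BOM; always big-endian)
  UTF-16LE:  41 00        (no BOM; always little-endian)
  UTF-16:    FE FF 00 41  (big-endian, with BOM)
             FF FE 41 00  (little-endian, with BOM)
             00 41        (no BOM; big-endian is assumed)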
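
And here's a minimal sketch of the kind of check the new comment
describes, assuming the check runs on the raw bytes before re-encoding;
the function name and structure here are illustrative, not necessarily
what utf8.c actually uses:

#include <string.h>

/*
 * Illustrative only: reject UTF-16LE/BE content whose first codepoint
 * is U+FEFF. In those encodings it's semantically a ZWNBSP, not a BOM,
 * but after re-encoding to UTF-8 for the object store the leading
 * codepoint would read back as a BOM, silently changing its meaning.
 */
static int has_prohibited_utf16_bom(const char *enc, const char *data,
				    size_t len)
{
	if (len < 2)
		return 0;
	if (!strcmp(enc, "UTF-16LE"))
		return !memcmp(data, "\xff\xfe", 2); /* U+FEFF in LE */
	if (!strcmp(enc, "UTF-16BE"))
		return !memcmp(data, "\xfe\xff", 2); /* U+FEFF in BE */
	return 0;
}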