> -----Original Message-----
> From: git-owner@xxxxxxxxxxxxxxx [mailto:git-owner@xxxxxxxxxxxxxxx] On
> Behalf Of Torsten Bogershausen
> Sent: Thursday, November 8, 2018 9:03 AM
> To: Adrián Gimeno Balaguer
> Cc: git@xxxxxxxxxxxxxxx
> Subject: Re: git-rebase is ignoring working-tree-encoding
>
> On Wed, Nov 07, 2018 at 05:38:18AM +0100, Adrián Gimeno Balaguer wrote:
> > Hello Torsten,
> >
> > Thanks for answering.
> >
> > Answering to your question, I removed the comments with "rebase" since
> > my reported encoding issue happens on more simpler operations
> > (described in the PR), and the problem is not directly related to
> > rebasing, so I considered it better in order to avoid unrelated
> > confusions.
> >
> OK, I think I understand your problem now.
> The file format which you ask for could be named "UTF-16-BOM-LE",
> but that does not exist in reality.
> If you use UTF-16, then there must be a BOM, and if there is a BOM,
> then a Unicode-aware application -should- be able to handle it.
>
> Why does your project require such a format ?
>

Many tools on Windows still do not understand UTF-8, although it's getting better. I think Windows is about the only OS where tools still require UTF-16 for full internationalization. Many tools written in C use the MSVC RTL, where fopen(), unfortunately, doesn't understand UTF-16BE (though even such a rudimentary program as Notepad does). For this reason, it's very reasonable to ask that programming tools produce UTF-16 files with a particular endianness, the one natural for the platform they're running on.

The iconv programmers' boneheaded decision to always produce UTF-16BE with BOM for UTF-16 output doesn't make sense. Again, git and iconv/libiconv in CentOS on x86 do the right thing and produce UTF-16LE with BOM in this case.

Also, iconv/libiconv should not be rejecting files with a BOM for input encoding UTF-16BE or UTF-16LE. The BOM is not some magic tag.
It's just a zero-width no-break space (U+FEFF), with the unique property that its 8-bit and 16-bit encoded forms can be told apart from one another. It can appear anywhere in a file. If it's the first character in the file, then the file encoding can be reliably detected. But it's just a character, and iconv should be accepting such files as valid.
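
To illustrate the point, here is a minimal Python sketch of BOM-based detection. It is not how git or iconv implement anything; the helper name detect_bom is made up for this example, and UTF-32 is deliberately left out (its little-endian BOM begins with the same two bytes as the UTF-16LE one).

```python
import codecs

# Encoded forms of U+FEFF. The UTF-32 BOMs are omitted on purpose:
# the UTF-32LE BOM starts with the same bytes as the UTF-16LE BOM.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
]


def detect_bom(data: bytes):
    """Hypothetical helper: return (encoding, bom_length) if the data
    starts with a known BOM, otherwise (None, 0)."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


# The same character, U+FEFF, yields a different byte sequence in each
# encoding, which is what makes detection at the start of a file possible.
utf8_form = "\ufeff".encode("utf-8")
le_form = "\ufeff".encode("utf-16-le")
be_form = "\ufeff".encode("utf-16-be")

# With an explicit byte order, Python's codec treats U+FEFF as an ordinary
# character: it is decoded and kept, whether it leads the data or not.
leading = (le_form + "hi".encode("utf-16-le")).decode("utf-16-le")
embedded = "a\ufeffb".encode("utf-16-le").decode("utf-16-le")
```

Here Python's "utf-16-le" codec accepts a leading BOM and returns it as U+FEFF, which is the behaviour being asked of iconv above for explicit-endianness input.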