On Wed, Nov 07, 2018 at 05:38:18AM +0100, Adrián Gimeno Balaguer wrote:
> Hello Torsten,
>
> Thanks for answering.
>
> To answer your question, I removed the comments with "rebase" since
> my reported encoding issue happens on simpler operations
> (described in the PR), and the problem is not directly related to
> rebasing, so I considered it better in order to avoid unrelated
> confusion.
>
> Let's get back to the problem. Each system has a default endianness.
> Also, in .gitattributes's working-tree-encoding, Git behaves
> differently depending on the attribute's value and the contents of the
> referenced entry file. When I put the value "UTF-16", then the file
> must have a BOM, or Git complains. Otherwise, if I put the value
> "UTF-16BE" or "UTF-16LE", then Git prohibits operations if the file has a
> BOM for that main encoding (UTF-16 here), which can relate to any
> endianness.
>
> My very initial goal was, given a UTF-16LE file, to be able to view
> human-readable diffs whenever I make a change on it (and yes, it must
> be Little Endian). Plus, this file had a BOM. Now, what are the
> options with Git currently (consider only working-tree-encoding)? If I
> put working-tree-encoding=UTF-16, then I could view readable diffs and
> commit the file, but here is the main problem: Git loses information
> about what initial endianness the file had, therefore, after
> staging/committing it re-encodes the file from UTF-8 (as stored
> internally) to UTF-16 and the default system endianness. In my case it
> re-encoded to Big Endian, thus affecting the project's requirement. That is
> why I ended up writing a fixup script to change the encoding back to
> UTF-16LE.

OK, I think I understand your problem now.
The file format which you ask for could be named "UTF-16-BOM-LE",
but that does not exist in reality.
If you use UTF-16, then there must be a BOM, and if there is a BOM,
then a Unicode-aware application -should- be able to handle it.
Why does your project require such a format?

>
> On the other hand, once I set working-tree-encoding=UTF-16LE, then Git
> prohibited me from committing the file and even viewing human-readable
> diffs (the output simply says it's a binary file). In this sense, the
> internal location of these errors is within the function of utf8.c I
> made changes to in the PR. I hope I was clearer!
>
> Finally, Git behaviour around this is based on Unicode standards,
> which is why I acknowledged that my changes violated them after
> referring to a link which is present in the utf8.h file.
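
For reference, the BOM rule discussed in this thread can be sketched
roughly as below. This is only an illustration in Python, not the code
in utf8.c: the helper name check_working_tree_encoding and its return
strings are made up for the example, and it covers only the three
UTF-16 attribute values mentioned above.

```python
import codecs

# The two possible UTF-16 byte order marks (FF FE and FE FF).
UTF16_BOMS = (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)


def check_working_tree_encoding(data: bytes, encoding: str) -> str:
    """Hypothetical helper mirroring the rule described in this thread.

    "UTF-16" requires a BOM; "UTF-16LE" and "UTF-16BE" must not have one.
    """
    has_bom = data.startswith(UTF16_BOMS)
    enc = encoding.upper()
    if enc == "UTF-16" and not has_bom:
        return "rejected: BOM is required for UTF-16"
    if enc in ("UTF-16LE", "UTF-16BE") and has_bom:
        return "rejected: BOM is prohibited for " + enc
    return "accepted"


text = "hello\n"

# A little-endian file that starts with a BOM (the case reported here).
le_with_bom = codecs.BOM_UTF16_LE + text.encode("utf-16-le")
# The same content without a BOM.
le_without_bom = text.encode("utf-16-le")

for name, data in (("LE + BOM", le_with_bom), ("LE, no BOM", le_without_bom)):
    for enc in ("UTF-16", "UTF-16LE"):
        print(name, "/", enc, "->", check_working_tree_encoding(data, enc))

# Decoding with the BOM-aware codec recovers the text, but the decoded
# string no longer records which byte order the file originally used.
assert le_with_bom.decode("utf-16") == text
```

The last line illustrates why the byte order is lost after a round trip
through UTF-8: once the content is decoded, nothing remembers whether
the file was little or big endian, so the re-encoding step has to pick
one on its own.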