On Tue, Jan 16, 2024 at 02:06:47AM +0000, brian m. carlson wrote:
> On 2024-01-16 at 00:19:20, Michael Litwak wrote:
> > As for documentation clarifications for the .gitattributes manpage at
> > https://git-scm.com/docs/gitattributes, I still suggest adding an
> > explicit example for UTF-16LE with BOM, and/or adding a table listing
> > which working-tree-encoding value to use for each of the following
> > UTF-16 text encodings:
> >
> > ENCODING               'working-tree-encoding' VALUE
> > -------------------    -----------------------------
> > UTF-16LE with BOM      UTF-16LE-BOM
>
> I should point out that this encoding, while very common on Windows, is
> also nonstandard.

In general, I agree with everything that is snipped, thanks for the long
wordings.

[]

> (Apparently Emacs, which is not on my system, may
> permit that, which does not surprise me in the least.)

emacs seems to handle UTF-16LE-BOM just fine.

>
> > UTF-16BE with BOM      UTF-16
> []
> I think the addition of this table is too much. UTF-16LE-BOM is common
> on Windows, and the rest are substantially less common. It's also very
> difficult to explain in a table what "UTF-16" means in an understandable
> way. And I also think it's also pretty clear that users should be using
> UTF-8 without BOM where possible.
>
> We do already mention both UTF-16, UTF-16LE, and UTF-16LE-BOM as options
> in the gitattributes manual page, and it's up to the user to know what
> their program wants and supports if that's not UTF-8.

What exactly is missing in the documentation?
Could you please try to send us a diff (or even better, a patch),
so that we can get an idea of what can be improved?
From my reading, UTF-16LE-BOM is already mentioned.
It would be nice to see, from a user's perspective, what is missing.

> > Finally, I am not sure how to use git add --renormalize to correct a
> > UTF-16 file that was previously added incorrectly (i.e. with a missing
> > or incorrect working-tree-encoding entry in .gitattributes). The git
> > add documentation at https://git-scm.com/docs/git-add implies
> > 'renormalize' resets only the end-of-line values; however, I suspect
> > it also re-converts text encoding when a working-tree-encoding
> > property is set. It would be helpful to know one way or the other.
>
> It does indeed affect the working-tree-encoding. If you wanted to send
> an inline patch created with git format-patch, it would probably be
> welcome to mention that. However, because in this project we typically
> scratch our own itch, if you don't send one, it's likely nobody else
> will, either.

For the record: it will even run the "clean" filter if that filter has
changed or has been freshly enabled.

So yes, a patch would be appreciated.
Thanks for bringing this up.
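
To make the BOM distinction discussed above a bit more concrete, here is a
minimal sketch that looks at the first bytes of a file's content and suggests
a working-tree-encoding value. The mapping is taken from the table proposed
in the quoted message, so treat it as an assumption rather than as what the
documentation says; in particular, what "UTF-16" means can depend on the
iconv in use. The function name suggest_working_tree_encoding is
hypothetical and not part of Git.

```python
import codecs

# Mapping follows the table proposed in the quoted message (an assumption,
# not an authoritative statement of what Git or iconv does).
BOM_TO_ATTR = {
    codecs.BOM_UTF16_LE: "UTF-16LE-BOM",  # bytes FF FE
    codecs.BOM_UTF16_BE: "UTF-16",        # bytes FE FF
}


def suggest_working_tree_encoding(data: bytes):
    """Return a suggested working-tree-encoding value, or None.

    Only a leading UTF-16 BOM is checked. Content without a BOM cannot be
    classified this way, and a UTF-32LE BOM (FF FE 00 00) would be mistaken
    for UTF-16LE, so this is a rough hint at best.
    """
    for bom, value in BOM_TO_ATTR.items():
        if data.startswith(bom):
            return value
    return None


# Small in-memory samples instead of real files.
sample_le = codecs.BOM_UTF16_LE + "hi\r\n".encode("utf-16-le")
sample_be = codecs.BOM_UTF16_BE + "hi\r\n".encode("utf-16-be")
sample_utf8 = "hi\n".encode("utf-8")

for name, sample in [("le", sample_le), ("be", sample_be), ("utf8", sample_utf8)]:
    print(name, suggest_working_tree_encoding(sample))
```

Once the matching value is set in .gitattributes, "git add --renormalize"
is the step that re-applies the conversion to files already in the index,
as confirmed above.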