On 08.02.19 07:04, Rich Felker wrote:
> On Fri, Feb 08, 2019 at 12:17:05AM +0000, brian m. carlson wrote:
[]
>> Even if Git were to produce a BOM to work around this issue, then we'd
>> still have the problem that any program using musl will write data in
>> UTF-16 without a BOM. Moreover, because musl, in violation of the RFC,
>> doesn't read and process BOMs, someone using little-endian UTF-16 (with
>> a proper BOM) with musl and Git will have their data corrupted,
>> according to my reading of the musl website.
>
> That information is outdated and someone from our side should update
> it; since 1.1.19, musl treats "UTF-16" input as ambiguous endianness
> determined by BOM, defaulting to big if there's no BOM. However output
> is always big endian, such that processes conforming to the Unicode
> SHOULD clause will interpret it correctly.
>
> The portable way to get little endian with a BOM is to open a
> conversion descriptor for "UTF-16LE" (which should not add any BOM)
> and write a BOM manually.
>

That is possible in the next upcoming version of Git:

commit 0fa3cc77ee9fb3b6bb53c73688c9b7500f996b83
Merge: cfd9167c15 aab2a1ae48
Author: Junio C Hamano <gitster@xxxxxxxxx>
Date:   Wed Feb 6 22:05:21 2019 -0800

    Merge branch 'tb/utf-16-le-with-explicit-bom'

    A new encoding UTF-16LE-BOM has been invented to force encoding to
    UTF-16 with BOM in little endian byte order, which cannot be directly
    generated by using iconv.

    * tb/utf-16-le-with-explicit-bom:
      Support working-tree-encoding "UTF-16LE-BOM"
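
For illustration, the approach Rich describes (convert to "UTF-16LE", which adds no BOM, then write the BOM by hand) can be sketched roughly as below. This is only a minimal sketch in Python using its built-in codecs instead of iconv; the helper name encode_utf16le_with_bom is made up for this example, and it is not meant to show how Git implements UTF-16LE-BOM internally.

```python
import codecs


def encode_utf16le_with_bom(text: str) -> bytes:
    """Hypothetical helper: little-endian UTF-16 with an explicit BOM."""
    # The explicit "utf-16-le" codec fixes the byte order and never
    # emits a BOM on its own, like a "UTF-16LE" conversion descriptor.
    payload = text.encode("utf-16-le")
    # Write the BOM manually: U+FEFF in little-endian order is FF FE.
    return codecs.BOM_UTF16_LE + payload


data = encode_utf16le_with_bom("hi")
# A reader that honours the BOM (here Python's generic "utf-16" decoder)
# picks little endian from the first two bytes and strips them.
roundtrip = data.decode("utf-16")
```

The result starts with the bytes FF FE, so a consumer that processes BOMs should decode it as little endian, while the payload itself is the same as a plain UTF-16LE conversion.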