Re: help request: unable to merge UTF-16-LE "text" file

On 2022-04-19 at 19:36:19, Kevin Long wrote:
> Greetings,
> 
> I've been struggling to merge branches because of a UTF-16-LE (with BOM?) file.
> 
> Windows 11 / git version 2.35.3.windows.1
> 
> The problem file is a .sln file (Visual Studio "solution"). Edited in
> both branches. It is a "text" file, but is encoded as such:
> 
> FacilityMaster.sln: Unicode text, UTF-16, little-endian text, with
> CRLF line terminators

Git does not consider files using UTF-16 to be text because they contain
NUL bytes.  In some sense they do represent textual content, but Git
considers them to be binary.
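To see where those NUL bytes come from, here is a small sketch. It encodes the sample word "Project" (an arbitrary choice) as UTF-16LE with iconv and dumps the bytes with od, assuming both tools are installed. Every ASCII character picks up a 00 byte. As far as I know, Git's binary check looks for a NUL within roughly the first 8000 bytes of the content, so a file like this is flagged almost immediately.

```shell
#!/bin/sh
# Encode the sample text as UTF-16LE and print its bytes in hex.
# xargs just collapses od's padding into single spaces.
bytes=$(printf 'Project' | iconv -f UTF-8 -t UTF-16LE | od -An -tx1 | xargs)
echo "$bytes"   # -> 50 00 72 00 6f 00 6a 00 65 00 63 00 74 00
```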

> I have tried several "working-tree-encoding" settings in
> .gitattributes in my local working directory, to no avail yet:
> 
> *.sln working-tree-encoding=UTF-16-LE eol=CRLF, results in:
> error: failed to encode 'FacilityMaster.sln' from UTF-16-LE to UTF-8
> warning: Cannot merge binary files: FacilityMaster.sln (HEAD vs. master)
> 
> *.sln working-tree-encoding=UTF-16 eol=CRLF, results in:
> warning: Cannot merge binary files: FacilityMaster.sln (HEAD vs. master)
> 
> *.sln working-tree-encoding=UTF-16-LE-BOM eol=CRLF
> error: failed to encode 'FacilityMaster.sln' from UTF-16-LE-BOM to UTF-8
> warning: Cannot merge binary files: FacilityMaster.sln (HEAD vs. master)

The proper encoding you want here is "UTF-16LE-BOM" (note that there is
no hyphen between "16" and "LE").  Many Windows programs use a
non-standard encoding in which the file _must_ be little-endian and
_must_ start with a BOM.  (The standard encoding UTF-16LE is always
little endian and omits the BOM.  UTF-16 can be of either endianness: it
must start with a BOM if it is little endian, and may have one if it is
big endian.)
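The difference between these names is easiest to see in the bytes. The sketch below assumes iconv and od are available. It builds the BOM form by hand by writing FF FE in front of the UTF-16LE output, because "UTF-16LE-BOM" is Git's name for that form and iconv may not accept it.

```shell
#!/bin/sh
# Sample text "hi" (arbitrary), dumped as hex bytes in each form.
# UTF-16LE: little-endian code units, no BOM.
le=$(printf 'hi' | iconv -f UTF-8 -t UTF-16LE | od -An -tx1 | xargs)
# What Git calls UTF-16LE-BOM: the BOM bytes FF FE, then the same UTF-16LE data.
le_bom=$({ printf '\377\376'; printf 'hi' | iconv -f UTF-8 -t UTF-16LE; } | od -An -tx1 | xargs)
echo "UTF-16LE:     $le"       # -> 68 00 69 00
echo "UTF-16LE-BOM: $le_bom"   # -> ff fe 68 00 69 00
```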

With that setting, the file is stored as UTF-8 in the repository and
converted to this non-standard Windows encoding on checkout.  However,
if you have already committed the file without an appropriate
working-tree-encoding, run `git add --renormalize .` and then commit.
You'll need to do that (or merge in a commit that does it) on every
branch you want to work with.

> Hoping for some suggestions. I've also tried to save the file as UTF-8
> in both branches, commit, then merge, but still that did not work. I
> just want to merge it like a normal source code file.

However, in order for the merge to work, both branches must have the
file checked in correctly.  That is, both master and the branch from
which you're merging need to have the file as UTF-8 in the repository.
If you make the working-tree-encoding changes above (or the switch to
UTF-8) on only one of those branches, then the other one will still have
the binary blob, and merging won't be possible.
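One way to check which branches still have the binary blob is to look for NUL bytes in the committed file. The has_nul helper below is hypothetical and not part of Git. It is a slightly stricter approximation of Git's own check, which only looks near the start of the content. The demo runs it on generated sample data.

```shell
#!/bin/sh
# Succeeds (exit status 0) if standard input contains at least one NUL byte.
has_nul() {
    # Delete every byte except NUL, then count what is left.
    count=$(tr -dc '\000' | wc -c | tr -d ' ')
    [ "$count" -gt 0 ]
}

# Demonstration on sample data (the text "sln" is arbitrary).
if printf 'sln' | iconv -f UTF-8 -t UTF-16LE | has_nul; then
    utf16_result=binary
else
    utf16_result=text
fi
if printf 'sln' | has_nul; then
    utf8_result=binary
else
    utf8_result=text
fi
echo "UTF-16LE: $utf16_result, UTF-8: $utf8_result"
```

In a repository, something like `git cat-file blob master:FacilityMaster.sln | has_nul` (repeated for the branch being merged) succeeding would mean that branch still stores the file as binary and needs the renormalizing commit.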

If you can keep it as UTF-8, that's ideal.  It should definitely work if
both sides have UTF-8 files.  If you still see a message about binary
files, then it could well be that something didn't get saved properly as
UTF-8, or that these really aren't text files and that they contain
binary contents.
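If the editor won't save the file as UTF-8, here is a sketch of doing the conversion with iconv instead. It runs on a generated sample file (Example.sln is hypothetical). Reading the input as "UTF-16" lets iconv use the BOM to choose the byte order and drop it from the output. The result should be UTF-8 with the original CRLF line endings and no NUL bytes.

```shell
#!/bin/sh
set -e
dir=$(mktemp -d)
# Sample file in the Windows form: BOM (FF FE), then UTF-16LE with CRLF endings.
{ printf '\377\376'; printf 'line 1\r\nline 2\r\n' | iconv -f UTF-8 -t UTF-16LE; } > "$dir/Example.sln"

# Convert to UTF-8; "UTF-16" consumes the BOM and uses it to pick the byte order.
iconv -f UTF-16 -t UTF-8 "$dir/Example.sln" > "$dir/Example.sln.utf8"
mv "$dir/Example.sln.utf8" "$dir/Example.sln"
cat "$dir/Example.sln"
```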
-- 
brian m. carlson (he/him or they/them)
Toronto, Ontario, CA

