Re: Handling text files encoded in little-endian UTF-16 with BOM

On Fri, Jul 05, 2019 at 01:35:13PM +0200, Mateusz Loskot wrote:
> Hi,
>
> Using git version 2.22.0.windows.1
>
> I have a repository with a number of .txt files encoded in
> little-endian UTF-16 with BOM.
>
> What are the best practices and recommended configuration to
> manage such files with Git to avoid unexpected re-encoding to
> UTF-8 or others?
>
> Currently, there is a .gitattributes with entries like
>
>    resource/*.txt   working-tree-encoding=UTF-16LE-BOM -text
>
> Despite that, some of the team members have noticed that the files
> occasionally get re-encoded to UTF-8. It is unknown what the
> actual steps leading to that are. BTW, there are a few Git clients
> in use: git in Git Bash, VSCode, Fork.

I would rather not speculate about a
"sometimes something happens on someone's computer" kind of report.
A little more information would be helpful, for example which client
and which operation were involved.
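
One way to gather that information is to scan the working tree for files
that no longer start with the UTF-16LE byte order mark. The following is a
minimal sketch, not a tool from Git itself; the "resource" directory and the
"*.txt" pattern are taken from the .gitattributes entry quoted above, and the
function names are made up for this example.

```python
from pathlib import Path

UTF16LE_BOM = b"\xff\xfe"  # byte order mark of little-endian UTF-16


def has_utf16le_bom(data: bytes) -> bool:
    """Return True if the bytes start with the UTF-16LE BOM."""
    return data.startswith(UTF16LE_BOM)


def find_reencoded(root: str, pattern: str = "*.txt") -> list[Path]:
    """List files directly under root matching pattern that lack the BOM.

    Empty files are reported too, because they carry no BOM either.
    """
    suspects = []
    for path in sorted(Path(root).glob(pattern)):
        with path.open("rb") as fh:
            head = fh.read(2)  # the BOM is two bytes long
        if not has_utf16le_bom(head):
            suspects.append(path)
    return suspects


if __name__ == "__main__":
    # "resource" is an assumption based on the quoted .gitattributes line.
    for path in find_reencoded("resource"):
        print(path)
```

Running this after each suspicious operation (a commit from one client, a
checkout from another) may help narrow down which step changes the encoding.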

>
> What bothers me in the .gitattributes is this `-text` attribute.
>
> Is the use of `working-tree-encoding` and `-text` together a
> valid combination at all?

Yes. It means that the content is re-encoded between the repo and the
working tree (that is what you want).
And "-text" means "leave the line endings (LF or CRLF) as they are,
don't convert them".

In that sense it is a valid combination, but maybe not a recommended one.
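
To make the two conversions concrete, here is a rough Python sketch of what
conceptually happens to such a file on its way into the repository. It is an
illustration only, not Git's actual implementation; the function name and the
normalize_eol parameter are made up for this example.

```python
import codecs


def to_repo(worktree_bytes: bytes, normalize_eol: bool) -> bytes:
    """Mimic, roughly, how a UTF-16LE-BOM file would be stored internally."""
    bom = codecs.BOM_UTF16_LE
    if not worktree_bytes.startswith(bom):
        raise ValueError("expected a UTF-16LE byte order mark")
    # working-tree-encoding: drop the BOM and decode the UTF-16LE content.
    text = worktree_bytes[len(bom):].decode("utf-16-le")
    if normalize_eol:
        # "text": CRLF becomes LF in the repository.
        text = text.replace("\r\n", "\n")
    # With "-text" the line endings are left exactly as they were.
    return text.encode("utf-8")


sample = codecs.BOM_UTF16_LE + "a\r\nb\r\n".encode("utf-16-le")
print(to_repo(sample, normalize_eol=False))  # "-text": b'a\r\nb\r\n'
print(to_repo(sample, normalize_eol=True))   # "text":  b'a\nb\n'
```

In both cases the stored content is UTF-8; the only difference is whether the
CRLF line endings survive into the repository.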

>
> The documentation at https://git-scm.com/docs/gitattributes
> does not seem to touch on that.
>
> I'll appreciate any suggestions on those UTF-16LE-BOM files.
>

My suggestion would be to use the "text" attribute:
  resource/*.txt   working-tree-encoding=UTF-16LE-BOM text

And depending on your application: do the resource files need a specific
line ending? Then use either
  resource/*.txt   working-tree-encoding=UTF-16LE-BOM text eol=lf
or
  resource/*.txt   working-tree-encoding=UTF-16LE-BOM text eol=crlf
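
The effect of the eol setting at checkout can be sketched the same way. Again
this is only an illustration under the assumption that the repository holds
UTF-8 text with LF line endings; to_worktree and its eol parameter are
hypothetical names, not part of Git.

```python
import codecs


def to_worktree(repo_bytes: bytes, eol: str) -> bytes:
    """Mimic, roughly, how a checkout would write the file for a given eol."""
    if eol not in ("lf", "crlf"):
        raise ValueError("eol must be 'lf' or 'crlf'")
    text = repo_bytes.decode("utf-8")
    if eol == "crlf":
        # Assumes the stored text uses LF only.
        text = text.replace("\n", "\r\n")
    # Re-encode to UTF-16LE and put the byte order mark in front.
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")


print(to_worktree(b"a\n", "lf"))
print(to_worktree(b"a\n", "crlf"))
```

Either way the file in the working tree starts with the BOM and is UTF-16LE;
only the line endings differ.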

I hope that helps a little bit.

> Best regards,
> --
> Mateusz Loskot, http://mateusz.loskot.net



