Re: [git-for-windows] How are binary files detected?

On Fri, Nov 27, 2015 at 03:14:58PM +0100, Johannes Schindelin wrote:

> On Wed, 25 Nov 2015, Andrzej Borucki wrote:
> 
> > How does Git detect that a file is binary? The detection must be safe,
> > because line breaks must not be changed in binary files.
> > Binary files can contain a 0 (zero) byte, but:
> > - 16-bit UTF text can also contain zero bytes
> > - short binary files may not contain any zero byte
> 
> It would probably be better to direct this question to the general Git
> mailing list (you reached the Git for Windows one, and this issue is not
> specific to Windows).
> 
> To answer your question, a NUL byte within the first 8000 bytes is indeed
> treated as an indicator of a binary file.
> 
> If you use UTF-16, you will need to mark your files as such explicitly
> (Git does not handle UTF-16 internally).
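
The NUL-byte rule above can be approximated in a few lines of shell. This is a rough sketch, not Git's code: `looks_binary` is a hypothetical helper name, it assumes `head -c` is available (GNU and BSD both provide it), and it ignores `.gitattributes` settings such as `binary` or `-diff`, which override the guess in Git.

```shell
# looks_binary FILE (hypothetical helper, not part of Git): exit status 0
# if FILE has a NUL byte within its first 8000 bytes, 1 otherwise.
looks_binary() {
    # od prints each byte as a space-prefixed hex pair, so a NUL shows up
    # as " 00"; grep's exit status becomes the function's result.
    head -c 8000 -- "$1" | od -An -tx1 | grep ' 00' >/dev/null
}
```

It reproduces both of the cases raised in the question: `printf 'h\000i\000'` (the string "hi" in UTF-16LE) is flagged as binary, while a two-byte file written with `printf '\377\376'` is not, and neither is a file whose first NUL sits past byte 8000.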

I'm not sure if it is a good idea to treat UTF-16 as text. The rest of
the diff (headers, etc.) will all be in ASCII, so one or the other is
going to be mojibake.

You can get readable diffs by textconv-ing them to an ASCII-superset
encoding like UTF-8. Something like:

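    # route "myfile" through a diff driver named "utf16"
    echo 'myfile diff=utf16' >>.gitattributes
    # have that driver convert UTF-16 to UTF-8 before diffing
    git config diff.utf16.textconv 'iconv -f utf16 -t utf8'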

but of course the resulting patches cannot be applied, and you may miss
any changes that do not make it through the encoding (e.g., using
different bytes to represent the same code point).
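
A small shell sketch makes that last caveat concrete. It assumes an iconv that consumes the byte-order mark (BOM) when decoding `UTF-16`, as GNU (glibc) iconv does, and it uses the canonical `UTF-16`/`UTF-8` spellings; the file names are made up for the demo. The same two characters encoded little-endian and big-endian are different bytes, yet they convert to identical UTF-8, so the change is invisible after conversion.

```shell
# Hypothetical demo files: "hi" in UTF-16, once little-endian and once
# big-endian, each starting with a byte-order mark (BOM).
dir=$(mktemp -d)
printf '\377\376h\000i\000' > "$dir/hi-le.txt"   # FF FE 68 00 69 00
printf '\376\377\000h\000i' > "$dir/hi-be.txt"   # FE FF 00 68 00 69

# Convert both the way the textconv filter would (UTF-16 in, UTF-8 out).
iconv -f UTF-16 -t UTF-8 "$dir/hi-le.txt" > "$dir/hi-le.utf8"
iconv -f UTF-16 -t UTF-8 "$dir/hi-be.txt" > "$dir/hi-be.utf8"

# The raw bytes differ, but with a BOM-consuming iconv the UTF-8 output matches.
cmp -s "$dir/hi-le.txt" "$dir/hi-be.txt" || echo "raw UTF-16 bytes differ"
cmp -s "$dir/hi-le.utf8" "$dir/hi-be.utf8" && echo "UTF-8 conversions are identical"
```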

-Peff