Re: [PATCH] treat any file with NUL as binary

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On Jan 15, 2008, at 3:28 PM, Dmitry Potapov wrote:

There are two heuristics in Git to detect whether a file is binary
or text. One in xdiff-interface.c relied on existing NUL byte at
the beginning. However, convert.c used a different heuristic, which
relied that the number of non-printable symbols is less than 1%.

Due to difference in approaches whether a file is binary or not,
it was possible that a file that diff treats as binary will not be
treated as text by CRLF conversation. This is very confusing for
a user who seeing that 'git diff' shows file as binary expects it
to be added as binary.

This patch makes is_binary to consider any file that contains at
least one NUL character as binary.

Shouldn't the commit message explicitly mention that the solution
is to make the check in convert.c stricter than the check in
xdiff-interface.c?  I think a comment in xdiff-interface.c
would also be a good thing to remember future developers about
this.


---

Junio,

I believe that the current behavior where 'git diff' shows me a file
as binary and then adds it as text with crlf conversation is a bug.

Though, it is not very likely to happen, it still possible cases where
a binary file contains large amount of text. For instance, a tar file
of text files can be such a file. Probably, word processor that store
text in binary format may also generate a file with more 99% printable
characters. So, such files will be considered as text by current convert heuristic. Still such files are considered by diff due present of a NUL
character. This is very confusing for a user to see 'git diff' saying
that a file is binary and then having it converted as text. Because I
don't think that any real text file (especially one that requires CRLF
conversation) may contain NUL character, I believe this change should
improve binary heuristic and avoid user confusion.

I think this is a good idea.  When reading the code for the first
time it took me some time to accept that we really have different
ways for detecting a binary file and to understand how these
detections are related.

I also agree with Dimitry that convert.c should be stricter than
xdiff-interface.c, because everything else could be confusing. ...


So, please, consider it for inclusion as a bug fix.

... Hence, this should be considered a bug fix.

The patch looks good to me.

	Steffen




-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux