On Thu, 2011-05-26 at 12:28 -0400, Stephen Bash wrote:
> ----- Original Message -----
> > From: "Junio C Hamano" <gitster@xxxxxxxxx>
> > To: "Jakub Narebski" <jnareb@xxxxxxxxx>
> > Sent: Thursday, May 26, 2011 12:07:21 PM
> > Subject: Re: Git EOL Normalization
> >
> > > I think git examines only first block of a file or so. The heuristic
> > > to detect binary-ness of a file is, as I have heard, the same or
> > > similar to the one that GNU diff uses.
> >
> > Yes, the binary detection was designed to be compatible with GNU diff. But
> > I do not think it has much to do with the topic of this thread. Aren't
> > other people discussing the line ending?
>
> The binary detection may be apropos because there are situations
> (core.autocrlf={true,input} and text=auto) where Git will only do line
> ending conversion if it detects a text file... But I'll leave it to
> people who know the code better to say if this binary detection is in
> fact part of the decision process.

Currently UTF-16 and UTF-32 files (which many consider to be text files)
are detected as binary by Git, due to said compatibility with GNU diff.
Therefore EOL normalization does not happen on those files. I have
played a little with detecting (and eventually normalizing) reasonably
valid UTF-16 (BE and LE), but my code is nowhere near ready for the big
time, much less properly tested.

As for diff-ing UTF-16/UTF-32 for purely human consumption, I would be
tempted to iconv (smudge?) the text into UTF-8 and then let the diff-ing
algorithm deal with it. Not a perfect solution, but perfect should not
be the enemy of good in that case. Unfortunately this would not produce
proper patches for mailing.

(As for how we'd know it is UTF-32 and not a binary, I'll leave that for
further discussion should we need it. I suspect we'd have to trust the
user. UGH.)

-- 
-Drew Northup
________________________________________________
"As opposed to vegetable or mineral error?"
-John Pescatore, SANS NewsBites Vol. 12 Num. 59
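
A rough sketch of the idea discussed in the message, for illustration only. It is not Git's code and not the author's unfinished detector. It assumes a GNU-diff-style check that looks for a NUL byte in the first block of a file (the 8000-byte window is an assumption here, and whether the EOL conversion path uses exactly this check is left open in the thread). The helper names looks_binary, sniff_utf16_or_32 and to_utf8_for_diff are hypothetical. Detection relies on a byte-order mark only, so BOM-less UTF-16/UTF-32 is not recognized, which is the "trust the user" problem mentioned above.

```python
import codecs

# Assumption: size of the "first block" inspected by the heuristic.
FIRST_BLOCK = 8000


def looks_binary(data: bytes) -> bool:
    """NUL-byte heuristic in the spirit of GNU diff: any NUL in the first block."""
    return b"\0" in data[:FIRST_BLOCK]


def sniff_utf16_or_32(data: bytes):
    """Return a Python codec name if data starts with a UTF-16/32 BOM, else None."""
    # The UTF-32 LE BOM begins with the UTF-16 LE BOM, so test UTF-32 first.
    candidates = (
        (codecs.BOM_UTF32_LE, "utf-32"),
        (codecs.BOM_UTF32_BE, "utf-32"),
        (codecs.BOM_UTF16_LE, "utf-16"),
        (codecs.BOM_UTF16_BE, "utf-16"),
    )
    for bom, name in candidates:
        if data.startswith(bom):
            return name
    return None


def to_utf8_for_diff(data: bytes) -> bytes:
    """Convert BOM-marked UTF-16/32 to UTF-8 for human-readable diffing.

    Data without a recognized BOM is returned unchanged. Malformed input
    raises UnicodeDecodeError rather than being silently mangled.
    """
    codec = sniff_utf16_or_32(data)
    if codec is None:
        return data
    # The "utf-16" and "utf-32" codecs consume the BOM and pick the byte order.
    return data.decode(codec).encode("utf-8")


if __name__ == "__main__":
    sample = "hello\r\nworld\r\n".encode("utf-16")  # BOM + native byte order
    print(looks_binary(sample))        # ASCII text in UTF-16 is full of NULs
    print(sniff_utf16_or_32(sample))
    print(to_utf8_for_diff(sample))
```

The converted bytes are only suitable for display. As noted in the message, a diff taken over them would not apply to the original UTF-16 file, so this does not yield patches fit for mailing.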