Re: Git EOL Normalization

On Thu, 2011-05-26 at 12:28 -0400, Stephen Bash wrote:
> ----- Original Message -----
> > From: "Junio C Hamano" <gitster@xxxxxxxxx>
> > To: "Jakub Narebski" <jnareb@xxxxxxxxx>
> > Sent: Thursday, May 26, 2011 12:07:21 PM
> > Subject: Re: Git EOL Normalization
> > 
> > > I think git examines only first block of a file or so. The heuristic
> > > to detect binary-ness of a file is, as I have heard, the same or
> > > similar to the one that GNU diff uses.
> > 
> > Yes, the binary detection was designed to be compatible with GNU diff. But
> > I do not think it has much to do with the topic of this thread. Aren't
> > other people discussing the line ending?
> 
> The binary detection may be apropos because there are situations
> (core.autocrlf={true,input} and text=auto) where Git will only do line
> ending conversion if it detects a text file...  But I'll leave it to
> people who know the code better to say if this binary detection is in
> fact part of the decision process.

Currently Git detects UTF-16 and UTF-32 files (which many consider to be
text files) as binary, because of said compatibility with GNU diff.
Therefore EOL normalization does not happen on those files.
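
To illustrate why such files end up on the binary side, here is a minimal
Python sketch of a NUL-byte check in the style of the GNU diff heuristic.
It is not Git's actual code: the function name `looks_binary` and the
8000-byte prefix size are assumptions made for the example.

```python
# Simplified sketch of a NUL-byte binary heuristic; NOT Git's real code.
FIRST_FEW_BYTES = 8000  # assumed size of the inspected prefix


def looks_binary(data: bytes, limit: int = FIRST_FEW_BYTES) -> bool:
    """Return True if a NUL byte appears in the first `limit` bytes."""
    return b"\x00" in data[:limit]


text = "hello\r\nworld\r\n"
utf8 = text.encode("utf-8")       # no NUL bytes for plain ASCII text
utf16 = text.encode("utf-16-le")  # every ASCII character gains a NUL byte
utf32 = text.encode("utf-32-le")  # every ASCII character gains three NUL bytes

for name, blob in (("utf-8", utf8), ("utf-16", utf16), ("utf-32", utf32)):
    print(name, "binary" if looks_binary(blob) else "text")
```

Under this kind of check, the same text is classified as text in UTF-8
but as binary in UTF-16 or UTF-32, so no line ending conversion would be
attempted on the latter two.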

I have played a little with detecting reasonably valid UTF-16 (BE and
LE), with the aim of eventually normalizing it as well, but my code is
nowhere near ready for the big time, much less properly tested.
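
As a rough idea of what "reasonably valid" could mean, the following
Python sketch accepts a buffer as UTF-16 only if it starts with a UTF-16
byte order mark and the remainder decodes strictly. This is an
illustration under those assumptions, not the code mentioned above; the
name `sniff_utf16` is hypothetical, and BOM-less UTF-16 is deliberately
not handled.

```python
import codecs
from typing import Optional


def sniff_utf16(data: bytes) -> Optional[str]:
    """Return 'utf-16-le' or 'utf-16-be' if data looks like valid UTF-16."""
    # The UTF-32 LE BOM (FF FE 00 00) begins with the UTF-16 LE BOM, so
    # rule UTF-32 out first. A UTF-16 LE file whose first character is
    # U+0000 is misread as UTF-32 here; this sketch accepts that.
    if data.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return None
    if data.startswith(codecs.BOM_UTF16_LE):
        codec = "utf-16-le"
    elif data.startswith(codecs.BOM_UTF16_BE):
        codec = "utf-16-be"
    else:
        return None  # no BOM: this sketch does not guess
    try:
        # Strict decoding rejects odd lengths and unpaired surrogates.
        data[2:].decode(codec, errors="strict")
    except UnicodeDecodeError:
        return None
    return codec
```

A real implementation would presumably also need a policy for files
without a BOM, which is where the guessing gets harder.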

As for diff-ing UTF-16/UTF-32 for purely human consumption, I would be
tempted to iconv (smudge?) the text into UTF-8 and then let the diff-ing
algorithm deal with it. Not a perfect solution, but perfect should not
be the enemy of good in that case. Unfortunately this would not produce
proper patches for mailing. (As for how we'd know it is UTF-32 and not a
binary, I'll leave that for further discussion should we need it. I
suspect we'd have to trust the user. UGH.)
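
The transcode-then-diff idea can be sketched in Python as follows. This
is only an illustration: the helper names `to_utf8` and `diff_as_utf8`
and the `a/file` / `b/file` labels are made up for the example, and the
source encoding is passed in explicitly, i.e. the user is trusted.

```python
import difflib
from typing import List


def to_utf8(data: bytes, encoding: str) -> bytes:
    """Transcode data from the user-declared encoding to UTF-8."""
    return data.decode(encoding).encode("utf-8")


def diff_as_utf8(old: bytes, new: bytes, encoding: str) -> List[str]:
    """Return a unified diff of the two blobs after converting to UTF-8."""
    old_lines = to_utf8(old, encoding).decode("utf-8").splitlines(keepends=True)
    new_lines = to_utf8(new, encoding).decode("utf-8").splitlines(keepends=True)
    return list(difflib.unified_diff(old_lines, new_lines, "a/file", "b/file"))


old = "one\ntwo\n".encode("utf-16")
new = "one\nthree\n".encode("utf-16")
print("".join(diff_as_utf8(old, new, "utf-16")))
```

The output describes the UTF-8 text rather than the original bytes, so
it is readable by a human but cannot be applied as a patch to the UTF-16
files, which matches the limitation noted above.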

-- 
-Drew Northup
________________________________________________
"As opposed to vegetable or mineral error?"
-John Pescatore, SANS NewsBites Vol. 12 Num. 59
