Re: mingw, windows, crlf/lf, and git

On Wed, 14 Feb 2007, Johannes Schindelin wrote:
> 
> This sounds regretfully complex. Somebody (you?) mentioned that cvsnt does 
> a kick-ass job here. Does cvsnt need strategies? I don't think so. Neither 
> do we. Someone who cares enough should just rip^H^H^Hlook at cvsnt's text 
> detection.

Well, one thing to keep in mind is that for source code in particular, 
this is really very seldom an issue.

So you can do a job that is really *bad* in theory, and in practice it 
still works very, very well.

Very few people keep binary blobs in any SCM archive _anyway_, partly 
because they've always been told that it's unsafe (and with a lot of SCMs 
it is), but even more because binary blobs are almost always generated by 
some build method, so normally you'd never version them in the first 
place, or versioning them isn't all that helpful.

And most binary blobs are so *obviously* binary that even the stupidest 
algorithm on earth will get it right. The only hard cases actually tend to 
be really tiny files, or literally test-sequences.

Tiny files are hard because:

 - they (by being tiny) have so few characters that they can easily lack 
   a "fingerprint" character (e.g. a NUL character or similar).

 - tiny files are a lot more likely than bigger files to have strange 
   statistics that throw some more "sophisticated" rule off the scent. 
   Something like a "10% rule" tends to work fine if you have a big text, 
   and ten percent is still a reasonable number to average things out 
   over, but what if you only had ten characters to begin with?

The good news is that tiny files can usually be considered text, since 
you'd seldom use a binary format for something really small anyway.
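
To make that concrete, here is a minimal sketch of such a "stupid" 
detector: a NUL byte means binary, tiny files are simply called text, and 
everything else gets the 10% rule. This is only an illustration under 
stated assumptions. The name looks_like_text, the 64-byte cutoff, and the 
set of control characters treated as normal are all made up for the 
example; it is not what git or CVSNT actually does.

```python
# Hypothetical sketch of a text/binary guess. The names and thresholds
# are illustrative assumptions, not taken from git or CVSNT.

TINY_LIMIT = 64          # assumed cutoff: below this, statistics are not trusted
MAX_ODD_FRACTION = 0.10  # the "10% rule" from the discussion above

# Control bytes that are perfectly normal in text: TAB, LF, FF, CR.
TEXT_CONTROL = {0x09, 0x0A, 0x0C, 0x0D}


def looks_like_text(data: bytes) -> bool:
    """Guess whether a blob is text (True) or binary (False)."""
    # A NUL byte is the classic "fingerprint" of a binary blob.
    if b"\0" in data:
        return False

    # Tiny files have too few characters for a ratio to mean anything,
    # so they are simply considered text. This also covers empty input.
    if len(data) < TINY_LIMIT:
        return True

    # Count control characters that rarely show up in real text.
    odd = sum(
        1 for b in data
        if (b < 0x20 and b not in TEXT_CONTROL) or b == 0x7F
    )
    return odd / len(data) <= MAX_ODD_FRACTION


if __name__ == "__main__":
    print(looks_like_text(b"int main(void)\r\n{\r\n\treturn 0;\r\n}\r\n"))
    print(looks_like_text(b"\x7fELF\x02\x01\x01\x00"))
```

The tiny-file shortcut is there because of the statistics problem 
described above: in a ten-byte file a single stray control character is 
already 10% of the content, so the ratio test would flip on one byte.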

So I suspect that IN PRACTICE, especially if you come as a CVS replacement 
(where binary files are just damn hard to get right even under the best of 
circumstances!), you can do just about anything, including just saying 
"everything is text", and you'd be fine.

It's entirely possible that that is exactly what CVSNT does ;)

		Linus