On Wed, 14 Feb 2007, Johannes Schindelin wrote:
>
> This sounds regretfully complex. Somebody (you?) mentioned that cvsnt does
> a kick-ass job here. Does cvsnt need strategies? I don't think so. Neither
> do we. Someone who cares enough should just rip^H^H^Hlook at cvsnt's text
> detection.

Well, one thing to keep in mind is that for source code in particular,
this really very seldom is an issue. So you can do a really *bad* job in
theory, and in practice it really works very very well.

Very few people keep binary blobs in any SCM archive _anyway_, partly
because they've always been told that it's unsafe (and with a lot of SCM's
it is), but even more because binary blobs are almost always generated by
some build method, so normally you'd never version them in the first
place, or versioning isn't all that helpful.

And most binary blobs are so *obviously* binary that even the stupidest
algorithm on earth will get it right.

The only hard cases actually tend to be really tiny files, or literally
test-sequences. Tiny files are hard because:

 - they (by being tiny) have so few characters that they can easily lack a
   "fingerprint" character (eg a NUL character or similar).

 - tiny files are a lot more likely than bigger files to have strange
   statistics that throw some more "sophisticated" rule off the scent.
   Something like a "10% rule" tends to work fine if you have a big text,
   and ten percent is still a reasonable number to average things out
   over, but what if you only had ten characters to begin with?

The good news is that tiny files can usually be considered text, since
you'd seldom use a binary format for something really small anyway.

So I suspect that IN PRACTICE, especially if you come as a CVS replacement
(where binary files are just damn hard to get right even under the best of
circumstances!), you can do just about anything, including just saying
"everything is text", and you'd be fine.

It's entirely possible that that is exactly what CVSNT does ;)

		Linus
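
For concreteness, the heuristics mentioned in the message (a NUL "fingerprint", treating tiny files as text, and a "10% rule" for larger ones) could be combined roughly as in the sketch below. The function name `looks_binary`, the 64-byte cutoff for "tiny", and the choice of which control bytes count as suspicious are all assumptions made for illustration; this is not cvsnt's or git's actual detection code.

```python
# Control bytes that commonly appear in ordinary text files
# (tab, line feed, form feed, carriage return). Assumed set.
TEXT_CONTROL_BYTES = frozenset({0x09, 0x0A, 0x0C, 0x0D})


def looks_binary(data: bytes,
                 tiny_threshold: int = 64,
                 max_suspicious_ratio: float = 0.10) -> bool:
    """Guess whether `data` is binary. Hypothetical heuristic only."""
    # "Fingerprint" check: a NUL byte is a strong hint of binary content.
    if b"\x00" in data:
        return True

    # Tiny files have too few bytes for statistics to mean much,
    # so without a fingerprint they are simply treated as text.
    if len(data) < tiny_threshold:
        return False

    # "10% rule": count control bytes that are unusual in text and
    # call the content binary if they exceed the allowed fraction.
    suspicious = sum(
        1 for b in data
        if (b < 0x20 and b not in TEXT_CONTROL_BYTES) or b == 0x7F
    )
    return suspicious > len(data) * max_suspicious_ratio
```

The ordering matters: the NUL check runs first, so a tiny file that does carry a fingerprint is still caught, and the ratio test is only applied once there are enough bytes to average over.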