On Tue, 17 Apr 2007, Matthieu Moy wrote:
>
> * Perhaps your boss will be interested in the "data integrity" (i.e.
> git fsck) problem too.

The data integrity thing is a lot more than just fsck.

I care a lot about my data, and it's an area where a *lot* of systems fall down. CVS is just about the worst (basically no checksums or sanity checking anywhere), and you can pretty much have total data corruption without ever even _realizing_ it, until you try to get some old version.

Even more interesting with CVS is that you can have total data corruption and not realize it *even*as* you use the data. Lots of people and projects have been known to happily move *,v files around and edit the CVS repo files by hand to make things look right, which means that not only did you do a "rename" in CVS, you actually renamed *retroactively* too - you made history look wrong!

So with CVS, you actually have no guarantees whatsoever that when you check out something old, you'll get what you actually used to have. You can tag things as much as you want - if people end up editing the CVS files (and people *do* that), you'll never have any indication that the history you checked out isn't the "real" history. So you can check out some old version that you based a customer release on, and be totally unable to recreate the customer's problem, because the release you checked out doesn't even compile any more!

You can actually do the same with most other SCMs. It may need somebody who is actually malicious, but even that isn't necessarily the case. Lots of SCMs don't have any checksums *at*all* on their data - the only way you'd ever know that something bad happened and you had disk corruption is when you check something out and it just looks corrupted!

In other words, in a lot of SCMs, you're actually *lucky* if the corruption is so serious that it's not just a subtle "data is wrong" thing, but so pervasive that you actually get an error from the SCM.
In git, every *single* piece of data is not just checksummed, it's CHECKSUMMED. Yeah, we use CRCs and Adler32 for some things, but even those are actually *also* protected at a higher level by real cryptographic hashes.

You simply *cannot* corrupt data by mistake and not know about it. You can lose it, you can corrupt it, but it *will* be noticed. If that doesn't make you feel good about your data, I don't know what will.

Git will not replace backups in any way, shape, or form (although you can obviously use git itself to _do_ those backups - the joy of distributed SCMs), but it will tell you when you *need* those backups. Guaranteed.

And I can tell you that that kind of guarantee is actually very rare. I doubt *any* commercial SCM will come even close. They might have checksums, but nothing really strong. It might be a CRC or even weaker. Or it might be nothing at all (and sadly, that's the *common* case).

Linus
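The two-layer checksum argument above can be made concrete with a small model. What follows is a minimal Python sketch, not git's actual code, assuming only two well-documented facts about git's loose objects: each object is named by the SHA-1 of a header (`blob <size>` plus a NUL byte) followed by the content, and the object is stored zlib-compressed, which adds zlib's own Adler-32 checksum. The function names `git_object_id`, `store_blob` and `verify` are hypothetical, and the sketch ignores packfiles, trees and commits.

```python
import hashlib
import zlib
from typing import Tuple


def git_object_id(obj_type: str, content: bytes) -> str:
    """SHA-1 over a git-style header "<type> <size>\\0" plus the raw content."""
    header = f"{obj_type} {len(content)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def store_blob(content: bytes) -> Tuple[str, bytes]:
    """Hypothetical helper: return (object id, zlib-compressed object bytes)."""
    raw = f"blob {len(content)}".encode("ascii") + b"\0" + content
    return git_object_id("blob", content), zlib.compress(raw)


def verify(oid: str, stored: bytes) -> bool:
    """True only if the stored bytes still hash to the name they are filed under."""
    try:
        raw = zlib.decompress(stored)  # zlib's Adler-32 check runs here
    except zlib.error:
        return False  # the weak checksum already caught the damage
    return hashlib.sha1(raw).hexdigest() == oid  # the cryptographic check


if __name__ == "__main__":
    oid, stored = store_blob(b"release 1.0\n")
    print(verify(oid, stored))  # True

    # Simulated disk damage: flip one bit in the Adler-32 trailer.
    damaged = stored[:-1] + bytes([stored[-1] ^ 0x01])
    print(verify(oid, damaged))  # False

    # Simulated hand edit that leaves a perfectly valid zlib stream,
    # much like editing a *,v file by hand in CVS.
    edited = zlib.compress(zlib.decompress(stored).replace(b"1.0", b"1.1"))
    print(verify(oid, edited))  # False
```

The two failure cases mirror the two layers described above: the flipped bit is caught by zlib's Adler-32 before SHA-1 is consulted, while the hand edit passes the zlib check and is rejected only because its SHA-1 no longer matches the object's name. The real `git fsck` performs this kind of name-versus-content comparison across the whole repository.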