On Sat, Sep 19, 2009 at 11:10 AM, Mario 'BitKoenig' Holbe
<Mario.Holbe@xxxxxxxxxxxxx> wrote:
> Bryan Mesich <bryan.mesich@xxxxxxxx> wrote:
>> On Wed, Sep 16, 2009 at 09:20:35PM +0200, Mario 'BitKoenig' Holbe wrote:
>>> They should not appear on RAID5.
>> I would agree. The only reason I mentioned RAID5 was to remove the
>> possibility that the HD were spontaneously flipping bits. Our SAN
>
> This should not happen on recent disks (and even not that recent ones)
> either. Disks do error correction and don't deliver faulty data, they
> deliver read errors instead. Maybe there are some disks where you can
> disable ECC (I'm not aware of such), but I doubt you would get even one
> reasonable bit out of them then :)
>
> And *if* spontaneously flipping bits *would* happen on single disks,
> they would also happen on your RAID5. No RAID level except RAID2 (which
> does ECC on its own) tolerates this kind of error, they all rely on
> disks delivering either correct data or error messages.
>

There is a whole series of places where a bit flip can occur after the
data has been read from the platter and the ECC verified, and none of
them generate error messages.

It could be in the IDE electronics on the drive itself. In the IDE or
SATA cable. (I think SATA has a checksum on the transmission. The older
IDE transfer modes don't, although Ultra DMA added a CRC.) In the
controller. In RAM. In the cache. In the CPU, etc., etc.

If you want reliable data you have to build in end-to-end verification.
As long as you attack the issues piece by little piece, you are going to
have weaknesses where a bit flip can sneak in.

That is one reason we see MD5s distributed with lots of downloadable
ISOs, etc. In theory the whole distribution process is reliable, but by
verifying it at the very end you gain a significant amount of
confidence.

With regards to data storage, one major step in this direction is the
"integrity" patch that went into the kernel last winter (2.6.28?).
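
To make the end-to-end point concrete, here is a minimal Python sketch of the MD5-on-an-ISO idea. It is only an illustration, not code from any real driver or from the kernel patch: `checksum`, `flaky_transport` and `verify` are made-up names, and the "transport" is a stand-in for whatever cable, controller or RAM sits between the producer and the consumer.

```python
import hashlib
import random


def checksum(data: bytes) -> str:
    """Digest computed once by the producer, re-computed by the consumer."""
    return hashlib.md5(data).hexdigest()


def flaky_transport(data: bytes, flip: bool, seed: int = 0) -> bytes:
    """Hypothetical stand-in for the path between the two ends.

    When flip is True, exactly one bit is silently inverted and no
    error is reported, which is the failure mode discussed above.
    """
    if not flip:
        return data
    rng = random.Random(seed)
    buf = bytearray(data)
    pos = rng.randrange(len(buf))        # pick a byte
    buf[pos] ^= 1 << rng.randrange(8)    # invert one bit in it
    return bytes(buf)


def verify(data: bytes, expected: str) -> bool:
    """End-of-path check: does the received data match the digest?"""
    return checksum(data) == expected


original = b"block of data" * 64
digest = checksum(original)              # travels alongside the data

clean = flaky_transport(original, flip=False)
damaged = flaky_transport(original, flip=True)

print(verify(clean, digest))    # True
print(verify(damaged, digest))  # False
```

Nothing in the middle of the path has to be trusted here: the only check that matters is the one made at the very end, against a value computed at the very beginning.
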
There is apparently now a SCSI standard that allows a checksum / CRC to
be passed along with the data. The protocol for calculating the value
is published, so with this feature enabled a checksum / CRC is
calculated at the top of the Linux block stack, as soon as a filesystem
puts a block of data into the block queues. The checksum / CRC travels
with the data all the way to the SCSI subsystem. The subsystem in turn
verifies the value and errors out on a data write if the data and the
checksum / CRC are in disagreement.

On read, the subsystem also provides the checksum / CRC in addition to
the data. Both traverse the Linux block stack all the way up, and the
data is verified immediately prior to being handed off to the
filesystem.

This is all pretty new, obviously. To the best of my knowledge
filesystems have not yet been enhanced to track this value, which would
cover even more of the end-to-end transaction.

I don't know how specifically, but it also seems to me the mdraid stack
could improve the currently poor data integrity process even in the
absence of a supporting SCSI subsystem. Maybe by pulling out the
integrity checksum / CRC info and putting it on yet another disk, or
mixing it in with the parity calculation.

Specifically, you could steal the second parity stripe from a RAID 6
setup and replace it with this end-to-end data integrity checksum /
CRC. The checksum / CRC is much smaller than the original data, so the
one integrity disk should support a reasonable number of data disks.

Obviously this would not be one of the formal RAID levels, but that
doesn't mean it's not useful.

Greg