On Friday November 18, mhardy@xxxxxxx wrote:
>
> So, I continue to believe silent corruption is mythical. I'm still open
> to good explanation it's not though.
>

Silent corruption is not mythical, though it is probably talked about
more than it actually happens (but then, as it is silent, I cannot be
certain :-).

Silent corruption can happen only if an unclean degraded array is
started. md will not start an unclean degraded (raid 4/5/6) array
(though I'm going to add a module parameter to allow it), and mdadm
will only start such an array if given --force (in which case it
modifies the array to appear clean so that md will start it).

If your array is not degraded, or you always shut down cleanly, there
is no opportunity for raid5-level corruption (of course, the drives may
choose to corrupt things silently themselves...).

Note that an unclean degraded start doesn't imply corruption - you
could be in this situation and not have any corruption at all. But it
does allow it. It must, as 'unclean' means you cannot trust the parity,
and 'degraded' means that you have to.

There are two solutions to this silent corruption problem (other than
'ignore it and hope it doesn't bite', which is a fairly widely used
solution, and I haven't seen any bite marks myself).

One is journalling, as has been mentioned. This could be done to a
mirrored pair, or to an ECC NVRAM card (the latter being probably the
best, though also the most expensive). You would write each data block
to the journal as it becomes available, and each parity block just
before commencing a write to the raid5. Obviously you also keep track
of what you have written. I have toyed with the idea of implementing
this, but I think demand is sufficiently low that it isn't worth it.

The other is to use a filesystem that allows the problem to be avoided
by making sure that the only blocks that can be corrupted are dead
blocks.
This could be done with a copy-on-write filesystem that knows
about the raid5 geometry, and only ever writes to a stripe when no
other blocks on the stripe contain live data. I've been working on a
filesystem which does just this, and hope to have it available in a
year or two (it is a background 'hobby' project).

I know that ZFS is a copy-on-write filesystem. It is entirely possible
that it can do the right thing for raid5.

And as an addendum, md/raid5 never reports a block as complete to the
filesystem until the device drivers have reported both the data block
and the parity block as being safe. That is, it has a write-through
cache, not a write-behind cache.

NeilBrown
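
To make the 'unclean' plus 'degraded' argument above concrete, here is a small Python sketch of a single toy stripe with XOR parity. It is only an illustration, not md code: the names xor_blocks and demo_write_hole, the three-data-block layout, and the 4-byte block size are all assumptions chosen for the example. It shows that rebuilding a lost block works while the parity is current, and that the same rebuild returns wrong data, with no error raised, once a crash has left the parity stale.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equal-length blocks (raid5-style parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def demo_write_hole() -> tuple[bytes, bytes]:
    """Return (original d1, d1 as rebuilt after an unclean degraded start)."""
    # Hypothetical toy stripe: three data blocks and one parity block.
    d0, d1, d2 = b"AAAA", b"BBBB", b"CCCC"
    parity = xor_blocks(d0, d1, d2)

    # Clean and degraded: if d1's drive is lost now, the parity can be
    # trusted and the rebuild gives back exactly what was stored.
    assert xor_blocks(d0, d2, parity) == d1

    # Unclean: a new d0 reaches its drive, but the machine crashes
    # before the matching parity block is written. Parity is now stale.
    d0 = b"ZZZZ"

    # Degraded: d1's drive then fails, so d1 must be rebuilt from the
    # surviving data blocks and the stale parity. Nothing detects that
    # the result is wrong, which is why the corruption is silent.
    rebuilt_d1 = xor_blocks(d0, d2, parity)
    return d1, rebuilt_d1


if __name__ == "__main__":
    original, rebuilt = demo_write_hole()
    print("stored d1: ", original)
    print("rebuilt d1:", rebuilt)
```

Note that the damaged block (d1) is not the block that was being written (d0). This is why a filesystem that only writes to stripes holding no other live data avoids the problem: the only blocks that can be rebuilt wrongly are dead ones.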