> Bryan Henderson wrote:
> > > If the RAID code is changed to handle barriers, that would still have
> > > possible "scattershot" corruption on RAID-5, because writing a single
> > > sector on the logical device affects more than one visible sector if
> > > it is interrupted.  In other words, the "radius of corruption" is
> > > bigger than one sector for RAID-5, and it's not contiguous either.
> >
> > I've seen several RAID-5 systems, and they all went to great lengths to
> > ensure that interrupting a write to Sector A can't destroy Sector B.  It
> > isn't easy; it involves journalling.  But I've always taken it as an
> > absolute requirement.
>
> How do you do a second layer of journalling (in addition to the
> filesystem's) without a big performance penalty for the extra seeks?

The systems I know all have a means of storing data that persists
across the kinds of restarts in question, without seeking.  It's
probably the only way to get great performance with data integrity.

But some things about Linux block device RAID-5 are coming back to me.
In the early implementations, if the system restarted without explicitly
shutting down the array (as in a power failure), all of the parity in
the array would be rebuilt.  Later, a "write intent bitmap" was added so
it could rebuild substantially less than all of the parity.  That bitmap
is the journal I was talking about, and I don't know what, if anything,
it does to avoid a big performance penalty.

> But a failed write might corrupt previously
> hardened sectors in these cases:
>
>   - Disks with 4k sectors pretending to be 512 byte sectors.

AFAIK there are no such disks today, and there is a big controversy over
whether it's acceptable for such disks currently being designed to allow
such corruption.

>   - RAIDs without journalling (or other equivalent) and no
>     battery backup.

I still don't know if anybody is doing that.

>   - SSDs and other flash storage if their internal algorithms are stupid.

I don't know if that's commonly accepted either.

-- 
Bryan Henderson                     IBM Almaden Research Center
San Jose CA                         Storage Systems
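
To make the "radius of corruption" and write-intent bitmap points concrete,
here is a minimal Python sketch, assuming plain XOR parity over a single
stripe.  The names (xor_blocks, Stripe, Array, crash_before_parity) are
hypothetical and chosen for illustration; this is not how the Linux md
driver is implemented, and the in-memory "dirty" set only stands in for a
bitmap that a real system would have to keep in persistent storage.

```python
from functools import reduce


def xor_blocks(*blocks):
    """XOR equal-length byte blocks together (RAID-5 style parity)."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


class Stripe:
    """One stripe: several data blocks plus their XOR parity."""

    def __init__(self, data_blocks):
        self.data = list(data_blocks)
        self.parity = xor_blocks(*self.data)

    def reconstruct(self, lost):
        """Rebuild data block `lost` from parity and the surviving blocks."""
        survivors = [b for i, b in enumerate(self.data) if i != lost]
        return xor_blocks(self.parity, *survivors)


class Array:
    """Toy array with a write-intent bitmap (assumption: one bit per stripe)."""

    def __init__(self, stripes):
        self.stripes = stripes
        self.dirty = set()  # stand-in for a persistent write-intent bitmap

    def write(self, s, i, block, crash_before_parity=False):
        self.dirty.add(s)  # record intent before touching the stripe
        stripe = self.stripes[s]
        stripe.data[i] = block
        if crash_before_parity:
            return  # simulated power loss: parity is stale, bit stays set
        stripe.parity = xor_blocks(*stripe.data)
        self.dirty.discard(s)  # stripe is consistent again

    def resync(self):
        """After restart, recompute parity only for stripes marked dirty."""
        repaired = sorted(self.dirty)
        for s in repaired:
            stripe = self.stripes[s]
            stripe.parity = xor_blocks(*stripe.data)
        self.dirty.clear()
        return repaired


array = Array([
    Stripe([b"AAAA", b"BBBB", b"CCCC"]),
    Stripe([b"DDDD", b"EEEE", b"FFFF"]),
])

# Interrupted write to block 0 of stripe 0: data updated, parity not.
array.write(0, 0, b"aaaa", crash_before_parity=True)

# If the disk holding block 1 is lost now, rebuilding it from the stale
# parity does not give back the block that was never written to.
damaged = array.stripes[0].reconstruct(lost=1)

# With all data disks still present, the bitmap limits the parity rebuild
# to the one dirty stripe instead of the whole array.
repaired = array.resync()
recovered = array.stripes[0].reconstruct(lost=1)
```

In this sketch the interrupted write to block 0 leaves block 1
unrecoverable from parity until the resync runs, and the resync touches
only stripe 0.  Note that the resync helps only while every data block is
still readable; it says nothing about the cost of keeping the bitmap
itself up to date, which is the performance question raised above.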