> If the RAID code is changed to handle barriers, that would still have
> possible "scattershot" corruption on RAID-5, because writing a single
> sector on the logical device affects more than one visible sector if
> it is interrupted.  In other words, the "radius of corruption" is
> bigger than one sector for RAID-5, and it's not contiguous either.

I've seen several RAID-5 systems, and they all went to great lengths to ensure that interrupting a write to Sector A can't destroy Sector B. It isn't easy; it involves journalling. But I've always taken it as an absolute requirement.

I assume you're talking about a case where Sectors 1-5 are covered by a single parity sector and the RAID system restarts between when it has written Sector 1 and when it has written the new parity. Now if you lose Sector 2, you'll recover incorrect contents for it.

Linux kernel RAID-5 isn't one of the ones I've looked at; I presume you're saying it does have this problem.

> In principle, journalling filesystems need to know the "radius of
> corruption" to provide robust journalling.  If individual sector
> writes are atomic, this isn't an issue.  Some people think sector
> writes are atomic on modern hard drives (but I wouldn't count on it).
> But it is definitely not atomic when writing to a RAID or multipath if
> the write affects more than one device.

It would make a lot more sense to make the RAID block device driver present a block device that can't corrupt data upon something as simple as a restart in the middle of a write to an unrelated sector than to make filesystem drivers comprehend a block device that can. Less work, more integrity.

Some have noted recently that block devices are really too simple to do some of the fancy storage things we'd like to do these days anyway, so another approach would be to integrate the RAID-5 function into the filesystem driver instead of attempting to have a RAID block device layer.

For now, I'll just try to remember not to use Linux kernel RAID-5.
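
To make the Sector 1-5 scenario above concrete, here is a minimal sketch in Python. It assumes plain XOR parity over five data sectors and tiny 4-byte sectors; the names (`parity`, `reconstruct`, `SECTOR_SIZE`) are made up for illustration and are not taken from the Linux kernel RAID code, whose actual behaviour I haven't checked.

```python
from functools import reduce

SECTOR_SIZE = 4  # hypothetical tiny sector size, for illustration only


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length sectors."""
    return bytes(x ^ y for x, y in zip(a, b))


def parity(sectors):
    """XOR parity over all data sectors in the stripe."""
    return reduce(xor_bytes, sectors)


def reconstruct(sectors, parity_sector, lost_index):
    """Rebuild the sector at lost_index from the survivors plus parity."""
    survivors = [s for i, s in enumerate(sectors) if i != lost_index]
    return reduce(xor_bytes, survivors, parity_sector)


# Sectors 1-5 (list indices 0-4) covered by a single parity sector.
data = [bytes([n] * SECTOR_SIZE) for n in range(1, 6)]
par = parity(data)
original_sector2 = data[1]

# With a consistent stripe, a lost Sector 2 is recovered correctly.
consistent_recovery_ok = reconstruct(data, par, 1) == original_sector2

# Interrupted write: the new Sector 1 reaches the disk, then the system
# restarts before the new parity is written, so `par` is now stale.
data[0] = b"\xff" * SECTOR_SIZE

# Later, Sector 2 is lost and rebuilt from the survivors and stale parity.
recovered_sector2 = reconstruct(data, par, 1)
stale_recovery_is_wrong = recovered_sector2 != original_sector2
```

In this sketch Sector 2 was never written, yet its reconstructed contents are wrong. That is the non-contiguous "radius of corruption" the quoted text describes.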

>If individual sector writes are atomic, this isn't an issue.

True, however: atomic is sufficient, but not necessary. In the real world, disk drive writes aren't atomic, and it's OK. A journalling filesystem can deal with a failed write wiping out the previous contents of the subject sector. It just can't deal with a failed write polluting some unrelated previously hardened sector.

--
Bryan Henderson                     IBM Almaden Research Center
San Jose CA                         Storage Systems