On Sun, 02 Jan 2011 21:06:52 -0700 "Patrick H." <linux-raid@xxxxxxxxxxxx> wrote:

> That makes sense, assuming that MD acknowledges the write once the data
> is written to the data disks but not necessarily the parity disk, which
> is what I gather you were saying happens. Is there any option that can
> change the behavior so that md won't ack the write until it's been
> committed to all disks (I'm guessing no, since you didn't mention it)?
> Also, does raid6 suffer this problem? Is it smart enough to use both
> parity disks when calculating a replacement, or will it just use one?

md/raid5 doesn't acknowledge the write until both the data and the parity
have been written. But that doesn't make any difference. If you schedule
a number of interdependent writes (data and parity) and then allow some
to complete but not all, you have an inconsistency. Recovery from losing
a single device requires that data and parity be consistent.

RAID6 suffers equally from this problem. Even if it used both parity
disks to recover (which it doesn't), how would that help? It would then
have two possible values for the data and no way to know which was
correct, and every possibility that both are incorrect. This would happen
if a single data block was successfully written but neither parity block
was.

The only way you can avoid this 'write hole' is by journalling in
multiples of whole stripes. No current filesystem that I know of can do
this, as they journal in blocks, and the maximum block size is less than
the minimum stripe size. So you would need journalling integrated with
md/raid, or a filesystem which was designed to understand this problem
and write whole stripes at a time, always to an area of the device which
does not contain live data.

NeilBrown
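
To make the write hole described above concrete, here is a minimal Python
sketch. The 3-data-disk layout and 4-byte blocks are toy values chosen for
illustration; this is not md's code, which operates on pages and sectors.

    # Toy RAID5 stripe: three data blocks plus XOR parity.

    def xor(a: bytes, b: bytes) -> bytes:
        return bytes(x ^ y for x, y in zip(a, b))

    d0, d1, d2 = b'AAAA', b'BBBB', b'CCCC'
    p = xor(xor(d0, d1), d2)        # consistent stripe: p = d0 ^ d1 ^ d2

    d1 = b'XXXX'                    # "crash" after the data write: the new
                                    # d1 is on disk, the matching parity
                                    # never made it

    # Disk 0 now fails.  Reconstruction assumes the stripe is consistent:
    recovered_d0 = xor(xor(p, d1), d2)
    assert recovered_d0 != b'AAAA'  # d0 comes back corrupted, although it
                                    # was never the block being written

Note that the damaged block (d0) is not the one that was being updated: an
inconsistent stripe silently corrupts unrelated data the next time it is
needed for recovery.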
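
The same experiment with dual parity shows why a second parity block does
not help: the two reconstructions of the lost block simply disagree. The
GF(2^8) Q parity below follows the usual RAID6 construction with generator
2; the layout and block sizes are again toy values, not md's internals.

    def xor(a: bytes, b: bytes) -> bytes:
        return bytes(x ^ y for x, y in zip(a, b))

    def gf_mul2(x: int) -> int:
        # Multiply by 2 in GF(2^8) modulo the RAID6 polynomial 0x11D.
        return ((x << 1) ^ (0x1D if x & 0x80 else 0)) & 0xFF

    def gmul(b: bytes, n: int) -> bytes:
        # Multiply every byte by 2**n in GF(2^8).
        for _ in range(n):
            b = bytes(gf_mul2(x) for x in b)
        return b

    d0, d1, d2 = b'AAAA', b'BBBB', b'CCCC'
    p = xor(xor(d0, d1), d2)                    # P = d0 ^ d1 ^ d2
    q = xor(xor(d0, gmul(d1, 1)), gmul(d2, 2))  # Q = d0 + g*d1 + g^2*d2

    d1 = b'XXXX'          # crash: data written, neither parity updated

    # Disk 0 fails.  Reconstruct it once from each parity:
    from_p = xor(xor(p, d1), d2)
    from_q = xor(xor(q, gmul(d1, 1)), gmul(d2, 2))
    assert from_p != from_q                  # two candidate values, no way
    assert b'AAAA' not in (from_p, from_q)   # to pick, and both are wrong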
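
And a toy sketch of the closing suggestion: journal in multiples of whole
stripes, so that after a crash the log can always replay a complete,
internally consistent stripe. The log format and function names here are
invented for illustration and do not correspond to any real md feature.

    import os

    def commit_stripe(log, array, offset: int, stripe: bytes,
                      stripe_size: int):
        # The full stripe (data + parity together) must be durable in the
        # log before any in-place write starts.
        assert len(stripe) == stripe_size       # never log a partial stripe
        log.seek(0, os.SEEK_END)
        log.write(offset.to_bytes(8, 'little')) # where the stripe belongs
        log.write(stripe)
        log.flush()
        os.fsync(log.fileno())                  # stripe is safe in the log...
        array.seek(offset)
        array.write(stripe)                     # ...so this write may tear;
        array.flush()                           # replay repairs the stripe
        os.fsync(array.fileno())

Both files would be opened with open(path, 'r+b'); the point is only that
the unit of logging is the whole stripe, never a single block.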