On Fri, Oct 28, 2016 at 03:52:49AM -0800, Kent Overstreet wrote:
> On Thu, Oct 27, 2016 at 12:31:58AM +0200, Vojtech Pavlik wrote:
> > In case you're using mdraid for the RAID part on a reasonably recent
> > Linux kernel, there is no write hole. Linux mdraid implements barriers
> > properly even on RAID5, at the cost of performance - mdraid waits for a
> > barrier to complete on all drives before submitting more i/o.
>
> That's not what the raid 5 hole is. The raid 5 hole comes from the fact that
> it's not possible to update the p/q blocks atomically with the data blocks, thus
> there is a point in time when they are _inconsistent_ with the rest of the
> stripe, and if used will lead to reconstructing incorrect data. There's no way
> to fix this with just flushes.

Indeed. However, together with the write intent bitmap, and filesystems
ensuring consistency through barriers, it's still greatly mitigated.

Mdraid will mark areas of disk dirty in the write intent bitmap before
writing to them. When the system comes up after a power outage, all
areas marked dirty are scanned and the xor block is rewritten where it
doesn't match the rest.

Thanks to the strict ordering using barriers, the damage to the
consistency of the RAID can only be in requests issued since the last
successfully written barrier.

As such, the filesystem will always see a consistent state, and the raid
will also always recover to a consistent state.

The only situation where data damage can happen is a power outage that
comes together with a loss of one of the drives. In such a case, the
content of any blocks written past the last barrier is undefined. It
then depends on the filesystem whether it can revert to the last sane
state. Not sure about others, but btrfs will do so.

-- 
Vojtech Pavlik
Director SuSE Labs
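
To illustrate the two cases above, here is a toy Python sketch of a single RAID5 stripe with xor parity. It is not mdraid code: the names xor_blocks, reconstruct and resync are made up for this example, the "dirty" flag merely stands in for a write intent bitmap bit, and the stripe is three tiny data blocks plus one parity block. It only shows why a resync with all drives present restores consistency, while losing a drive before the resync yields a wrong reconstruction.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def reconstruct(data, parity, lost):
    """Rebuild the block of a lost drive from the surviving data and parity."""
    survivors = [block for i, block in enumerate(data) if i != lost]
    return xor_blocks(survivors + [parity])

def resync(data, parity):
    """Recompute parity for a dirty stripe; rewrite it only on mismatch."""
    expected = xor_blocks(data)
    return expected if parity != expected else parity

# A consistent stripe: three data blocks and their xor parity.
data = [b"\x01\x02", b"\x10\x20", b"\x0f\x0f"]
parity = xor_blocks(data)

# Interrupted update: the stripe is marked dirty, the new data block
# reaches the disk, but power fails before the parity is updated.
dirty = True
data[0] = b"\xff\xff"

# Case 1: drive 1 is also lost before any resync. The stale parity is
# used for reconstruction and produces wrong contents for drive 1.
bad = reconstruct(data, parity, lost=1)

# Case 2: all drives survive. The dirty stripe is resynced first, so a
# later reconstruction of drive 1 is correct again.
if dirty:
    parity = resync(data, parity)
    dirty = False
good = reconstruct(data, parity, lost=1)

print("stale-parity reconstruction matches:", bad == data[1])
print("post-resync reconstruction matches: ", good == data[1])
```

Note that in case 1 the damaged block belongs to drive 1, which was not being written at all; only drive 0 and the parity were in flight.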