On Fri, Oct 28, 2016 at 06:07:21PM +0100, James Pharaoh wrote:
> On 28/10/16 12:52, Kent Overstreet wrote:
>
> > That's not what the raid 5 hole is. The raid 5 hole comes from the fact that
> > it's not possible to update the p/q blocks atomically with the data blocks, thus
> > there is a point in time when they are _inconsistent_ with the rest of the
> > stripe, and if used will lead to reconstructing incorrect data. There's no way
> > to fix this with just flushes.
>
> Yes, I understand this, but if the kernel strictly orders writing mdraid
> data blocks before parity ones, then it closes part of the hole, especially
> if I have a "journal" in a higher layer, and of course ensure that this
> journal is reliable.

Ordering cannot help you here. Whichever order you do the writes in, there is a
point in time where the p/q blocks are inconsistent with the data blocks, thus
if you do a reconstruct you will reconstruct incorrect data. Unless you were
writing to the entire stripe, this affects data you were _not_ writing to.

> I also think, however, that by putting bcache /under/ mdraid, and (again)
> ensuring that the bcache layer is reliable, along with the requirement for
> bcache to "journal" all writes, would provide an extremely reliable storage
> layer, even at a very large scale.

What? No, putting bcache under md wouldn't do anything, it couldn't do anything
about the atomicity issue there.

Also - Vojtech - btrfs _is_ subject to the raid5 hole, it would have to be doing
copygc to not be affected.
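
To make the ordering argument concrete, here is a minimal sketch of a toy stripe with two data blocks and a single XOR parity block (no q block). It is not how md is implemented; the names xor, torn_update and reconstruct_d1 are hypothetical and exist only for this illustration. It updates d0, "crashes" after the first of the two writes in either order, then loses the disk holding d1 and rebuilds it from what is left:

```python
def xor(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def torn_update(stripe, new_d0, parity_first):
    """Update d0 but crash after the first of the two required writes.

    stripe is (d0, d1, p). Only one of the new d0 / new p reaches disk,
    depending on which write was issued first.
    """
    d0, d1, p = stripe
    new_p = xor(new_d0, d1)
    if parity_first:
        return (d0, d1, new_p)   # parity written, data write lost
    return (new_d0, d1, p)       # data written, parity write lost


def reconstruct_d1(stripe):
    """Rebuild d1 from d0 and p, as if the disk holding d1 had failed."""
    d0, _lost, p = stripe
    return xor(d0, p)


d0, d1 = b"AAAA", b"BBBB"
stripe = (d0, d1, xor(d0, d1))

# A consistent stripe rebuilds d1 correctly.
assert reconstruct_d1(stripe) == d1

# After a torn update of d0, d1 is rebuilt incorrectly in both orderings,
# even though d1 itself was never written to.
for parity_first in (False, True):
    crashed = torn_update(stripe, b"CCCC", parity_first)
    rebuilt = reconstruct_d1(crashed)
    print("parity first:", parity_first, "d1 rebuilt correctly:", rebuilt == d1)
```

In this toy model both orderings hand back the same wrong block for d1, which is the point: the damage lands on data that was not part of the write.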