Re: Extra write mode to close RAID5 write hole (kind of)

James Pharaoh <james@xxxxxxxxxx> · Fri, 28 Oct 2016 18:07:21 +0100

On 28/10/16 12:52, Kent Overstreet wrote:

> That's not what the raid 5 hole is. The raid 5 hole comes from the fact that
> it's not possible to update the p/q blocks atomically with the data blocks, thus
> there is a point in time when they are _inconsistent_ with the rest of the
> stripe, and if used will lead to reconstructing incorrect data. There's no way
> to fix this with just flushes.

Yes, I understand this, but if the kernel strictly orders writing mdraid
data blocks before parity ones, then it closes part of the hole,
especially if I have a "journal" in a higher layer, and of course ensure
that this journal is reliable.

I think this will still fail in the case where a drive fails while
holding data blocks which have been written, but whose parity blocks
have not yet been.
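
To illustrate the window being discussed, here is a minimal sketch using
single-parity XOR on a made-up three-disk stripe. It is not mdraid code:
the block contents, the variable names and the xor_blocks helper are all
assumptions for illustration, and it models only the "data written,
parity not yet written" ordering.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length blocks byte by byte (single-parity, RAID5 style)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Hypothetical stripe: two data blocks plus one parity block.
d0_old = b"AAAA"
d1 = b"BBBB"
parity_old = xor_blocks(d0_old, d1)

# A write to d0 reaches its disk, but the crash happens before the
# matching parity update, so the parity on disk is still parity_old.
d0_new = b"CCCC"

# Case 1: the disk holding d1 (untouched by the write) fails.
# Rebuilding it mixes new data with stale parity and yields wrong data.
d1_rebuilt = xor_blocks(d0_new, parity_old)

# Case 2: the disk holding the freshly written d0 fails.
# Rebuilding it returns the old contents, so the write is lost.
d0_rebuilt = xor_blocks(d1, parity_old)

# For comparison: once parity has been updated, d1 rebuilds correctly.
parity_new = xor_blocks(d0_new, d1)
d1_rebuilt_ok = xor_blocks(d0_new, parity_new)
```

In this sketch, case 2 is something a reliable higher-layer journal could
replay, whereas case 1 damages a block that the interrupted write never
touched.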

I also think, however, that putting bcache /under/ mdraid, and
(again) ensuring that the bcache layer is reliable, along with
requiring bcache to "journal" all writes, would provide an
extremely reliable storage layer, even at a very large scale.

James
