On Sat, 18 Jul 2009, Dan Williams wrote:
On Sat, Jul 18, 2009 at 4:53 AM, David Woodhouse <dwmw2@xxxxxxxxxxxxx> wrote:
On Fri, 2009-07-17 at 11:49 -0400, H. Peter Anvin wrote:
Cost, yes, of changing an on-disk format.
Personally, I don't care about that -- I'm utterly uninterested in the
legacy RAID-6 setup where it pretends to be a normal disk. I think that
model is as fundamentally wrong as flash devices making the similar
pretence.
I can understand the frustration of these details being irretrievably
hidden behind a proprietary interface out of the filesystem's control.
However, this is not the case with Linux software RAID. I suspect
that there is room for more interaction with even "legacy" filesystems
to communicate things like: "don't worry about initializing that
region of the disk; it's all free space", "don't bother resyncing on
dirty shutdown, if power-loss interrupts a write I guarantee I will
replay the entire stripe to you at a later date", or "hey, that last
block I read doesn't checksum, can you come up with a different
version?"
I was under the impression that btrfs wanted to leverage md's stripe
handling logic as well; it seems that is not the case?
No. We do a bunch of the stuff you mention above, but entirely within the
file system so we don't have to invent a bunch of layering violations just
to work around a broken design.
¹ Well, kind of. The xor_blocks() function will silently screw you over
if you ask it to handle more than 5 blocks at a time.
async_xor() handles arbitrary block counts.
That's useful to know; thanks.
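
For illustration only, here is a minimal userspace sketch of how a
wrapper can accept an arbitrary number of source blocks on top of a
primitive that only takes a few per call. It is not the kernel code:
xor_limited(), xor_any() and MAX_SRCS_PER_CALL are hypothetical names,
and the limit of 4 sources per call is an assumption chosen for the
example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-call source limit (an assumption for this sketch). */
#define MAX_SRCS_PER_CALL 4

/*
 * Stand-in for a limited primitive: XOR up to MAX_SRCS_PER_CALL source
 * blocks into dest. Unlike the behaviour complained about above, it
 * asserts instead of silently misbehaving when given too many sources.
 */
void xor_limited(size_t count, size_t len, uint8_t *dest,
                 const uint8_t **srcs)
{
    assert(count <= MAX_SRCS_PER_CALL);
    for (size_t s = 0; s < count; s++)
        for (size_t i = 0; i < len; i++)
            dest[i] ^= srcs[s][i];
}

/*
 * Wrapper for any number of sources: feed the primitive one batch at a
 * time. XOR is associative, so accumulating into dest batch by batch
 * gives the same result as a single pass over all sources.
 */
void xor_any(size_t count, size_t len, uint8_t *dest,
             const uint8_t **srcs)
{
    while (count > 0) {
        size_t batch = count < MAX_SRCS_PER_CALL ? count
                                                 : MAX_SRCS_PER_CALL;
        xor_limited(batch, len, dest, srcs);
        srcs += batch;
        count -= batch;
    }
}
```

With seven sources, xor_any() would make two calls to the primitive
(four sources, then three), and the caller never has to know the limit.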
--
dwmw2