On 04/01/2009 12:31, John Robinson wrote:
> On 04/01/2009 07:37, Martin K. Petersen wrote:
> [...]
>> We also don't want to do checksumming at every layer. That's going to
>> suck from a performance perspective. It's better to do checksumming
>> high up in the stack and only do it once. As long as we give the upper
>> layers the option of re-driving the I/O.
>>
>> That involves adding a cookie to each bio that gets filled out by DM/MD
>> on completion. If the filesystem checksum fails we can resubmit the I/O
>> and pass along the cookie indicating that we want a different copy than
>> the one the cookie represents.
>
> I'd like to understand this mechanism better; at first glance it's
> either going to be too simplistic and not cover the various block layer
> cases well, or it means you end up re-implementing RAID and LVM in the
> filesystem.

I've thought about this again, and I'm wrong; there may be complications
in handling the cookies up and down the stack where more than one layer
thinks it knows how to have another go, but I can see what you describe
as being useful and relatively device-agnostic.
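
To check my own understanding of the cookie idea, here is a toy userspace
model in Python. It is only a sketch, not kernel code: the Mirror class, the
read_verified() helper and the use of a plain copy index as the cookie are
all my own assumptions, and I pass down the set of cookies already tried
rather than a single one.

```python
import zlib
from typing import FrozenSet, List, Optional, Tuple


class Mirror:
    """Toy stand-in for an MD/DM mirror holding several copies of one block."""

    def __init__(self, copies: List[bytes]) -> None:
        self.copies = copies

    def read(self, exclude: FrozenSet[int] = frozenset()) -> Optional[Tuple[bytes, int]]:
        """Return (data, cookie) from the first copy not listed in `exclude`.

        The cookie (hypothetical format) is simply the index of the copy
        that served the read. Returns None when no untried copy remains.
        """
        for idx, data in enumerate(self.copies):
            if idx not in exclude:
                return data, idx
        return None


def read_verified(dev: Mirror, expected_crc: int) -> Tuple[bytes, int]:
    """Filesystem-side read: checksum once at the top, re-drive on mismatch."""
    tried = set()
    while True:
        result = dev.read(exclude=frozenset(tried))
        if result is None:
            raise IOError("all copies failed the checksum")
        data, cookie = result
        if zlib.crc32(data) == expected_crc:
            return data, cookie
        # Checksum mismatch: resubmit, asking for a copy other than this one.
        tried.add(cookie)


good = b"filesystem block"
dev = Mirror([b"filesystem blocK", good])  # copy 0 is silently corrupted
data, cookie = read_verified(dev, zlib.crc32(good))
```

The filesystem never needs to know what is underneath it; it only hands back
an opaque value. With stacked layers, each layer would presumably need its
own piece of the cookie, which is where the complications I mention come in.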
I wonder if there might also be scope for cookies going down through the
stack to carry an indication of how hard to try; some filesystems or
other consumers of block devices may be willing to ask again or want to
be told about problems quickly (e.g. btrfs over RAID over TLER-equipped
discs), while some may need best efforts all out first time because they
can't cope with failure returns (e.g. FAT over cheap IDE discs).
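
Roughly what I have in mind for the "how hard to try" hint, again as a toy
Python model rather than anything resembling the real block layer. The
Effort values, FlakyDisk and read_with_hint() are all made up for
illustration, as is the retry limit.

```python
from enum import Enum
from typing import Optional


class Effort(Enum):
    FAIL_FAST = 1    # report problems quickly; the caller can ask again
    BEST_EFFORT = 2  # retry hard; the caller cannot cope with a failure


class FlakyDisk:
    """Toy disk whose read only succeeds on the `succeeds_on`-th attempt."""

    def __init__(self, data: bytes, succeeds_on: int) -> None:
        self.data = data
        self.succeeds_on = succeeds_on
        self.attempts = 0

    def try_read(self) -> Optional[bytes]:
        self.attempts += 1
        return self.data if self.attempts >= self.succeeds_on else None


def read_with_hint(disk: FlakyDisk, effort: Effort,
                   max_retries: int = 10) -> Optional[bytes]:
    """Lower-layer read that honours the effort hint passed down the stack."""
    attempts = 1 if effort is Effort.FAIL_FAST else max_retries
    for _ in range(attempts):
        data = disk.try_read()
        if data is not None:
            return data
    return None  # failure is reported upwards


# A checksumming filesystem over RAID would rather hear about trouble at once.
fast_disk = FlakyDisk(b"payload", succeeds_on=3)
fast_result = read_with_hint(fast_disk, Effort.FAIL_FAST)

# A consumer that cannot handle errors wants every retry spent first time.
slow_disk = FlakyDisk(b"payload", succeeds_on=3)
slow_result = read_with_hint(slow_disk, Effort.BEST_EFFORT)
```

In the first case the upper layer gets its failure after a single attempt
and can go and ask for another copy; in the second the lower layer keeps
trying until the read succeeds or the retry limit runs out.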
Anyway, I think I'd better leave all this to the experts i.e. you :-)
Cheers,
John.