On Wednesday November 30, tedenehy@xxxxxxxxxxx wrote:
>
> I will be presenting a paper at the upcoming USENIX FAST conference
> about using the ext3 file system journal to guide the software RAID
> resynchronization process.  I was wondering if you had any opinions
> as to the viability of this approach in Linux.
>
> Here is a link to the paper:
>
> http://www.cs.wisc.edu/adsl/Publications/fast05-journal-guided.pdf
>

Firstly, a couple of minor points:

 'multimillion dollar price tag' -- maybe a little bit of an
 exaggeration.

 Bitmap intent logging for raid1 is now in the mainline kernel
 (2.6.14) and will be for raid4/5/6/10 in 2.6.16.  (Your paper says it
 isn't yet, but gives no date or release to give context to your
 statement.)

It would be really great if you could run your same tests with
bitmap-based intent logging and see how much it slows writes down.  I
suspect it would be more than with your declared mode, but definite
figures would be great.  (Your comparison on code size is certainly
interesting!)

I agree that closer communication between the filesystem and the
storage system is important for improving raid performance and
reliability.  Your 'verified read' sounds like a very appropriate part
of that.

There is an awkwardness when a raid array is partitioned, as then you
might want the raid system to resync some parts but leave the
filesystem to resync other parts (and a partition used for swap would
never need syncing at all).  Maybe some very coarse variety of the
bitmap intent log might be useful here (ext3 with declared mode would
tell raid never to set intent bits...).

Adding full journalling to md is something I have considered from time
to time.  It would need NVRAM to be accepted, and at a couple of
thousand for such a board, it isn't common enough for me to justify
the effort....

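To make the coarse intent-bitmap idea for partitioned arrays a little more concrete, here is a rough sketch.  It is purely illustrative and is not how md's bitmap code is written: the class name, the sector-based interface, and the `exempt_range` call (standing in for "ext3 with declared mode tells raid never to set intent bits") are all made up for the example, and a real implementation would have to flush the bitmap to stable storage before issuing the write.

```python
class CoarseIntentBitmap:
    """Toy write-intent bitmap: one counter per fixed-size chunk of the array.

    Hypothetical sketch only; names and interface are assumptions.
    """

    def __init__(self, array_sectors, chunk_sectors):
        self.chunk_sectors = chunk_sectors
        nchunks = (array_sectors + chunk_sectors - 1) // chunk_sectors
        # Number of writes in flight per chunk; > 0 means "intent bit set".
        self.pending = [0] * nchunks
        # Chunks whose resync is left to the filesystem (or never needed,
        # e.g. a swap partition).
        self.exempt = [False] * nchunks

    def _chunks(self, start, length):
        first = start // self.chunk_sectors
        last = (start + length - 1) // self.chunk_sectors
        return range(first, last + 1)

    def exempt_range(self, start, length):
        """Stop tracking chunks that lie entirely inside [start, start+length)."""
        first = -(-start // self.chunk_sectors)           # round up
        end = (start + length) // self.chunk_sectors      # round down
        for c in range(first, end):
            self.exempt[c] = True

    def begin_write(self, start, length):
        """Record intent before the write is sent to the member devices."""
        for c in self._chunks(start, length):
            if not self.exempt[c]:
                self.pending[c] += 1

    def end_write(self, start, length):
        """Drop the intent once the write has reached all member devices."""
        for c in self._chunks(start, length):
            if not self.exempt[c] and self.pending[c] > 0:
                self.pending[c] -= 1

    def chunks_to_resync(self):
        """Chunks that raid itself would have to resync after a crash."""
        return [c for c, n in enumerate(self.pending) if n > 0]


if __name__ == "__main__":
    bm = CoarseIntentBitmap(array_sectors=1000, chunk_sectors=100)
    bm.exempt_range(500, 500)   # second half: filesystem resyncs it itself
    bm.begin_write(150, 100)    # touches chunks 1 and 2
    bm.begin_write(620, 10)     # exempt chunk, no intent recorded
    print(bm.chunks_to_resync())
```

With chunks this coarse, only a partial chunk at the edge of a partition would still be tracked by raid; everything wholly inside an exempt partition costs no bitmap updates at all.
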
What I would really like is a cheap (well, not too expensive) board
that had at least 100Meg of NVRAM which was addressable on the PCI
bus, and an XOR and RAID-6 engine connected to the DMA engine.  Then
we could use the NVRAM as a write-behind cache and offload all the
parity calculation to it, while still having all the flexibility of
software raid...

I'd probably be happy to consider the 'verified read' enhancements to
md for inclusion in mainline.

Good work!

NeilBrown