While I fully agree with the BAARF.com arguments, I sometimes have to deal with legacy systems with wide RAID6 sets (for example 16 drives, quite revolting) that hold op-journaled filesystems like XFS or JFS (sometimes block-journaled ext[34], but I am not that interested in those for this). Sometimes (fortunately not that recently) I have had to deal with small-file filesystems set up on wide-stripe RAID6 sets by morons who don't understand the difference between a database and a filesystem (and I have strong doubts that RAID6 is remotely appropriate for databases).

So I'd like to figure out how much effort I should invest in undoing cases like the above, that is, how bad they are likely to be and how much they degrade over time (usually very badly).

First a couple of questions purely about RAID, but indirectly relevant to op-journaled filesystems:

* Can Linux MD do "abbreviated" read-modify-write RAID6 updates like for RAID5? That is, where not the whole stripe is read in, modified and written, but just the block to be updated and the parity blocks.

* When reading or writing part of a RAID[456] stripe, for example less than a sector, what is the minimum unit of transfer with Linux MD? The full stripe, the chunk containing the sector, or just the sector containing the bytes to be read or updated (and potentially the parity sectors)? I would expect reads to always read just the sector, but I am not so sure about writes.

* What about popular HW RAID host adapters (e.g. LSI, Adaptec, Areca, 3ware)? Where is the documentation, if any, on how they behave in these cases?

Regardless, op-journaled filesystem designs like JFS and XFS write small records (way below the full stripe size, and usually way below the chunk size) to the journal when they queue operations, even if, depending on design and options, they may sometimes "batch" the journal updates (potentially breaking safety semantics).
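To put rough numbers on the two update strategies contrasted in the first question, here is a minimal sketch that only counts device I/Os for updating a single data block in one stripe. It assumes the textbook form of both strategies and says nothing about which one Linux MD or any HW RAID adapter actually chooses; the function name `small_write_ios` is made up for illustration.

```python
def small_write_ios(total_drives, parity_drives=2):
    """Count device I/Os needed to update ONE data block in one stripe.

    Assumption: textbook strategies only, no caching, no write merging.
      rmw:         read old data + old parities, write new data + new parities
      reconstruct: read all OTHER data blocks, write new data + new parities
    """
    data_drives = total_drives - parity_drives
    if data_drives < 2:
        raise ValueError("need at least two data drives")
    writes = 1 + parity_drives  # the data block plus every parity block
    return {
        "rmw": {"reads": 1 + parity_drives, "writes": writes},
        "reconstruct": {"reads": data_drives - 1, "writes": writes},
    }


if __name__ == "__main__":
    # The 16-drive RAID6 example from the text: 14 data + 2 parity.
    for name, ios in small_write_ios(16).items():
        print(name, ios)
```

Under these assumptions a 16-drive RAID6 set needs 3 reads and 3 writes per small update with the abbreviated path, against 13 reads and 3 writes if the rest of the stripe has to be read to recompute both parities.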
These filesystems also do small writes when they dequeue operations from the journal and apply them to the actual metadata records involved. How bad can this be when the journal is, say, internal to a filesystem held on a wide-stripe RAID6 set? I suspect very, very bad, with apocalyptic read-modify-write storms eating IOPS.

I suspect that this happens a lot with SSDs too, where the role of the stripe size is played by the erase block size (often in the hundreds of KBytes, and even more expensive).

Where are studies, or even just impressions or anecdotes, on how bad this is?

Are there instrumentation tools in JFS or XFS that would let me watch/inspect what is happening with the journal? And for Linux MD, is there a way to see the rate of stripe r-m-w cases?
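Lacking a dedicated r-m-w counter, one crude proxy is to compare two snapshots of /proc/diskstats for the member devices while the array is under a (nearly) write-only load: sectors read from the members in that window are then mostly reads done on behalf of writes. The sketch below assumes exactly that; the helper names and the member device names in the usage note are hypothetical, and ordinary reads, resyncs or checks running at the same time will inflate the figure.

```python
def parse_diskstats(text):
    """Parse /proc/diskstats content into {device: counters}.

    Field layout (whitespace separated): major, minor, name,
    reads completed, reads merged, sectors read, ms reading,
    writes completed, writes merged, sectors written, ...
    """
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 10:
            continue  # skip blank or truncated lines
        stats[fields[2]] = {
            "sectors_read": int(fields[5]),
            "sectors_written": int(fields[9]),
        }
    return stats


def snapshot(path="/proc/diskstats"):
    """Read and parse the current counters (Linux only)."""
    with open(path) as f:
        return parse_diskstats(f.read())


def member_read_write_ratio(before, after, members):
    """Sectors read per sector written across the given member devices.

    Returns None if nothing was written between the two snapshots.
    """
    read = sum(after[m]["sectors_read"] - before[m]["sectors_read"]
               for m in members)
    written = sum(after[m]["sectors_written"] - before[m]["sectors_written"]
                  for m in members)
    return read / written if written else None
```

Usage would be along the lines of taking `before = snapshot()`, letting the write-only workload run for some seconds, taking `after = snapshot()`, and calling `member_read_write_ratio(before, after, ["sdb", "sdc"])` with the real member names. A ratio near zero suggests mostly full-stripe writes; a large one suggests partial-stripe updates are dominating.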