On Sat, May 17, 2008 at 08:48:33PM -0400, Chris Mason wrote:
>
> Well, the barriers happen like so (even if we actually only do one
> barrier in submit_bh, it turns into two)
>
> write log blocks
> flush #1
> write commit block
> flush #2
> write metadata blocks
>
> I'd agree with Ted, there's a fairly small chance of things get reordered
> around flush #1. flush #2 is likely to have lots of reordering though. It
> should be easy to create situations where the metadata for a transaction is
> written before the log blocks ever see the disk.

True, but even with a very heavy fsync() workload, a commit doesn't
cause the metadata blocks to be written until we have to do a journal
truncate operation. A heavy fsync() workload would increase how
quickly we would use up the journal and need to do a journal truncate,
though.

> EMC did a ton of automated testing around this when Jens and I did
> the initial barrier implementations, and they were able to trigger
> corruptions in fsync heavy workloads with randomized power offs.
> I'll dig up the workload they used.

I could imagine a mode which forces a barrier operation for commits
triggered by fsync()'s, but not commits that occur due to a natural
closing of transactions. I'm not sure it's worth it, though, since
many of the benchmarks that we care about (like Postmark) do use
fsync() fairly heavily.

The really annoying thing is that what is really needed is a way to
make write barriers cheaper; we don't need to do a synchronous flush,
but unfortunately for most drives there isn't any other way of keeping
disk writes from getting reordered.

- Ted
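
For illustration only, the commit ordering quoted in this message (log
blocks, flush #1, commit block, flush #2) and the idea of forcing
barriers only for fsync()-triggered commits could be modelled in
userspace roughly as below. This is a sketch, not the jbd code:
commit_transaction(), wants_barrier(), the "fsync_only" mode name and
BLOCK_SIZE are all hypothetical, and os.fsync() merely stands in for a
block-layer barrier or cache flush.

```python
import os
import tempfile

BLOCK_SIZE = 4096  # assumed block size for this sketch


def wants_barrier(triggered_by_fsync, barrier_mode):
    """Hypothetical policy: 'always', 'fsync_only', or anything else for never."""
    if barrier_mode == "always":
        return True
    if barrier_mode == "fsync_only":
        return triggered_by_fsync
    return False


def commit_transaction(fd, start_block, log_blocks, commit_block, use_barrier):
    """Write the log blocks, then the commit block, to a toy journal file.

    When use_barrier is true, a flush is issued on each side of the commit
    block. Returns the number of flushes issued.
    """
    flushes = 0
    block = start_block
    for data in log_blocks:
        # Pad each record to a full block and write it at its block offset.
        os.pwrite(fd, data.ljust(BLOCK_SIZE, b"\0"), block * BLOCK_SIZE)
        block += 1
    if use_barrier:
        os.fsync(fd)  # flush #1: log blocks stable before the commit block
        flushes += 1
    os.pwrite(fd, commit_block.ljust(BLOCK_SIZE, b"\0"), block * BLOCK_SIZE)
    if use_barrier:
        os.fsync(fd)  # flush #2: commit block stable before metadata writeback
        flushes += 1
    return flushes


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    try:
        # A commit triggered by fsync() pays for both flushes in this mode.
        n = commit_transaction(fd, 0, [b"log0", b"log1"], b"commit",
                               wants_barrier(True, "fsync_only"))
        print("flushes issued:", n)
    finally:
        os.close(fd)
        os.unlink(path)
```

In this model a commit caused by a natural transaction close issues no
flushes under "fsync_only", which is exactly the window in which a drive
with a write cache is free to reorder the commit block ahead of the log
blocks.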