On Sat 02-02-08 00:26:00, Al Boldi wrote: > Chris Mason wrote: > > On Thursday 31 January 2008, Jan Kara wrote: > > > On Thu 31-01-08 11:56:01, Chris Mason wrote: > > > > On Thursday 31 January 2008, Al Boldi wrote: > > > > > The big difference between ordered and writeback is that once the > > > > > slowdown starts, ordered goes into ~100% iowait, whereas writeback > > > > > continues 100% user. > > > > > > > > Does data=ordered write buffers in the order they were dirtied? This > > > > might explain the extreme problems in transactional workloads. > > > > > > Well, it does but we submit them to block layer all at once so > > > elevator should sort the requests for us... > > > > nr_requests is fairly small, so a long stream of random requests should > > still end up being random IO. > > > > Al, could you please compare the write throughput from vmstat for the > > data=ordered vs data=writeback runs? I would guess the data=ordered one > > has a lower overall write throughput. > > That's what I would have guessed, but it's actually going up 4x fold for > mysql from 559mb to 2135mb, while the db-size ends up at 549mb. So you say we write 4-times as much data in ordered mode as in writeback mode. Hmm, probably possible because we force all the dirty data to disk when committing a transation in ordered mode (and don't do this in writeback mode). So if the workload repeatedly dirties the whole DB, we are going to write the whole DB several times in ordered mode but in writeback mode we just keep the data in memory all the time. But this is what you ask for if you mount in ordered mode so I wouldn't consider it a bug. I still don't like your hack with per-process journal mode setting but we could easily do per-file journal mode setting (we already have a flag to do data journaling for a file) and that would help at least your DB workload... Honza -- Jan Kara <jack@xxxxxxx> SUSE Labs, CR - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html