On Fri, Apr 12, 2013 at 11:19:52AM -0400, Theodore Ts'o wrote: > On Fri, Apr 12, 2013 at 02:50:42PM +1000, Dave Chinner wrote: > > > If that is the case, one possible solution that comes to mind would be > > > to mark buffer_heads that contain metadata with a flag, so that the > > > flusher thread can write them back at the same priority as reads. > > > > Ext4 is already using REQ_META for this purpose. > > We're using REQ_META | REQ_PRIO for reads, not writes. > > > I'm surprised that no-one has suggested "change the IO elevator" > > yet..... > > Well, testing to see if the stalls go away with the noop schedule is a > good thing to try just to validate the theory. Exactly. > The thing is, we do want to make ext4 work well with cfq, and > prioritizing non-readahead read requests ahead of data writeback does > make sense. The issue is with is that metadata writes going through > the block device could in some cases effectively cause a priority > inversion when what had previously been an asynchronous writeback > starts blocking a foreground, user-visible process. Here's the historic problem with CFQ: it's scheduling algorithms change from release to release, and so what you tune the filesystem to for this release is likely to cause different behaviour in a few releases time. We've had this problem time and time again with CFQ+XFS, so we stopped trying to "tune" to a particular elevator long ago. The best you can do it tag the Io as appropriately as possible (e.g. metadata with REQ_META, sync IO with ?_SYNC, etc), and then hope CFQ hasn't been broken since the last release.... > At least, that's the theory; we should confirm that this is indeed > what is causing the data stalls which Mel is reporting on HDD's before > we start figuring out how to fix this problem. *nod*. Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html