On Fri, Apr 03, 2009 at 11:24:50AM -0700, Linus Torvalds wrote:
> But at the same time, I now suspect that we could actually have solved
> this problem more easily by just doing things the other way around: make
> the default "WRITE" be the high-priority one (to match "READ"), and then
> just explicitly marking the data writes with "WRITE_ASYNC".
>
> Why? Because I think that with all the writes sprinkled around in random
> places, it's probably _easier_ to find the bulk writes that cause the
> biggest issues, and just fix _those_ to be WRITE_ASYNC. They may be bulk,
> they may be the common case, but they also tend to be the case where we
> write with generic routines (eg the whole "do_writepages()" thing).
>
> So the VFS layer tends to already do much of the bulk writeout, and maybe
> we would have been better off just changing those to ASYNC and leaving any
> more specialized cases as the SYNC case? That would have avoided a lot of
> this effort at the filesystem level. We'd just assume that the default
> filesystem-specific writes tend to all be SYNC.

Well, most filesystem-specific writes tend to be ASYNC; only those related to commits and fsync() are SYNC. Ext3 is unusual in that data=ordered and the physical-block journalling design of the jbd layer mean that we actually have a much larger number of blocks that need to be written out synchronously than most other filesystems.

But even so, the number of callsites that I needed to change wasn't that large; in fact, over half of them weren't in the filesystem at all, but in the page writeback code, since fsync() and data=ordered both need to wait for the inode's pages to be flushed out to disk, and that's all done in common code. The other 40% were in jbd's commit code, where we write out the journal buffers.
I suspect the more important thing to address is the fact that WRITE_SYNC also unplugs the block device queue. We would be better off separating the hint that marks a particular I/O as "a user is waiting for this" from the action "unplug the device queue now". That will hopefully improve things even more.

- Ted