On Sun, 4 Jan 2009 17:43:51 -0500 Theodore Tso <tytso@xxxxxxx> wrote:

> On Sun, Jan 04, 2009 at 02:23:03PM -0800, Andrew Morton wrote:
> > > Following up with an e-mail thread started by Arjan two months ago,
> > > (subject: [PATCH] Give kjournald a IOPRIO_CLASS_RT io priority), I have
> > > a patch, just sent to linux-ext4@xxxxxxxxxxxxxxx, which fixes the jbd2
> > > layer to submit journal writes via submit_bh() with WRITE_SYNC.
> > > Hopefully this might be enough of a priority boost so we don't have to
> > > force a higher I/O priority level via a buffer_head flag. However,
> > > while looking through the code paths, in ordered data mode, we end up
> > > flushing data pages via the page writeback paths on a per-inode basis,
> > > and I noticed that even though we are passing in
> > > wbc.sync_mode=WBC_SYNC_ALL, __block_write_full_page() is using
> > > submit_bh(WRITE, bh) instead of submit_bh(WRITE_SYNC).
> >
> > But this is all the wrong way to fix the problem, isn't it?
> >
> > The problem is that at one particular point, the current transaction
> > blocks callers behind the committing transaction's IO completion.
> >
> > Did anyone look at fixing that? ISTR concluding that a data copy and
> > shadow-bh arrangement might be needed.
>
> I haven't had time to really drill down into the jbd code yet, and
> yes, eventually we probably want to do this.

We do.

> Still, if we are
> submitting I/O which we are going to end up waiting on, we really
> should submit it with WRITE_SYNC, and this patch should optimize
> writes in other situations; for example, if we fsync() a file, we will
> also end up calling block_write_full_page(), and so supplying the
> WRITE_SYNC hint to the block layer would be a Good Thing.

Is it? WRITE_SYNC means "unplug the queue after this bh/BIO". By setting
it against every bh, don't we risk the generation of more BIOs and the loss
of merging opportunities?
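To make the merging concern concrete, here is a toy userspace model. It is not kernel code and makes no claim about how the real block layer is implemented. Every name in it (toy_queue, toy_submit, toy_unplug, toy_write_run) is hypothetical. It assumes only what is stated above: a sync write unplugs the queue right after it is queued, and a plugged queue can merge a write into a pending request that ends at the adjacent sector.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: a queue that holds at most one pending request. */
struct toy_queue {
    bool has_pending;       /* is a request waiting while "plugged"? */
    unsigned long start;    /* first sector of the pending request */
    unsigned long len;      /* number of sectors in the pending request */
    unsigned dispatched;    /* requests handed to the "device" so far */
};

/* Unplug: hand the pending request (if any) to the device. */
static void toy_unplug(struct toy_queue *q)
{
    if (q->has_pending) {
        q->dispatched++;
        q->has_pending = false;
    }
}

/* Queue a one-sector write; a sync write unplugs immediately afterwards. */
static void toy_submit(struct toy_queue *q, unsigned long sector, bool sync)
{
    if (q->has_pending && q->start + q->len == sector) {
        q->len++;           /* back-merge into the pending request */
    } else {
        toy_unplug(q);      /* cannot merge: dispatch what is pending */
        q->has_pending = true;
        q->start = sector;
        q->len = 1;
    }
    if (sync)
        toy_unplug(q);
}

/*
 * Write n contiguous sectors starting at 'first'.
 * sync_every == true  : every buffer is submitted as a sync write.
 * sync_every == false : only the last buffer is submitted as a sync write.
 * Returns the number of requests dispatched to the device.
 */
static unsigned toy_write_run(unsigned long first, unsigned n, bool sync_every)
{
    struct toy_queue q = { 0 };

    assert(n > 0);
    for (unsigned i = 0; i < n; i++) {
        bool last = (i == n - 1);
        toy_submit(&q, first + i, sync_every || last);
    }
    toy_unplug(&q);         /* flush anything still pending */
    return q.dispatched;
}
```

In this model, toy_write_run(100, 8, true) dispatches 8 one-sector requests, while toy_write_run(100, 8, false) dispatches a single 8-sector request. Whether the real elevator behaves this badly depends on details the model leaves out, so treat it as an illustration of the question and not as an answer to it.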