Calling redirty_tail() can put off inode writeback for up to 30 seconds
(or whatever dirty_expire_centisecs is). This is an unnecessarily big
delay in some cases, and in other cases it is a really bad thing. In
particular, XFS tries to be nice to writeback: when ->write_inode is
called for an inode with a locked ilock, it just redirties the inode and
returns EAGAIN. That currently causes writeback_single_inode() to
redirty_tail() the inode. As a contended ilock is a common thing with XFS
while extending files, the result can be that inode writeout is put off
for a really long time.

Now that we have more robust busyloop prevention in wb_writeback(), we
can call requeue_io() in cases where a quick retry is required without
fear of raising CPU consumption too much.

CC: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Acked-by: Wu Fengguang <fengguang.wu@xxxxxxxxx>
Signed-off-by: Jan Kara <jack@xxxxxxx>
---
 fs/fs-writeback.c |   56 +++++++++++++++++++++++++++-------------------------
 1 files changed, 29 insertions(+), 27 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index b619f3a..094afcd 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -356,6 +356,7 @@ writeback_single_inode(struct inode *inode, struct bdi_writeback *wb,
 	long nr_to_write = wbc->nr_to_write;
 	unsigned dirty;
 	int ret;
+	bool inode_written = false;
 
 	assert_spin_locked(&wb->list_lock);
 	assert_spin_locked(&inode->i_lock);
@@ -420,6 +421,8 @@ writeback_single_inode(struct inode *inode, struct bdi_writeback *wb,
 	/* Don't write the inode if only I_DIRTY_PAGES was set */
 	if (dirty & (I_DIRTY_SYNC | I_DIRTY_DATASYNC)) {
 		int err = write_inode(inode, wbc);
+		if (!err)
+			inode_written = true;
 		if (ret == 0)
 			ret = err;
 	}
@@ -430,17 +433,20 @@ writeback_single_inode(struct inode *inode, struct bdi_writeback *wb,
 	if (!(inode->i_state & I_FREEING)) {
 		/*
 		 * Sync livelock prevention. Each inode is tagged and synced in
-		 * one shot. If still dirty, it will be redirty_tail()'ed below.
-		 * Update the dirty time to prevent enqueue and sync it again.
+		 * one shot. If still dirty, update dirty time and put it back
+		 * to dirty list to prevent enqueue and syncing it again.
 		 */
 		if ((inode->i_state & I_DIRTY) &&
-		    (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages))
+		    (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages)) {
 			inode->dirtied_when = jiffies;
-
-		if (mapping_tagged(mapping, PAGECACHE_TAG_DIRTY)) {
+			redirty_tail(inode, wb);
+		} else if (mapping_tagged(mapping, PAGECACHE_TAG_DIRTY)) {
 			/*
-			 * We didn't write back all the pages. nfs_writepages()
-			 * sometimes bales out without doing anything.
+			 * We didn't write back all the pages. We may have just
+			 * run out of our writeback slice, or nfs_writepages()
+			 * sometimes bales out without doing anything, or e.g.
+			 * btrfs ignores for_kupdate writeback requests for
+			 * metadata inodes.
 			 */
 			inode->i_state |= I_DIRTY_PAGES;
 			if (wbc->nr_to_write <= 0) {
@@ -450,11 +456,9 @@ writeback_single_inode(struct inode *inode, struct bdi_writeback *wb,
 				requeue_io(inode, wb);
 			} else {
 				/*
-				 * Writeback blocked by something other than
-				 * congestion. Delay the inode for some time to
-				 * avoid spinning on the CPU (100% iowait)
-				 * retrying writeback of the dirty page/inode
-				 * that cannot be performed immediately.
+				 * Writeback blocked by something. Put inode
+				 * back to dirty list to prevent livelocking of
+				 * writeback.
 				 */
 				redirty_tail(inode, wb);
 			}
@@ -463,9 +467,19 @@ writeback_single_inode(struct inode *inode, struct bdi_writeback *wb,
 			 * Filesystems can dirty the inode during writeback
 			 * operations, such as delayed allocation during
 			 * submission or metadata updates after data IO
-			 * completion.
+			 * completion. Also inode could have been dirtied by
+			 * some process aggressively touching metadata.
+			 * Finally, filesystem could just fail to write the
+			 * inode for some reason. We have to distinguish the
+			 * last case from the previous ones - in the last case
+			 * we want to give the inode quick retry, in the
+			 * other cases we want to put it back to the dirty list
+			 * to avoid livelocking of writeback.
 			 */
-			redirty_tail(inode, wb);
+			if (inode_written)
+				redirty_tail(inode, wb);
+			else
+				requeue_io(inode, wb);
 		} else {
 			/*
 			 * The inode is clean. At this point we either have
@@ -581,13 +595,6 @@ static long writeback_sb_inodes(struct super_block *sb,
 		wrote += write_chunk - wbc.nr_to_write;
 		if (!(inode->i_state & I_DIRTY))
 			wrote++;
-		if (wbc.pages_skipped) {
-			/*
-			 * writeback is not making progress due to locked
-			 * buffers. Skip this inode for now.
-			 */
-			redirty_tail(inode, wb);
-		}
 		spin_unlock(&inode->i_lock);
 		spin_unlock(&wb->list_lock);
 		iput(inode);
@@ -618,12 +625,7 @@ static long __writeback_inodes_wb(struct bdi_writeback *wb,
 		struct super_block *sb = inode->i_sb;
 
 		if (!grab_super_passive(sb)) {
-			/*
-			 * grab_super_passive() may fail consistently due to
-			 * s_umount being grabbed by someone else. Don't use
-			 * requeue_io() to avoid busy retrying the inode/sb.
-			 */
-			redirty_tail(inode, wb);
+			requeue_io(inode, wb);
 			continue;
 		}
 		wrote += writeback_sb_inodes(sb, wb, work);
-- 
1.7.1
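
For anyone following the requeue decision that the patched
writeback_single_inode() makes for a non-freeing inode, here is a
minimal user-space sketch of the branch selection. It is a model only,
not kernel code: struct wb_outcome, enum requeue_action, pick_action()
and self_check() are hypothetical names invented for illustration, and
the condition on the third branch (inode still I_DIRTY) is inferred from
the surrounding comments because the diff context does not show that
line.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model types; none of these names exist in the kernel. */
enum requeue_action {
	ACTION_REDIRTY_TAIL,	/* back to the dirty list, retried after expiry */
	ACTION_REQUEUE_IO,	/* queued for a quick retry */
	ACTION_NONE		/* inode is clean, nothing to requeue */
};

struct wb_outcome {
	bool still_dirty;	/* models inode->i_state & I_DIRTY */
	bool sync_or_tagged;	/* models WB_SYNC_ALL || wbc->tagged_writepages */
	bool pages_still_dirty;	/* models mapping_tagged(..., PAGECACHE_TAG_DIRTY) */
	bool slice_used_up;	/* models wbc->nr_to_write <= 0 */
	bool inode_written;	/* models write_inode() returning 0 */
};

static enum requeue_action pick_action(const struct wb_outcome *o)
{
	/* Sync livelock prevention: redirtied inode goes back to the dirty list. */
	if (o->still_dirty && o->sync_or_tagged)
		return ACTION_REDIRTY_TAIL;

	/* Not all pages were written back. */
	if (o->pages_still_dirty)
		return o->slice_used_up ? ACTION_REQUEUE_IO
					: ACTION_REDIRTY_TAIL;

	/*
	 * Assumed branch condition: the inode itself is still dirty.
	 * A successful write means someone redirtied it; a failed write
	 * (e.g. XFS returning EAGAIN) gets a quick retry.
	 */
	if (o->still_dirty)
		return o->inode_written ? ACTION_REDIRTY_TAIL
					: ACTION_REQUEUE_IO;

	return ACTION_NONE;
}

/* Usage example: the XFS EAGAIN case from the changelog. */
static void self_check(void)
{
	struct wb_outcome xfs_eagain = {
		.still_dirty = true,
		.inode_written = false,
	};

	assert(pick_action(&xfs_eagain) == ACTION_REQUEUE_IO);
}
```

The sketch deliberately leaves out locking, the I_FREEING check and the
dirtied_when update, since it only illustrates which list the inode ends
up on.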