On Mon, May 23, 2011 at 09:14:24PM +0800, Jan Kara wrote:
> On Fri 20-05-11 05:31:19, Wu Fengguang wrote:
> > > > > @@ -419,6 +419,15 @@ writeback_single_inode(struct inode *ino
> > > > >  	spin_lock(&inode->i_lock);
> > > > >  	inode->i_state &= ~I_SYNC;
> > > > >  	if (!(inode->i_state & I_FREEING)) {
> > > > > +		/*
> > > > > +		 * Sync livelock prevention. Each inode is tagged and synced in
> > > > > +		 * one shot, so we can unconditionally update its dirty time to
> > > > > +		 * prevent syncing it again. Note that time ordering of b_dirty
> > > > > +		 * list will be kept because the following code either removes
> > > > > +		 * the inode from b_dirty or calls redirty_tail().
> > > > > +		 */
> > > > > +		if (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_sync)
> > > > > +			inode->dirtied_when = jiffies;
> > > > 
> > > > Shouldn't this update only occur if the inode is still dirty?
> > > 
> > > Yeah, that would be better even though the current form won't lead to
> > > errors.
> > > 
> > > Let's add one more test (inode->i_state & I_DIRTY)?
> > > (I was actually aware of the trade-offs and didn't bother to add it..)
> > 
> > Oops, the (inode->i_state & I_DIRTY) test is not enough because
> > when the inode is actively dirtied, I_DIRTY_PAGES won't be set when
> > the flusher is writing out the pages with I_SYNC set...
> 
> Well, rather on the contrary: I_DIRTY_PAGES won't be set when no one
> dirtied any page after we cleared I_DIRTY_PAGES.

Ah sorry! I misread the logic in __mark_inode_dirty(): when I_SYNC is
set, i_state will still be updated when some new pages are dirtied.

> > Well it would look clumsy to add another mapping_tagged(mapping,
> > PAGECACHE_TAG_DIRTY) test. So I tend to revert to the original scheme
> > of updating ->dirtied_when only on newly dirtied pages. It's the
> > simplest form that can avoid unnecessarily polluting ->dirtied_when.
> 
> Hmm, but won't now something like:
>         while true; do touch f; done
> livelock sync? We always manage to write something - 1 inode - and the
> inode will never be clean (before the IO completes, the other process
> manages to dirty the inode again with very high probability).

You are right. Here is the updated patch that can cover that livelock
case.

Thanks,
Fengguang
---
Subject: writeback: update dirtied_when for synced inode to prevent livelock
Date: Wed Apr 27 19:05:21 CST 2011

Explicitly update .dirtied_when on synced inodes, so that they are no
longer considered for writeback in the next round.

We'll do more aggressive "keep writeback as long as we wrote something"
logic in wb_writeback(). The "use LONG_MAX .nr_to_write" trick in
commit b9543dac5bbc ("writeback: avoid livelocking WB_SYNC_ALL
writeback") will no longer be enough to stop sync livelock.

It can prevent both of the following livelock schemes:

- while true; do echo data >> f; done
- while true; do touch f; done

Reviewed-by: Jan Kara <jack@xxxxxxx>
Signed-off-by: Wu Fengguang <fengguang.wu@xxxxxxxxx>
---
 fs/fs-writeback.c |    9 +++++++++
 1 file changed, 9 insertions(+)

--- linux-next.orig/fs/fs-writeback.c	2011-05-20 05:35:41.000000000 +0800
+++ linux-next/fs/fs-writeback.c	2011-05-24 10:54:01.000000000 +0800
@@ -419,6 +419,15 @@ writeback_single_inode(struct inode *ino
 	spin_lock(&inode->i_lock);
 	inode->i_state &= ~I_SYNC;
 	if (!(inode->i_state & I_FREEING)) {
+		/*
+		 * Sync livelock prevention. Each inode is tagged and synced in
+		 * one shot. If still dirty, it will be redirty_tail()'ed below.
+		 * Update the dirty time to prevent enqueue and sync it again.
+		 */
+		if ((inode->i_state & I_DIRTY) &&
+		    (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages))
+			inode->dirtied_when = jiffies;
+
 		if (mapping_tagged(mapping, PAGECACHE_TAG_DIRTY)) {
			/*
			 * We didn't write back all the pages.  nfs_writepages()
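
P.S. For anyone reading along: the __mark_inode_dirty() behaviour referred
to above is its early I_SYNC short-circuit. Roughly (paraphrased from the
fs-writeback.c of this period rather than quoted verbatim, so details may
differ slightly in your tree):

	/* excerpt from __mark_inode_dirty(), paraphrased */
	spin_lock(&inode->i_lock);
	if ((inode->i_state & flags) != flags) {
		/* the new dirty bits are recorded even under I_SYNC */
		inode->i_state |= flags;

		/*
		 * If the inode is being synced, just update its dirty
		 * state.  The unlocker will place the inode on the
		 * appropriate superblock list, based upon its state.
		 */
		if (inode->i_state & I_SYNC)
			goto out_unlock_inode;
		...
	}

So a process that dirties pages while the flusher holds I_SYNC does get
I_DIRTY_PAGES set in i_state; only the move onto the writeback lists is
deferred until I_SYNC is cleared.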