On Thu, Sep 08, 2011 at 11:03:40PM +0800, Jan Kara wrote: > On Thu 08-09-11 09:22:36, Wu Fengguang wrote: > > Jan, > > > > > @@ -420,6 +421,8 @@ writeback_single_inode(struct inode *inode, struct bdi_writeback *wb, > > > /* Don't write the inode if only I_DIRTY_PAGES was set */ > > > if (dirty & (I_DIRTY_SYNC | I_DIRTY_DATASYNC)) { > > > int err = write_inode(inode, wbc); > > > + if (!err) > > > + inode_written = true; > > > if (ret == 0) > > > ret = err; > > > } > > > > write_inode() typically return error after redirtying the inode. > > So the conditions inode_written=false and (inode->i_state & I_DIRTY) > > are mostly on/off together. For the cases they disagree, it's probably > > a filesystem bug -- at least I don't think some FS will deliberately > > return success while redirtying the inode, or the reverse. > There is a possibility someone else redirties the inode between the moment > I_DIRTY bits are cleared in writeback_single_inode() and the check for > I_DIRTY is done after ->write_inode() is called. Especially when > write_inode() blocks waiting for some IO this isn't that hard to happen. So > there are valid (although relatively rare) cases when inode_written is > different from the result of I_DIRTY check. Ah yes, that's good point. > > > } else if (inode->i_state & I_DIRTY) { > > > /* > > > * Filesystems can dirty the inode during writeback > > > * operations, such as delayed allocation during > > > * submission or metadata updates after data IO > > > - * completion. > > > + * completion. Also inode could have been dirtied by > > > + * some process aggressively touching metadata. > > > + * Finally, filesystem could just fail to write the > > > + * inode for some reason. We have to distinguish the > > > + * last case from the previous ones - in the last case > > > + * we want to give the inode quick retry, in the > > > + * other cases we want to put it back to the dirty list > > > + * to avoid livelocking of writeback. > > > */ > > > - redirty_tail(inode, wb); > > > + if (inode_written) > > > + redirty_tail(inode, wb); > > > > Can you elaborate the livelock in the below inode_written=true case? > > Why the sleep in the wb_writeback() loop is not enough? > In case someone would be able to consistently trigger the race window and > redirty the inode before we check here, we would loop for a long time > always writing just this inode and thus effectivelly stalling other > writeback. That's why I push redirtied inode behind other inodes in the > dirty list. Agreed. All the left to do is to confirm whether this addresses Christoph's original problem. Acked-by: Wu Fengguang <fengguang.wu@xxxxxxxxx> Thanks, Fengguang -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html