On Thu 08-09-11 09:22:36, Wu Fengguang wrote: > Jan, > > > @@ -420,6 +421,8 @@ writeback_single_inode(struct inode *inode, struct bdi_writeback *wb, > > /* Don't write the inode if only I_DIRTY_PAGES was set */ > > if (dirty & (I_DIRTY_SYNC | I_DIRTY_DATASYNC)) { > > int err = write_inode(inode, wbc); > > + if (!err) > > + inode_written = true; > > if (ret == 0) > > ret = err; > > } > > write_inode() typically return error after redirtying the inode. > So the conditions inode_written=false and (inode->i_state & I_DIRTY) > are mostly on/off together. For the cases they disagree, it's probably > a filesystem bug -- at least I don't think some FS will deliberately > return success while redirtying the inode, or the reverse. There is a possibility someone else redirties the inode between the moment I_DIRTY bits are cleared in writeback_single_inode() and the check for I_DIRTY is done after ->write_inode() is called. Especially when write_inode() blocks waiting for some IO this isn't that hard to happen. So there are valid (although relatively rare) cases when inode_written is different from the result of I_DIRTY check. > > } else if (inode->i_state & I_DIRTY) { > > /* > > * Filesystems can dirty the inode during writeback > > * operations, such as delayed allocation during > > * submission or metadata updates after data IO > > - * completion. > > + * completion. Also inode could have been dirtied by > > + * some process aggressively touching metadata. > > + * Finally, filesystem could just fail to write the > > + * inode for some reason. We have to distinguish the > > + * last case from the previous ones - in the last case > > + * we want to give the inode quick retry, in the > > + * other cases we want to put it back to the dirty list > > + * to avoid livelocking of writeback. > > */ > > - redirty_tail(inode, wb); > > + if (inode_written) > > + redirty_tail(inode, wb); > > Can you elaborate the livelock in the below inode_written=true case? > Why the sleep in the wb_writeback() loop is not enough? In case someone would be able to consistently trigger the race window and redirty the inode before we check here, we would loop for a long time always writing just this inode and thus effectivelly stalling other writeback. That's why I push redirtied inode behind other inodes in the dirty list. Honza > > > + else > > + requeue_io(inode, wb); > > } else { > > /* > > * The inode is clean. At this point we either have > > Thanks, > Fengguang -- Jan Kara <jack@xxxxxxx> SUSE Labs, CR -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html