On Thu 28-07-22 15:39:14, Lukas Czerner wrote: > Currently the I_DIRTY_TIME will never get set if the inode already has > I_DIRTY_INODE with assumption that it supersedes I_DIRTY_TIME. That's > true, however ext4 will only update the on-disk inode in > ->dirty_inode(), not on actual writeback. As a result if the inode > already has I_DIRTY_INODE state by the time we get to > __mark_inode_dirty() only with I_DIRTY_TIME, the time was already filled > into on-disk inode and will not get updated until the next I_DIRTY_INODE > update, which might never come if we crash or get a power failure. > > The problem can be reproduced on ext4 by running xfstest generic/622 > with -o iversion mount option. Fix it by setting I_DIRTY_TIME even if > the inode already has I_DIRTY_INODE. As a datapoint I've checked and XFS has the very same problem as ext4. > Also clear the I_DIRTY_TIME after ->dirty_inode() otherwise it may never > get cleared. > > Signed-off-by: Lukas Czerner <lczerner@xxxxxxxxxx> > --- > fs/fs-writeback.c | 18 +++++++++++++++--- > 1 file changed, 15 insertions(+), 3 deletions(-) > > diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c > index 05221366a16d..174f01e6b912 100644 > --- a/fs/fs-writeback.c > +++ b/fs/fs-writeback.c > @@ -2383,6 +2383,11 @@ void __mark_inode_dirty(struct inode *inode, int flags) > > /* I_DIRTY_INODE supersedes I_DIRTY_TIME. */ > flags &= ~I_DIRTY_TIME; > + if (inode->i_state & I_DIRTY_TIME) { > + spin_lock(&inode->i_lock); > + inode->i_state &= ~I_DIRTY_TIME; > + spin_unlock(&inode->i_lock); > + } Hum, so this is a bit dangerous because inode->i_state may be inconsistent with the writeback list inode is queued in (wb->b_dirty_time) and these two are supposed to be in sync. So I rather think we need to make sure we go through the full round of 'update flags and writeback list' below in case we need to clear I_DIRTY_TIME from inode->i_state. > } else { > /* > * Else it's either I_DIRTY_PAGES, I_DIRTY_TIME, or nothing. > @@ -2399,13 +2404,20 @@ void __mark_inode_dirty(struct inode *inode, int flags) > */ > smp_mb(); > > - if (((inode->i_state & flags) == flags) || > - (dirtytime && (inode->i_state & I_DIRTY_INODE))) > + if ((inode->i_state & flags) == flags) > return; > > spin_lock(&inode->i_lock); > - if (dirtytime && (inode->i_state & I_DIRTY_INODE)) > + if (dirtytime && (inode->i_state & I_DIRTY_INODE)) { > + /* > + * We've got a new lazytime update. Make sure it's recorded in > + * i_state, because the time might have already got updated in > + * ->dirty_inode() and will not get updated until next > + * I_DIRTY_INODE update. > + */ > + inode->i_state |= I_DIRTY_TIME; > goto out_unlock_inode; > + } So I'm afraid this combination is not properly handled in writeback_single_inode() where we have at the end: if (!(inode->i_state & I_DIRTY_ALL)) inode_cgwb_move_to_attached(inode, wb); else if (!(inode->i_state & I_SYNC_QUEUED) && (inode->i_state & I_DIRTY)) redirty_tail_locked(inode, wb); So inode that had I_DIRTY_SYNC | I_DIRTY_TIME will not be properly refiled to wb->b_dirty_time list after writeback was done and I_DIRTY_SYNC got cleared. So we need to refine it to something like: if (!(inode->i_state & I_DIRTY_ALL)) inode_cgwb_move_to_attached(inode, wb); else if (!(inode->i_state & I_SYNC_QUEUED)) { if (inode->i_state & I_DIRTY) { redirty_tail_locked(inode, wb); } else if (inode->i_state & I_DIRTY_TIME) { inode->dirtied_when = jiffies; inode_io_list_move_locked(inode, wb, &wb->b_dirty_time); } } Honza -- Jan Kara <jack@xxxxxxxx> SUSE Labs, CR