Re: [PATCH 2/2] fs: record I_DIRTY_TIME even if inode already has I_DIRTY_INODE

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Jul 28, 2022 at 06:53:32PM +0200, Jan Kara wrote:
> On Thu 28-07-22 15:39:14, Lukas Czerner wrote:
> > Currently the I_DIRTY_TIME will never get set if the inode already has
> > I_DIRTY_INODE with assumption that it supersedes I_DIRTY_TIME.  That's
> > true, however ext4 will only update the on-disk inode in
> > ->dirty_inode(), not on actual writeback. As a result if the inode
> > already has I_DIRTY_INODE state by the time we get to
> > __mark_inode_dirty() only with I_DIRTY_TIME, the time was already filled
> > into on-disk inode and will not get updated until the next I_DIRTY_INODE
> > update, which might never come if we crash or get a power failure.
> > 
> > The problem can be reproduced on ext4 by running xfstest generic/622
> > with -o iversion mount option. Fix it by setting I_DIRTY_TIME even if
> > the inode already has I_DIRTY_INODE.

Hi Jan,

thanks for th review.

> 
> As a datapoint I've checked and XFS has the very same problem as ext4.

Very interesting, I did look at xfs when I was debugging this problem
and wans't able to tell whether they have the same problem or not, but
it certainly can't be reproduced by generic/622. Or at least I can't
reproduce it on XFS.

So I wonder what is XFS doing differently in that case.

> 
> > Also clear the I_DIRTY_TIME after ->dirty_inode() otherwise it may never
> > get cleared.
> > 
> > Signed-off-by: Lukas Czerner <lczerner@xxxxxxxxxx>
> > ---
> >  fs/fs-writeback.c | 18 +++++++++++++++---
> >  1 file changed, 15 insertions(+), 3 deletions(-)
> > 
> > diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> > index 05221366a16d..174f01e6b912 100644
> > --- a/fs/fs-writeback.c
> > +++ b/fs/fs-writeback.c
> > @@ -2383,6 +2383,11 @@ void __mark_inode_dirty(struct inode *inode, int flags)
> >  
> >  		/* I_DIRTY_INODE supersedes I_DIRTY_TIME. */
> >  		flags &= ~I_DIRTY_TIME;
> > +		if (inode->i_state & I_DIRTY_TIME) {
> > +			spin_lock(&inode->i_lock);
> > +			inode->i_state &= ~I_DIRTY_TIME;
> > +			spin_unlock(&inode->i_lock);
> > +		}
> 
> Hum, so this is a bit dangerous because inode->i_state may be inconsistent
> with the writeback list inode is queued in (wb->b_dirty_time) and these two
> are supposed to be in sync. So I rather think we need to make sure we go
> through the full round of 'update flags and writeback list' below in case
> we need to clear I_DIRTY_TIME from inode->i_state.

Ok, so we're clearing I_DIRTY_TIME in __ext4_update_other_inode_time()
which will opportunistically update the time fields for inodes in the
same block as the inode we're doing an update for via
ext4_do_update_inode(). Don't we also need to rewire that differently?

XFS is also clearing it on it's own in log code, but I can't tell if it
has the same problem as you describe here.

> 
> >  	} else {
> >  		/*
> >  		 * Else it's either I_DIRTY_PAGES, I_DIRTY_TIME, or nothing.
> > @@ -2399,13 +2404,20 @@ void __mark_inode_dirty(struct inode *inode, int flags)
> >  	 */
> >  	smp_mb();
> >  
> > -	if (((inode->i_state & flags) == flags) ||
> > -	    (dirtytime && (inode->i_state & I_DIRTY_INODE)))
> > +	if ((inode->i_state & flags) == flags)
> >  		return;
> >  
> >  	spin_lock(&inode->i_lock);
> > -	if (dirtytime && (inode->i_state & I_DIRTY_INODE))
> > +	if (dirtytime && (inode->i_state & I_DIRTY_INODE)) {
> > +		/*
> > +		 * We've got a new lazytime update. Make sure it's recorded in
> > +		 * i_state, because the time might have already got updated in
> > +		 * ->dirty_inode() and will not get updated until next
> > +		 *  I_DIRTY_INODE update.
> > +		 */
> > +		inode->i_state |= I_DIRTY_TIME;
> >  		goto out_unlock_inode;
> > +	}
> 
> So I'm afraid this combination is not properly handled in
> writeback_single_inode() where we have at the end:
> 
>         if (!(inode->i_state & I_DIRTY_ALL))
>                 inode_cgwb_move_to_attached(inode, wb);
>         else if (!(inode->i_state & I_SYNC_QUEUED) &&
>                  (inode->i_state & I_DIRTY))
>                 redirty_tail_locked(inode, wb);
> 
> So inode that had I_DIRTY_SYNC | I_DIRTY_TIME will not be properly refiled
> to wb->b_dirty_time list after writeback was done and I_DIRTY_SYNC got
> cleared.
> 
> So we need to refine it to something like:
> 
> 	if (!(inode->i_state & I_DIRTY_ALL))
> 		inode_cgwb_move_to_attached(inode, wb);
> 	else if (!(inode->i_state & I_SYNC_QUEUED)) {
> 		if (inode->i_state & I_DIRTY) {
> 			redirty_tail_locked(inode, wb);
> 		} else if (inode->i_state & I_DIRTY_TIME) {
> 			inode->dirtied_when = jiffies;
> 			inode_io_list_move_locked(inode, wb, &wb->b_dirty_time);
> 		}
> 	}

Very nice, thanks, I'll have a look.

-Lukas

> 
> 								Honza
> -- 
> Jan Kara <jack@xxxxxxxx>
> SUSE Labs, CR
> 




[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux