Re: [PATCH 2/5] xfs: ensure deleting item from AIL after shutdown in dquot flush

Dave Chinner <david@xxxxxxxxxxxxx> · Tue, 27 Aug 2024 19:40:14 +1000

On Fri, Aug 23, 2024 at 10:00:06AM -0700, Darrick J. Wong wrote:
> On Fri, Aug 23, 2024 at 07:04:36PM +0800, Long Li wrote:
> > Deleting items from the AIL before the log is shut down can result in the
> > log tail moving forward in the journal on disk because log writes can still
> > be taking place. As a result, items that have been deleted from the AIL
> > might not be recovered during the next mount, even though they should be,
> > as they were never written back to disk.
> > 
> > Signed-off-by: Long Li <leo.lilong@xxxxxxxxxx>
> > ---
> >  fs/xfs/xfs_dquot.c | 8 +++++++-
> >  1 file changed, 7 insertions(+), 1 deletion(-)
> > 
> > diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c
> > index c1b211c260a9..4cbe3db6fc32 100644
> > --- a/fs/xfs/xfs_dquot.c
> > +++ b/fs/xfs/xfs_dquot.c
> > @@ -1332,9 +1332,15 @@ xfs_qm_dqflush(
> >  	return 0;
> >  
> >  out_abort:
> > +	/*
> > +	 * Shutdown first to stop the log before deleting items from the AIL.
> > +	 * Deleting items from the AIL before the log is shut down can result
> > +	 * in the log tail moving forward in the journal on disk because log
> > +	 * writes can still be taking place.
> > +	 */
> > +	xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
> >  	dqp->q_flags &= ~XFS_DQFLAG_DIRTY;
> >  	xfs_trans_ail_delete(lip, 0);
> > -	xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
> 
> I see the logic in shutting down the log before letting go of the dquot
> log item that triggered the shutdown, but I wonder, why do we delete the
> item from the AIL?  AFAICT the inode items don't do that on iflush
> failure, but OTOH I couldn't figure out how the log items in the AIL get
> deleted from the AIL after a shutdown. 

Intents are removed from the AIL when the transaction containing
the deferred intent chain is cancelled instead of committed due to the
log being shut down.

For everything else in the AIL, the ->iop_push method is supposed to
do any cleanup that is necessary by failing the item push and
running the item failure method itself.

For buffers, this is running IO completion as if an IO error
occurred. Error handling sees the shutdown and removes the item from
the AIL.

For inodes, xfs_iflush_cluster() fails the inode buffer as if an IO
error occurred, which then runs the individual inode abort code that
removes the inode items from the AIL.

Dquots still have the ancient cleanup method that inodes used to
have. i.e. if the dquot has been flushed to the buffer, it is attached
to the buffer and then the buffer submission will fail and run IO
completion with an error. If the dquot hasn't been flushed to the
buffer because either it or the underlying dquot buffer is corrupt,
xfs_qm_dqflush() will remove the dquot from the AIL and then shut down
the filesystem.

It's the latter case that could be an issue. It's not the same as
the inode item case, because the tail pinning that the INODE_ALLOC
inode item type flag causes does not happen with dquots. There is
still a potential window where the dquot could be at the tail of the
log, and removing it moves the tail forward at exactly the same time
the log tail is being sampled during a log write, and the shutdown
doesn't happen fast enough to prevent the log write going out to
disk.
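
To illustrate the ordering problem, here is a rough userspace toy
model. It is not kernel code: every toy_* name is made up for the
example, the AIL is just a small array of LSNs, and the "racing" log
write is simply placed between the AIL delete and the shutdown to
stand in for the window described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: all names below are hypothetical, not XFS symbols. */
typedef uint64_t toy_lsn_t;

#define TOY_AIL_SLOTS	4

struct toy_log {
	toy_lsn_t	ail[TOY_AIL_SLOTS];	/* item LSNs, 0 = empty slot */
	toy_lsn_t	disk_tail;		/* tail LSN last written to disk */
	bool		shutdown;
};

/* The log tail is the lowest LSN still tracked in the AIL (0 if empty). */
static toy_lsn_t toy_ail_min_lsn(const struct toy_log *log)
{
	toy_lsn_t min = 0;

	for (size_t i = 0; i < TOY_AIL_SLOTS; i++) {
		if (log->ail[i] && (!min || log->ail[i] < min))
			min = log->ail[i];
	}
	return min;
}

static void toy_ail_delete(struct toy_log *log, toy_lsn_t lsn)
{
	for (size_t i = 0; i < TOY_AIL_SLOTS; i++) {
		if (log->ail[i] == lsn)
			log->ail[i] = 0;
	}
}

/* A log write samples the tail, but only reaches disk if not shut down. */
static void toy_log_write(struct toy_log *log)
{
	toy_lsn_t tail;

	if (log->shutdown)
		return;
	tail = toy_ail_min_lsn(log);
	if (tail)
		log->disk_tail = tail;
}

/*
 * Abort path for a corrupt item sitting at the tail (LSN 10), with a log
 * write racing in between the AIL delete and the shutdown. Returns the
 * tail LSN that ended up on disk.
 */
static toy_lsn_t toy_abort(bool shutdown_first)
{
	struct toy_log log = {
		.ail = { 10, 20, 30, 0 },
		.disk_tail = 10,
		.shutdown = false,
	};

	if (shutdown_first)
		log.shutdown = true;
	toy_ail_delete(&log, 10);
	toy_log_write(&log);		/* the racing log write */
	log.shutdown = true;		/* old ordering shuts down here */

	assert(log.shutdown);
	return log.disk_tail;
}
```

With the old ordering, toy_abort(false), the on-disk tail moves to 20
even though the item at LSN 10 was never written back. With the
shutdown first, toy_abort(true), the log write is dropped and the tail
stays at 10.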

To make timing of such a race even more unlikely, it would have to
race with a log write that contains a commit record, otherwise the
log tail lsn in the iclog will be ignored because it wasn't
contained within a complete checkpoint in the journal.  It's very
unlikely that a filesystem will read a corrupt dquot from disk at
exactly the same point in time these other journal pre-conditions
are met, but it could happen...

> Or maybe during a shutdown we just stop xfsaild and let the higher
> level objects free the log items during reclaim?

The AIL contains objects that have no references elsewhere in the
filesystem. It must be pushed until empty during unmount after a
shutdown to ensure that all the items in it have been pushed,
failed, removed from the AIL and freed...
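
As a sketch of that "push until empty" requirement, again a userspace
toy with hypothetical toy_* names rather than the real xfsaild code:
after a shutdown every push fails the item, and the failure path both
unlinks it from the AIL and frees it, so the drain loop is the only
thing that ever releases these items.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy model only: all names below are hypothetical, not XFS symbols. */
struct toy_item {
	struct toy_item	*next;
};

struct toy_ail {
	struct toy_item	*head;
};

/* Track a new item in the AIL. Returns 0 on success, -1 on ENOMEM. */
static int toy_ail_insert(struct toy_ail *ail)
{
	struct toy_item *item = malloc(sizeof(*item));

	if (!item)
		return -1;
	item->next = ail->head;
	ail->head = item;
	return 0;
}

/*
 * Post-shutdown push: each item is failed, removed from the AIL and
 * freed. Nothing else holds a reference, so stopping early would leak.
 */
static size_t toy_ail_push_all(struct toy_ail *ail)
{
	size_t freed = 0;

	while (ail->head) {
		struct toy_item *item = ail->head;

		ail->head = item->next;
		free(item);
		freed++;
	}
	return freed;
}

/* Build an AIL with nitems entries, then drain it as unmount would. */
static size_t toy_unmount_after_shutdown(size_t nitems)
{
	struct toy_ail ail = { .head = NULL };
	size_t inserted = 0;
	size_t freed;

	for (size_t i = 0; i < nitems; i++) {
		if (toy_ail_insert(&ail))
			break;
		inserted++;
	}

	freed = toy_ail_push_all(&ail);
	assert(ail.head == NULL);	/* AIL must be empty before teardown */
	assert(freed == inserted);
	return freed;
}
```

Simply stopping the pusher at shutdown would correspond to skipping
toy_ail_push_all() here, which leaves every remaining item allocated
with nothing left to free it.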

-Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx