On Fri, Aug 23, 2024 at 10:00:06AM -0700, Darrick J. Wong wrote:
> On Fri, Aug 23, 2024 at 07:04:36PM +0800, Long Li wrote:
> > Deleting items from the AIL before the log is shut down can result in the
> > log tail moving forward in the journal on disk because log writes can still
> > be taking place. As a result, items that have been deleted from the AIL
> > might not be recovered during the next mount, even though they should be,
> > as they were never written back to disk.
> >
> > Signed-off-by: Long Li <leo.lilong@xxxxxxxxxx>
> > ---
> >  fs/xfs/xfs_dquot.c | 8 +++++++-
> >  1 file changed, 7 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c
> > index c1b211c260a9..4cbe3db6fc32 100644
> > --- a/fs/xfs/xfs_dquot.c
> > +++ b/fs/xfs/xfs_dquot.c
> > @@ -1332,9 +1332,15 @@ xfs_qm_dqflush(
> >  	return 0;
> >
> >  out_abort:
> > +	/*
> > +	 * Shutdown first to stop the log before deleting items from the AIL.
> > +	 * Deleting items from the AIL before the log is shut down can result
> > +	 * in the log tail moving forward in the journal on disk because log
> > +	 * writes can still be taking place.
> > +	 */
> > +	xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
> >  	dqp->q_flags &= ~XFS_DQFLAG_DIRTY;
> >  	xfs_trans_ail_delete(lip, 0);
> > -	xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
>
> I see the logic in shutting down the log before letting go of the dquot
> log item that triggered the shutdown, but I wonder, why do we delete the
> item from the AIL? AFAICT the inode items don't do that on iflush
> failure, but OTOH I couldn't figure out how the log items in the AIL get
> deleted from the AIL after a shutdown.

Intents are removed from the AIL when the transaction containing
the deferred intent chain is cancelled instead of committed due to
the log being shut down.

For everything else in the AIL, the ->iop_push method is supposed
to do any cleanup that is necessary by failing the item push and
running the item failure method itself.

For buffers, this is running IO completion as if an IO error
occurred. Error handling sees the shutdown and removes the item from
the AIL.

For inodes, xfs_iflush_cluster() fails the inode buffer as if an IO
error occurred, which then runs the individual inode abort code that
removes the inode items from the AIL.

Dquots still have the ancient cleanup method that inodes used to
have, i.e. if the dquot has been flushed to the buffer, it is
attached to the buffer, and then the buffer submission will fail and
run IO completion with an error. If the dquot hasn't been flushed to
the buffer because either it or the underlying dquot buffer is
corrupt, it will remove the dquot from the AIL and then shut down
the filesystem.

It's the latter case that could be an issue. It's not the same as
the inode item case, because the tail pinning that the INODE_ALLOC
inode item type flag causes does not happen with dquots. There is
still a potential window where the dquot could be at the tail of the
log, and removing it moves the tail forward at exactly the same time
the log tail is being sampled during a log write, and the shutdown
doesn't happen fast enough to prevent the log write going out to
disk.

To make the timing of such a race even more unlikely, it would have
to race with a log write that contains a commit record, otherwise
the log tail lsn in the iclog will be ignored because it wasn't
contained within a complete checkpoint in the journal.

It's very unlikely that a filesystem will read a corrupt dquot from
disk at exactly the same point in time these other journal
pre-conditions are met, but it could happen...

> Or maybe during a shutdown we just stop xfsaild and let the higher
> level objects free the log items during reclaim?

The AIL contains objects that have no references elsewhere in the
filesystem. It must be pushed until empty during unmount after a
shutdown to ensure that all the items in it have been pushed,
failed, removed from the AIL and freed...

-Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
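
The ordering race discussed in the reply above can be illustrated with a small userspace toy model. This is a sketch, not kernel code: every name in it (toy_log, toy_ail_delete(), toy_log_write() and so on) is hypothetical, the AIL is reduced to a sorted array of LSNs, and the "racing" log write is simply placed at the worst possible point between the two steps rather than run concurrently. It assumes, as described above, that a log write only reaches disk while the log is not shut down, and that its tail only counts if the write carries a commit record.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: none of these names exist in the kernel. */
#define TOY_AIL_MAX 8

typedef uint64_t toy_lsn_t;

struct toy_log {
	bool		shut_down;	  /* set once the log is shut down */
	toy_lsn_t	ail[TOY_AIL_MAX]; /* item LSNs, ascending; ail[0] is the tail */
	size_t		ail_count;
	toy_lsn_t	head_lsn;	  /* tail falls back to this if the AIL is empty */
	toy_lsn_t	disk_tail_lsn;	  /* tail that recovery would start from */
};

/* Current in-memory tail: lowest LSN in the AIL, or the head if empty. */
toy_lsn_t toy_tail_lsn(const struct toy_log *log)
{
	return log->ail_count ? log->ail[0] : log->head_lsn;
}

/* Remove the item with the given LSN, keeping the array sorted. */
void toy_ail_delete(struct toy_log *log, toy_lsn_t lsn)
{
	assert(log->ail_count <= TOY_AIL_MAX);
	for (size_t i = 0; i < log->ail_count; i++) {
		if (log->ail[i] != lsn)
			continue;
		for (size_t j = i + 1; j < log->ail_count; j++)
			log->ail[j - 1] = log->ail[j];
		log->ail_count--;
		return;
	}
}

void toy_force_shutdown(struct toy_log *log)
{
	log->shut_down = true;
}

/*
 * A log write samples the tail. It reaches disk only while the log is
 * running, and its tail is honoured only if it carries a commit record.
 */
bool toy_log_write(struct toy_log *log, bool has_commit_record)
{
	if (log->shut_down)
		return false;
	if (has_commit_record)
		log->disk_tail_lsn = toy_tail_lsn(log);
	return true;
}

/* Old ordering: delete, racing log write, then shutdown. */
toy_lsn_t toy_abort_delete_first(struct toy_log *log, toy_lsn_t lsn,
				 bool commit)
{
	toy_ail_delete(log, lsn);
	toy_log_write(log, commit);	/* worst-case interleaving */
	toy_force_shutdown(log);
	return log->disk_tail_lsn;
}

/* Patched ordering: shutdown, racing log write is refused, then delete. */
toy_lsn_t toy_abort_shutdown_first(struct toy_log *log, toy_lsn_t lsn,
				   bool commit)
{
	toy_force_shutdown(log);
	toy_log_write(log, commit);	/* no effect: log already shut down */
	toy_ail_delete(log, lsn);
	return log->disk_tail_lsn;
}
```

Starting from an AIL holding LSNs 10, 20 and 30 with an on-disk tail of 10, toy_abort_delete_first() with a commit record leaves the on-disk tail at 20, i.e. past an item that was never written back. toy_abort_shutdown_first() leaves it at 10, and so does the delete-first ordering when the racing write has no commit record.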