Re: [RFC PATCH] xfs: Prevent umount from indefinitely waiting on XFS_IFLUSHING flag on stale inodes

Dave Chinner <david@xxxxxxxxxxxxx> · Mon, 2 Sep 2024 23:24:00 +1000

On Mon, Sep 02, 2024 at 01:20:41PM +0530, Chandan Babu R wrote:
> Executing xfs/057 can lead to an unmount task to wait indefinitely for
> XFS_IFLUSHING flag on some inodes to be cleared. The following timeline
> describes as to how inodes can get into such a state.
> 
>   Task A               Task B                      Iclog endio processing
>   ----------------------------------------------------------------------------
>   Inodes are freed
> 
>   Inodes items are
>   added to the CIL
> 
>   CIL contents are
>   written to iclog
> 
>   iclog->ic_fail_crc
>   is set to true
> 
>   iclog is submitted
>   for writing to the
>   disk
> 
>                        Last inode in the cluster
>                        buffer is freed
> 
>                        XFS_[ISTALE/IFLUSHING] is
>                        set on all inodes in the
>                        cluster buffer
> 
>                        XFS_STALE is set on
>                        the cluster buffer
>                                                    iclog crc error is detected
>                        ...                         during endio processing
>                        During xfs_trans_commit,    Set XFS_LI_ABORTED on inode
>                        log shutdown is detected    items
>                        on xfs_buf_log_item         - Unpin the inode since it
>                                                    is stale and return -1
>                        xfs_buf_log_item is freed

How do we get the buffer log item freed here? It should be in the
CIL and/or the AIL because the unlinked inode list updates should
have already logged directly to that buffer and committed in in
previous transactions. 

>                                                    Inode log items are not
>                        xfs_buf is not freed here   processed further since
>                        since b_hold has a          xfs_inode_item_committed()
>                        non-zero value              returns -1
>
> During normal operation, the stale inodes are processed by
> xfs_buf_item_unpin() => xfs_buf_inode_iodone(). This ends up calling
> xfs_iflush_abort() which in turn clears the XFS_IFLUSHING flag. However, in
> the case of this bug, the xfs_buf_log_item is freed just before the high level
> transaction is committed to the CIL.
>
> To overcome this bug, this commit removes the check for log shutdown during
> high level transaction commit operation. The log items in the high level
> transaction will now be committed to the CIL despite the log being
> shutdown. This will allow the CIL processing logic (i.e. xlog_cil_push_work())
> to invoke xlog_cil_committed() as part of error handling. This will cause
> xfs_buf log item to to be unpinned and the corresponding inodes to be aborted
> and have their XFS_IFLUSHING flag cleared.

I don't know exactly how the problem arose, but I can say for
certain that the proposed fix is not valid.  Removing that specific
log shutdown check re-opens a race condition which can causes on
disk corruption. The shutdown was specifically placed to close that
race - See commit 3c4cb76bce43 ("xfs: xfs_trans_commit() path must
check for log shutdown") for details.

I have no idea what the right way to fix this is yet, but removing
the shutdown check isn't it...

-Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx