On Mon, Sep 02, 2024 at 01:20:41PM +0530, Chandan Babu R wrote: > Executing xfs/057 can lead to an unmount task to wait indefinitely for > XFS_IFLUSHING flag on some inodes to be cleared. The following timeline > describes as to how inodes can get into such a state. > > Task A Task B Iclog endio processing > ---------------------------------------------------------------------------- > Inodes are freed > > Inodes items are > added to the CIL > > CIL contents are > written to iclog > > iclog->ic_fail_crc > is set to true > > iclog is submitted > for writing to the > disk > > Last inode in the cluster > buffer is freed > > XFS_[ISTALE/IFLUSHING] is > set on all inodes in the > cluster buffer > > XFS_STALE is set on > the cluster buffer > iclog crc error is detected > ... during endio processing > During xfs_trans_commit, Set XFS_LI_ABORTED on inode > log shutdown is detected items > on xfs_buf_log_item - Unpin the inode since it > is stale and return -1 > xfs_buf_log_item is freed How do we get the buffer log item freed here? It should be in the CIL and/or the AIL because the unlinked inode list updates should have already logged directly to that buffer and committed in in previous transactions. > Inode log items are not > xfs_buf is not freed here processed further since > since b_hold has a xfs_inode_item_committed() > non-zero value returns -1 > > During normal operation, the stale inodes are processed by > xfs_buf_item_unpin() => xfs_buf_inode_iodone(). This ends up calling > xfs_iflush_abort() which in turn clears the XFS_IFLUSHING flag. However, in > the case of this bug, the xfs_buf_log_item is freed just before the high level > transaction is committed to the CIL. > > To overcome this bug, this commit removes the check for log shutdown during > high level transaction commit operation. The log items in the high level > transaction will now be committed to the CIL despite the log being > shutdown. This will allow the CIL processing logic (i.e. xlog_cil_push_work()) > to invoke xlog_cil_committed() as part of error handling. This will cause > xfs_buf log item to to be unpinned and the corresponding inodes to be aborted > and have their XFS_IFLUSHING flag cleared. I don't know exactly how the problem arose, but I can say for certain that the proposed fix is not valid. Removing that specific log shutdown check re-opens a race condition which can causes on disk corruption. The shutdown was specifically placed to close that race - See commit 3c4cb76bce43 ("xfs: xfs_trans_commit() path must check for log shutdown") for details. I have no idea what the right way to fix this is yet, but removing the shutdown check isn't it... -Dave. -- Dave Chinner david@xxxxxxxxxxxxx