On Wed, Sep 18, 2024 at 11:20:12 AM +1000, Dave Chinner wrote: > On Mon, Sep 16, 2024 at 11:14:32AM +0530, Chandan Babu R wrote: >> On Thu, Sep 05, 2024 at 06:12:29 PM +0530, Chandan Babu R wrote: >> >>> To overcome this bug, this commit removes the check for log shutdown during >> >>> high level transaction commit operation. The log items in the high level >> >>> transaction will now be committed to the CIL despite the log being >> >>> shutdown. This will allow the CIL processing logic (i.e. xlog_cil_push_work()) >> >>> to invoke xlog_cil_committed() as part of error handling. This will cause >> >>> xfs_buf log item to to be unpinned and the corresponding inodes to be aborted >> >>> and have their XFS_IFLUSHING flag cleared. >> >> >> >> I don't know exactly how the problem arose, but I can say for >> >> certain that the proposed fix is not valid. Removing that specific >> >> log shutdown check re-opens a race condition which can causes on >> >> disk corruption. The shutdown was specifically placed to close that >> >> race - See commit 3c4cb76bce43 ("xfs: xfs_trans_commit() path must >> >> check for log shutdown") for details. >> >> >> >> I have no idea what the right way to fix this is yet, but removing >> >> the shutdown check isn't it... >> >> >> >> Commit 3c4cb76bce43 describes the following scenario, >> >> 1. Filesystem is shutdown but the log remains operational. >> 2. High-level transaction commit (i.e. xfs_trans_commit()) notices the fs >> shutdown. Hence it aborts the dirty log items. One of the log items being >> aborted is an inode log item. >> 3. An inode cluster writeback is executed. Here, we come across the previously >> aborted inode log item. The inode log item is currently unpinned and >> dirty. Hence, the inode is included in the cluster buffer writeback. >> 4. Cluster buffer IO completion tries to remove the inode log item from the >> AIL and hence trips over an assert statement since the log item was never >> on the AIL. This indicates that the inode was never written to the journal. >> >> Hence the commit 3c4cb76bce43 will abort the transaction commit only when the >> log has been shutdown. >> >> With the log shutdown check removed, we can end up with the following cases >> during high-level transaction commit operation, >> 1. The filesystem is shutdown while the log remains operational. >> In this case, the log items are committed to the CIL where they are pinned >> before unlocking them. >> This should prevent the inode cluster writeback code >> from including such an inode for writeback since the corresponding log item >> is pinned. From here onwards, the normal flow of log items from the CIL to >> the AIL occurs after the contents of the log items are written to the >> journal and then later unpinned. >> The above logic holds true even without applying the "RFC patch" presented >> in the mail. >> >> 2. The log is shutdown. >> As in the previous case, the log items are moved to the CIL where they are >> pinned before unlocking them. The pinning of the log items prevents >> the inode cluster writeback code from including the pinned inode in its >> writeback operation. These log items are then processed by >> xlog_cil_committed() which gets invoked as part of error handling by >> xlog_cil_push_work(). > > It's more complex than that. > > 3. We get an error returned from xfs_defer_finish_noroll() or > xfs_trans_run_precommits(). > Commit b5f17bec1213a3ed2f4d79ad4c566e00cabe2a9b has the following, 2. xfs_force_shutdown() is used in places to cause the current modification to be aborted via xfs_trans_commit() because it may be impractical or impossible to cancel the transaction directly, and hence xfs_trans_commit() must cancel transactions when xfs_is_shutdown() is true in this situation. But we can't do that because of #1. I see that xfs_trans_run_precommits() and xfs_defer_finish_noroll() both call xfs_force_shutdown() when they encounter an error. But these functions could have percolated the error code to __xfs_trans_commit() and let __xfs_trans_commit() deal with the scenario more gracefully rather than checking for log shutdown. Can you please explain why this is impractical or impossible? -- Chandan