On Tue, Mar 29, 2022 at 10:13:15AM +1100, Dave Chinner wrote: > On Mon, Mar 28, 2022 at 04:05:31PM -0700, Darrick J. Wong wrote: > > On Thu, Mar 24, 2022 at 11:21:00AM +1100, Dave Chinner wrote: > > > From: Dave Chinner <dchinner@xxxxxxxxxx> > > > > > > Brian reported a null pointer dereference failure during unmount in > > > xfs/006. He tracked the problem down to the AIL being torn down > > > before a log shutdown had completed and removed all the items from > > > the AIL. The failure occurred in this path while unmount was > > > proceeding in another task: > > > > > > xfs_trans_ail_delete+0x102/0x130 [xfs] > > > xfs_buf_item_done+0x22/0x30 [xfs] > > > xfs_buf_ioend+0x73/0x4d0 [xfs] > > > xfs_trans_committed_bulk+0x17e/0x2f0 [xfs] > > > xlog_cil_committed+0x2a9/0x300 [xfs] > > > xlog_cil_process_committed+0x69/0x80 [xfs] > > > xlog_state_shutdown_callbacks+0xce/0xf0 [xfs] > > > xlog_force_shutdown+0xdf/0x150 [xfs] > > > xfs_do_force_shutdown+0x5f/0x150 [xfs] > > > xlog_ioend_work+0x71/0x80 [xfs] > > > process_one_work+0x1c5/0x390 > > > worker_thread+0x30/0x350 > > > kthread+0xd7/0x100 > > > ret_from_fork+0x1f/0x30 > > > > > > This is processing an EIO error to a log write, and it's > > > triggering a force shutdown. This causes the log to be shut down, > > > and then it is running attached iclog callbacks from the shutdown > > > context. That means the fs and log has already been marked as > > > xfs_is_shutdown/xlog_is_shutdown and so high level code will abort > > > (e.g. xfs_trans_commit(), xfs_log_force(), etc) with an error > > > because of shutdown. > > > > > > The umount would have been blocked waiting for a log force > > > completion inside xfs_log_cover() -> xfs_sync_sb(). The first thing > > > for this situation to occur is for xfs_sync_sb() to exit without > > > waiting for the iclog buffer to be comitted to disk. The > > > above trace is the completion routine for the iclog buffer, and > > > it is shutting down the filesystem. > > > > > > xlog_state_shutdown_callbacks() does this: > > > > > > { > > > struct xlog_in_core *iclog; > > > LIST_HEAD(cb_list); > > > > > > spin_lock(&log->l_icloglock); > > > iclog = log->l_iclog; > > > do { > > > if (atomic_read(&iclog->ic_refcnt)) { > > > /* Reference holder will re-run iclog callbacks. */ > > > continue; > > > } > > > list_splice_init(&iclog->ic_callbacks, &cb_list); > > > >>>>>> wake_up_all(&iclog->ic_write_wait); > > > >>>>>> wake_up_all(&iclog->ic_force_wait); > > > } while ((iclog = iclog->ic_next) != log->l_iclog); > > > > > > wake_up_all(&log->l_flush_wait); > > > spin_unlock(&log->l_icloglock); > > > > > > >>>>>> xlog_cil_process_committed(&cb_list); > > > } > > > > > > It wakes forces waiters before shutdown processes all the pending > > > callbacks. > > > > I'm not sure what this means. > > "It wakes force waiters" i.e. any process in xfs_log_force() waiting > on iclog->ic_force_wait... Ahh, 'forces' is not part of the verb in that sentence. Would you mind rewording that sentence to: "It wakes any thread waiting on IO completion of the iclog (in this case the umount log force) before shutdown processes all the pending callbacks." With that change, I think I understand this now. Reviewed-by: Darrick J. Wong <djwong@xxxxxxxxxx> --D > > Are you saying that log shutdown wakes up iclog waiters before it > > processes pending callbacks? And then anyone who waits on an iclog (log > > forces, I guess?) will wake up and race with the callbacks? > > Yes, exactly that. > > Cheers, > > Dave. > -- > Dave Chinner > david@xxxxxxxxxxxxx