On Thu, Nov 03, 2022 at 04:36:32PM +0800, Guo Xuenan wrote: > xfs log io error will trigger xlog shut down, and end_io worker call > shutdown_callbacks to unpin and release the buf log item. The race > condition is that when there are some thread doing transaction commit > and happened not to be intercepted by xlog_is_shutdown, then, these > log item will be insert into CIL, when unpin and release these buf log > item, UAF will occur. BTW, add delay before `xlog_cil_commit` while xlog > shutdown status can increase recurrence probability. > > The following call graph actually encountered this bad situation. > fsstress io end worker kworker/0:1H-216 > xlog_ioend_work > ->xlog_force_shutdown > ->xlog_state_shutdown_callbacks > ->xlog_cil_process_committed > ->xlog_cil_committed > ->xfs_trans_committed_bulk > ->xfs_trans_apply_sb_deltas ->li_ops->iop_unpin(lip, 1); > ->xfs_trans_getsb > ->_xfs_trans_bjoin > ->xfs_buf_item_init > ->if (bip) { return 0;} //relog _xfs_trans_bjoin() takes a reference to the bli. > ->xlog_cil_commit > ->xlog_cil_insert_items //insert into CIL > ->xfs_buf_ioend_fail(bp); > ->xfs_buf_ioend > ->xfs_buf_item_done > ->xfs_buf_item_relse > ->xfs_buf_item_free So how is the bli getting freed here if the fstress task has just taken an extra reference and inserted it into the CIL? Ah, the problem is that xfs_buf_item_relse() isn't checking the reference count before it frees the BLI. That is, xfs_buf_item_relse() assumes that it is only called at the end of the BLI life cycle and so doesn't check the reference count, when in fact it clearly isn't. Also, should we be inserting new items into the CIL if the log is already marked as shut down? -Dave. -- Dave Chinner david@xxxxxxxxxxxxx