The special processing used to simulate a buffer I/O failure on fs
shutdown has a difficult-to-reproduce race that can result in a
use-after-free of the associated buffer. Consider a buffer that has
been committed to the on-disk log and thus is AIL resident. The buffer
lands on the writeback delwri queue, but is subsequently locked,
committed and pinned by another transaction before it is submitted for
I/O. At this point, the buffer is stuck on the delwri queue as it
cannot be submitted for I/O until it is unpinned. A log checkpoint I/O
failure occurs sometime later, which aborts the bli. The unpin handler
is called with the aborted log item, drops the bli reference count and
the pin count, and falls into the I/O failure simulation path.

The potential problem here is that once the pin count falls to zero in
->iop_unpin(), xfsaild is free to retry delwri submission of the buffer
at any time, before the unpin handler even completes. If delwri queue
submission wins the race to the buffer lock, it observes the shutdown
state and simulates the I/O failure itself. This releases both the bli
and delwri queue holds and frees the buffer while xfs_buf_item_unpin()
sits on xfs_buf_lock() waiting to run through the same failure
sequence. This problem is rare and requires many iterations of fstest
generic/019 (which simulates disk I/O failures) to reproduce.

To avoid this problem, hold the buffer across the unpin sequence in
xfs_buf_item_unpin(). This is a bit unfortunate in that the new hold is
unconditional while really only necessary for a rare, fatal error
scenario, but it guarantees the buffer still exists in the off chance
that the handler attempts to access it.

Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx>
---

This is a patch I've had around for a bit for a very rare corner case I
was able to reproduce in some past testing. I'm sending this as RFC
because I'm curious if folks have any thoughts on the approach.
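
For anyone who wants the interleaving spelled out, below is a rough
userspace model of the race, not kernel code. All of the toy_* names are
made up for illustration, the two initial holds stand in for the bli and
delwri queue references, and the worst-case scheduling (delwri
submission running the moment the pin count hits zero) is forced
inline rather than raced. It only tracks whether the buffer would still
exist at the point where the unpin handler goes for the buffer lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct xfs_buf: just the counters involved. */
struct toy_buf {
	int	hold;		/* reference count; "freed" when it hits zero */
	int	pin_count;	/* nonzero while pinned by a transaction */
	bool	freed;
};

static void toy_hold(struct toy_buf *bp)
{
	bp->hold++;
}

static void toy_rele(struct toy_buf *bp)
{
	assert(bp->hold > 0);
	if (--bp->hold == 0)
		bp->freed = true;
}

/* Models delwri submission winning the race and failing the I/O itself. */
static void toy_delwri_fail(struct toy_buf *bp)
{
	toy_rele(bp);	/* bli hold */
	toy_rele(bp);	/* delwri queue hold */
}

/*
 * Models the unpin handler. Returns true if the buffer still exists at
 * the point where the handler would go on to lock it.
 */
static bool toy_unpin(struct toy_buf *bp, bool hold_across_unpin)
{
	bool alive;

	if (hold_across_unpin)
		toy_hold(bp);

	/* Once the pin count is zero, the buffer can be submitted. */
	if (--bp->pin_count == 0)
		toy_delwri_fail(bp);	/* forced worst-case interleaving */

	alive = !bp->freed;

	if (hold_across_unpin)
		toy_rele(bp);
	return alive;
}

/* A pinned buffer held only by the bli and the delwri queue. */
static struct toy_buf toy_new(void)
{
	return (struct toy_buf){ .hold = 2, .pin_count = 1, .freed = false };
}
```

In this model, toy_unpin() on a fresh toy_new() buffer reports the
buffer gone when called without the extra hold, and still present when
called with it, with the final release then dropping the last
reference.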
I'd be Ok with this change as is, but I think there are alternatives
available too. We could do something fairly simple like bury the hold
in the remove (abort) case only, or perhaps consider checking IN_AIL
state before the pin count drops and base the hold on that (though that
seems a bit more fragile to me). Thoughts?

Brian

 fs/xfs/xfs_buf_item.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
index fb69879e4b2b..a1ad6901eb15 100644
--- a/fs/xfs/xfs_buf_item.c
+++ b/fs/xfs/xfs_buf_item.c
@@ -504,6 +504,7 @@ xfs_buf_item_unpin(
 
 	freed = atomic_dec_and_test(&bip->bli_refcount);
 
+	xfs_buf_hold(bp);
 	if (atomic_dec_and_test(&bp->b_pin_count))
 		wake_up_all(&bp->b_waiters);
 
@@ -560,6 +561,7 @@ xfs_buf_item_unpin(
 		bp->b_flags |= XBF_ASYNC;
 		xfs_buf_ioend_fail(bp);
 	}
+	xfs_buf_rele(bp);
 }
 
 STATIC uint
-- 
2.26.3