On Wed, Apr 22, 2020 at 01:54:24PM -0400, Brian Foster wrote: > The log item failed flag is used to indicate a log item was flushed > but the underlying buffer was not successfully written to disk. If > the error configuration allows for writeback retries, xfsaild uses > the flag to resubmit the underlying buffer without acquiring the > flush lock of the item. > > The flag is currently set and cleared under the AIL lock and buffer > lock in the ->iop_error() callback, invoked via ->b_iodone() at I/O > completion time. The flag is checked at xfsaild push time under AIL > lock and cleared under buffer lock before resubmission. If I/O > eventually succeeds, the flag is cleared in the _done() handler for > the associated item type, again under AIL lock and buffer lock. Actually, I think you've missed the relevant lock here: the item's flush lock. The XFS_LI_FAILED flag is item flush state, and flush state is protected by the flush lock. When the item has been flushed and attached to the buffer for completion callbacks, the flush lock context gets handed to the buffer. i.e. the buffer owns the flush lock and so while that buffer is locked for IO we know that the item flush state (and hence the XFS_LI_FAILED flag) is exclusively owned by the holder of the buffer lock. (Note: this is how xfs_ifree_cluster() works - it grabs the buffer lock then walks the items on the buffer and changes the callback functions because those items are flush locked and hence holding the buffer lock gives exclusive access to the flush state of those items....) > As far as I can tell, the only reason for holding the AIL lock > across sets/clears is to manage consistency between the log item > bitop state and the temporary buffer pointer that is attached to the > log item. The bit itself is used to manage consistency of the > attached buffer, but is not enough to guarantee the buffer is still > attached by the time xfsaild attempts to access it. Correct. The guarantee that the buffer is still attached to the log item is what the AIL lock provides us with. > However since > failure state is always set or cleared under buffer lock (either via > I/O completion or xfsaild), this particular case can be handled at > item push time by verifying failure state once under buffer lock. In theory, yes, but there's a problem before you get that buffer lock. That is: what serialises access to lip->li_buf? The xfsaild does not hold a reference to the buffer and, without the AIL lock to provide synchronisation, the log item reference to the buffer can be dropped asynchronously by buffer IO completion. Hence the buffer could be freed between the xfsaild reading lip->li_buf and calling xfs_buf_trylock(bp). i.e. this introduces a use-after-free race condition on the initial buffer access. IOWs, the xfsaild cannot access lip->li_buf safely unless the set/clear operations are protected by the same lock the xfsaild holds. The xfsaild holds neither the buffer lock, a buffer reference or an item flush lock, hence it's the AIL lock or nothing... Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx