Re: [PATCH v2 08/13] xfs: elide the AIL lock on log item failure tracking

Dave Chinner <david@xxxxxxxxxxxxx> · Thu, 23 Apr 2020 15:59:19 +1000

On Wed, Apr 22, 2020 at 01:54:24PM -0400, Brian Foster wrote:
> The log item failed flag is used to indicate a log item was flushed
> but the underlying buffer was not successfully written to disk. If
> the error configuration allows for writeback retries, xfsaild uses
> the flag to resubmit the underlying buffer without acquiring the
> flush lock of the item.
> 
> The flag is currently set and cleared under the AIL lock and buffer
> lock in the ->iop_error() callback, invoked via ->b_iodone() at I/O
> completion time. The flag is checked at xfsaild push time under AIL
> lock and cleared under buffer lock before resubmission. If I/O
> eventually succeeds, the flag is cleared in the _done() handler for
> the associated item type, again under AIL lock and buffer lock.

Actually, I think you've missed the relevant lock here: the item's
flush lock. The XFS_LI_FAILED flag is item flush state, and flush
state is protected by the flush lock. When the item has been flushed
and attached to the buffer for completion callbacks, the flush lock
context gets handed to the buffer.

i.e. the buffer owns the flush lock and so while that buffer is
locked for IO we know that the item flush state (and hence the
XFS_LI_FAILED flag) is exclusively owned by the holder of the buffer
lock.

(Note: this is how xfs_ifree_cluster() works - it grabs the buffer
lock then walks the items on the buffer and changes the callback
functions because those items are flush locked and hence holding the
buffer lock gives exclusive access to the flush state of those
items....)

> As far as I can tell, the only reason for holding the AIL lock
> across sets/clears is to manage consistency between the log item
> bitop state and the temporary buffer pointer that is attached to the
> log item. The bit itself is used to manage consistency of the
> attached buffer, but is not enough to guarantee the buffer is still
> attached by the time xfsaild attempts to access it.

Correct. The guarantee that the buffer is still attached to the log
item is what the AIL lock provides us with.

> However since
> failure state is always set or cleared under buffer lock (either via
> I/O completion or xfsaild), this particular case can be handled at
> item push time by verifying failure state once under buffer lock.

In theory, yes, but there's a problem before you get that buffer
lock. That is: what serialises access to lip->li_buf?

The xfsaild does not hold a reference to the buffer and, without the
AIL lock to provide synchronisation, the log item reference to the
buffer can be dropped asynchronously by buffer IO completion. Hence
the buffer could be freed between the xfsaild reading lip->li_buf
and calling xfs_buf_trylock(bp). i.e. this introduces a
use-after-free race condition on the initial buffer access.

IOWs, the xfsaild cannot access lip->li_buf safely unless the
set/clear operations are protected by the same lock the xfsaild
holds. The xfsaild holds neither the buffer lock, a buffer reference
or an item flush lock, hence it's the AIL lock or nothing...

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx