On Wed, Jun 13, 2018 at 07:05:16AM -0400, Brian Foster wrote: > If a delwri queue occurs of a buffer that sits on a delwri queue > wait list, the queue sets _XBF_DELWRI_Q without changing the state > of ->b_list. This occurs, for example, if another thread beats the > current delwri waiter thread to the buffer lock after I/O > completion. Once the waiter acquires the lock, it removes the buffer > from the wait list and leaves a buffer with _XBF_DELWRI_Q set but > not populated on a list. This results in a lost buffer submission > and in turn can result in assert failures due to _XBF_DELWRI_Q being > set on buffer reclaim or filesystem lockups if the buffer happens to > cover an item in the AIL. > > This problem has been reproduced by repeated iterations of xfs/305 > on high CPU count (28xcpu) systems with limited memory (~1GB). Dirty > dquot reclaim races with an xfsaild push of a separate dquot backed > by the same buffer such that the buffer sits on the reclaim wait > list at the time xfsaild attempts to queue it. Since the latter > dquot has been flush locked but the underlying buffer not submitted > for I/O, the dquot pins the AIL and causes the filesystem to > livelock. > > This race is essentially made possible by the buffer lock cycle > involved with waiting on a synchronous delwri queue submission. > Close the race by using synchronous buffer I/O for respective delwri > queue submission. This means the buffer remains locked across the > I/O and so is inaccessible from other contexts while in the > intermediate wait list state. The sync buffer I/O wait mechanism is > factored into a helper such that sync delwri buffer submission and > serialization are batched operations. > > Designed-by: Dave Chinner <dchinner@xxxxxxxxxx> > Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx> > --- > fs/xfs/xfs_buf.c | 78 +++++++++++++++++++++++++----------------------- > 1 file changed, 41 insertions(+), 37 deletions(-) > > diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c > index 112999ddb75e..113ab6426a40 100644 > --- a/fs/xfs/xfs_buf.c > +++ b/fs/xfs/xfs_buf.c > @@ -1530,6 +1530,21 @@ xfs_buf_submit( > xfs_buf_rele(bp); > } > > +/* > + * Wait on a sync buffer. > + */ > +static int > +xfs_buf_iowait( > + struct xfs_buf *bp) > +{ > + /* wait for completion before gathering the error from the buffer */ That comments seems to state the obvious based on the function name and top of function comment. I'd just remove it. > @@ -2013,21 +2017,22 @@ xfs_buf_delwri_submit_buffers( > trace_xfs_buf_delwri_split(bp, _RET_IP_); > > /* > + * If we have a wait list, each buffer (and associated delwri > + * queue reference) transfers to it and is submitted > + * synchronously. Otherwise, drop the buffer from the delwri > + * queue and submit async. > */ > bp->b_flags &= ~(_XBF_DELWRI_Q | XBF_WRITE_FAIL); > + bp->b_flags |= XBF_WRITE; > if (wait_list) { > + bp->b_flags &= ~XBF_ASYNC; > list_move_tail(&bp->b_list, wait_list); > + __xfs_buf_submit(bp); > + } else { > + bp->b_flags |= XBF_ASYNC; > list_del_init(&bp->b_list); > + xfs_buf_submit(bp); > + } Ok, that breaks my idea of just checking XBF_ASYNC in the previous reply. But we could still do that with an explicit flag instead of the duplication. -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html