Re: [PATCH v2 2/2] xfs: use sync buffer I/O for sync delwri queue submission

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Jun 13, 2018 at 07:29:54PM -0400, Brian Foster wrote:
> On Thu, Jun 14, 2018 at 08:08:07AM +1000, Dave Chinner wrote:
> > On Wed, Jun 13, 2018 at 07:05:16AM -0400, Brian Foster wrote:
> > > If a delwri queue occurs of a buffer that sits on a delwri queue
> > > wait list, the queue sets _XBF_DELWRI_Q without changing the state
> > > of ->b_list. This occurs, for example, if another thread beats the
> > > current delwri waiter thread to the buffer lock after I/O
> > > completion. Once the waiter acquires the lock, it removes the buffer
> > > from the wait list and leaves a buffer with _XBF_DELWRI_Q set but
> > > not populated on a list. This results in a lost buffer submission
> > > and in turn can result in assert failures due to _XBF_DELWRI_Q being
> > > set on buffer reclaim or filesystem lockups if the buffer happens to
> > > cover an item in the AIL.
> > > 
> > > This problem has been reproduced by repeated iterations of xfs/305
> > > on high CPU count (28xcpu) systems with limited memory (~1GB). Dirty
> > > dquot reclaim races with an xfsaild push of a separate dquot backed
> > > by the same buffer such that the buffer sits on the reclaim wait
> > > list at the time xfsaild attempts to queue it. Since the latter
> > > dquot has been flush locked but the underlying buffer not submitted
> > > for I/O, the dquot pins the AIL and causes the filesystem to
> > > livelock.
> > > 
> > > This race is essentially made possible by the buffer lock cycle
> > > involved with waiting on a synchronous delwri queue submission.
> > > Close the race by using synchronous buffer I/O for respective delwri
> > > queue submission. This means the buffer remains locked across the
> > > I/O and so is inaccessible from other contexts while in the
> > > intermediate wait list state. The sync buffer I/O wait mechanism is
> > > factored into a helper such that sync delwri buffer submission and
> > > serialization are batched operations.
> > > 
> > > Designed-by: Dave Chinner <dchinner@xxxxxxxxxx>
> > > Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx>
> > > ---
> > 
> > Just something I noticed on a initial brief scan:
> > 
> > > @@ -2013,21 +2017,22 @@ xfs_buf_delwri_submit_buffers(
> > >  		trace_xfs_buf_delwri_split(bp, _RET_IP_);
> > >  
> > >  		/*
> > > -		 * We do all IO submission async. This means if we need
> > > -		 * to wait for IO completion we need to take an extra
> > > -		 * reference so the buffer is still valid on the other
> > > -		 * side. We need to move the buffer onto the io_list
> > > -		 * at this point so the caller can still access it.
> > > +		 * If we have a wait list, each buffer (and associated delwri
> > > +		 * queue reference) transfers to it and is submitted
> > > +		 * synchronously. Otherwise, drop the buffer from the delwri
> > > +		 * queue and submit async.
> > >  		 */
> > >  		bp->b_flags &= ~(_XBF_DELWRI_Q | XBF_WRITE_FAIL);
> > > -		bp->b_flags |= XBF_WRITE | XBF_ASYNC;
> > > +		bp->b_flags |= XBF_WRITE;
> > >  		if (wait_list) {
> > > -			xfs_buf_hold(bp);
> > > +			bp->b_flags &= ~XBF_ASYNC;
> > >  			list_move_tail(&bp->b_list, wait_list);
> > > -		} else
> > > +			__xfs_buf_submit(bp);
> > 
> > We lose a buffer submission tracepoint here.
> > 
> 
> Yeah, good catch. What do you think about just killing
> trace_xfs_buf_submit_wait() and pushing trace_xfs_buf_submit() down into
> the new helper? It looks like we can already distinguish the io type
> based on ->b_flags.

That's fine by me.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux