On Wed, Jun 21, 2023 at 12:05:32PM +1000, Dave Chinner wrote: > On Thu, May 25, 2023 at 05:45:35PM -0700, Darrick J. Wong wrote: > > @@ -2112,6 +2112,37 @@ xfs_buf_delwri_queue( > > return true; > > } > > > > +/* > > + * Queue a buffer to this delwri list as part of a data integrity operation. > > + * If the buffer is on any other delwri list, we'll wait for that to clear > > + * so that the caller can submit the buffer for IO and wait for the result. > > + * Callers must ensure the buffer is not already on the list. > > + */ > > +void > > +xfs_buf_delwri_queue_here( > > This is more of an "exclusive" queuing semantic. i.e. queue this > buffer exclusively on the list provided, rather than just ensuring > it is queued on some delwri list.... > > > + struct xfs_buf *bp, > > + struct list_head *buffer_list) > > +{ > > + /* > > + * We need this buffer to end up on the /caller's/ delwri list, not any > > + * old list. This can happen if the buffer is marked stale (which > > + * clears DELWRI_Q) after the AIL queues the buffer to its list but > > + * before the AIL has a chance to submit the list. > > + */ > > + while (!list_empty(&bp->b_list)) { > > + xfs_buf_unlock(bp); > > + delay(1); > > + xfs_buf_lock(bp); > > + } > > Not a big fan of this as it the buffer can be on the AIL buffer list > for some time (e.g. AIL might have a hundred thousand buffers to > push). > > This seems more like a case for: > > while (!list_empty(&bp->b_list)) { > xfs_buf_unlock(bp); > wait_event_var(bp->b_flags, !(bp->b_flags & _XBF_DELWRI_Q)); > xfs_buf_lock(bp); > } > > And a wrapper: > > void xfs_buf_remove_delwri( > struct xfs_buf *bp) > { > list_del(&bp->b_list); > bp->b_flags &= ~_XBF_DELWRI_Q; > wake_up_var(bp->b_flags); > } > > And we replace all the places where the buffer is taken off the > delwri list with calls to xfs_buf_remove_delwri()... > > This will greatly reduce the number of context switches during a > wait cycle, and reduce the latency of waiting for buffers that are > queued for delwri... The thing is, we're really waiting for the buffer to clear /all/ delwri-related lists. This could be the actual buffer list, but it could also be the xfs_buf_delwri_submit's wait_list. It's not sufficient for repair to allow some other (probably the AIL) thread to write the new structure's buffer because that caller could see an EIO/ENOSPC error, and repair needs to return the specific condition to the caller. That said, I suppose we could spring a wait_var_event on bp->b_list.next to look for list_empty(&bp->b_list). I'll try that out tonight. --D > Cheers, > > Dave. > -- > Dave Chinner > david@xxxxxxxxxxxxx