Re: [PATCH] xfs: don't take a spinlock unconditionally in the DIO fastpath

Dave Chinner <david@xxxxxxxxxxxxx> · Tue, 25 May 2021 17:18:21 +1000

On Thu, May 20, 2021 at 04:33:32PM -0700, Darrick J. Wong wrote:
> On Wed, May 19, 2021 at 11:19:20AM +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@xxxxxxxxxx>
> > 
> > Because this happens at high thread counts on high-IOPS devices
> > doing mixed read/write AIO-DIO to a single file at about a million
> > IOPS:
> > 
> >    64.09%     0.21%  [kernel]            [k] io_submit_one
> >    - 63.87% io_submit_one
> >       - 44.33% aio_write
> >          - 42.70% xfs_file_write_iter
> >             - 41.32% xfs_file_dio_write_aligned
> >                - 25.51% xfs_file_write_checks
> >                   - 21.60% _raw_spin_lock
> >                      - 21.59% do_raw_spin_lock
> >                         - 19.70% __pv_queued_spin_lock_slowpath
> > 
> > This also happens in the IO completion path:
> > 
> >    22.89%     0.69%  [kernel]            [k] xfs_dio_write_end_io
> >    - 22.49% xfs_dio_write_end_io
> >       - 21.79% _raw_spin_lock
> >          - 20.97% do_raw_spin_lock
> >             - 20.10% __pv_queued_spin_lock_slowpath
> 
> Super long line there.

Ah, forgot to trim it.

> > @@ -500,7 +510,17 @@ xfs_dio_write_end_io(
> >  	 * other IO completions here to update the EOF. Failing to serialise
> >  	 * here can result in EOF moving backwards and Bad Things Happen when
> >  	 * that occurs.
> > +	 *
> > +	 * As IO completion only ever extends EOF, we can do an unlocked check
> > +	 * here to avoid taking the spinlock. If we land within the current EOF,
> > +	 * then we do not need to do an extending update at all, and we don't
> > +	 * need to take the lock to check this. If we race with an update moving
> > +	 * EOF, then we'll either still be beyond EOF and need to take the lock,
> > +	 * or we'll be within EOF and we don't need to take it at all.
> 
> Is truncate locked out at this point too?  I /think/ it is since we
> still hold the iolock (shared or excl) which blocks truncate?

truncate and fallocate are locked out because the inode dio count is
still elevated at this point. That is, they'll block in inode_dio_wait()
until we return to iomap_dio_complete() and it (eventually) calls
inode_dio_end().

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx