Re: Disk aligned (but not block aligned) DIO write woes

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Jan 04, 2021 at 10:06:01AM -0500, Brian Foster wrote:
> On Mon, Dec 28, 2020 at 05:57:29PM +0200, Avi Kivity wrote:
> > I observe that XFS takes an exclusive lock for DIO writes that are not block
> > aligned:
> > 
> > 
> > xfs_file_dio_aio_write(
> > 
> > {
> > 
> > ...
> > 
> >        /*
> >          * Don't take the exclusive iolock here unless the I/O is unaligned
> > to
> >          * the file system block size.  We don't need to consider the EOF
> >          * extension case here because xfs_file_aio_write_checks() will
> > relock
> >          * the inode as necessary for EOF zeroing cases and fill out the new
> >          * inode size as appropriate.
> >          */
> >         if ((iocb->ki_pos & mp->m_blockmask) ||
> >             ((iocb->ki_pos + count) & mp->m_blockmask)) {
> >                 unaligned_io = 1;
> > 
> >                 /*
> >                  * We can't properly handle unaligned direct I/O to reflink
> >                  * files yet, as we can't unshare a partial block.
> >                  */
> >                 if (xfs_is_cow_inode(ip)) {
> >                         trace_xfs_reflink_bounce_dio_write(ip, iocb->ki_pos,
> > count);
> >                         return -ENOTBLK;
> >                 }
> >                 iolock = XFS_IOLOCK_EXCL;
> >         } else {
> >                 iolock = XFS_IOLOCK_SHARED;
> >         }
> > 
> > 
> > I also see that such writes cause io_submit to block, even if they hit a
> > written extent (and are also not size-changing, by implication) and
> > therefore do not require a metadata write. Probably due to "|| unaligned_io"
> > in
> > 
> > 
> >         ret = iomap_dio_rw(iocb, from, &xfs_direct_write_iomap_ops,
> >                            &xfs_dio_write_ops,
> >                            is_sync_kiocb(iocb) || unaligned_io);
> > 
> > 
> > Can this be relaxed to allow writes to written extents to proceed in
> > parallel? I explain the motivation below.
> > 
> 
> From the above code, it looks as though we require the exclusive lock to
> accommodate sub-block zeroing that might occur down in the dio code.
> Without it, concurrent sector granular I/Os to the same block could
> presumably cause corruption if the data bios and associated zeroing bios
> were reordered between two requests.
> 
> Looking down in the iomap/dio code, iomap_dio_bio_actor() triggers
> sub-block zeroing if the associated range is unaligned and the mapping
> is either unwritten or new (i.e. newly allocated for this write). So as
> you note above, there is no such zeroing for a pre-existing written
> block within EOF. From that, it seems reasonable to me that in principle
> the filesystem could check for these conditions in the higher layer and
> further restrict usage of the exclusive lock.

Yuck, no.

> That said, we'd probably
> have to rework the above code to acquire the shared lock first,
> read/check the extent state for the start/end blocks of the I/O and then
> cycle out to the exclusive lock in the cases where it is required.

Yes, that requires mapping lookups above the iomap layer, then
repeating them again in the iomap layer and having to juggle the
locking to ensure the iomap code gets the same result.  This also
requires taking the ILOCK to decide how to lock the IOLOCK, which
inverts all the IO path locking in nasty, bug prone ways. We tried
very hard to get rid of all those layering violations and lock
dances in the IO path with iomap - the old DIO code was a never
ending source of locking bugs because we had to play locking games
like this....

>
> It's
> not immediately clear to me if we'd still want to set the wait flag in
> those unaligned-but-no-zeroing cases. Perhaps somebody who knows the dio
> code a bit better can chime in further on all of this..

As I explained quickly on #xfs over xmas/ny time - because it seems
like 4 or 5 people all asked for this at the same time - I suspect
we can do this with IOMAP_NOWAIT directives for unaligned IO under
shared locking.

That is, if the unaligned dio is going to require allocation or
sub-block zeroing or would otherwise block, the filesystem mapping
code returns EAGAIN to iomap_apply() which propagates back to the
filesystem submitting the DIO which can then take exclusive lock and
resubmit the unaligned DIO, this time without the nowait semantics.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx



[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux