On Tue, Jan 12, 2021 at 12:07:45PM +1100, Dave Chinner wrote: > From: Dave Chinner <dchinner@xxxxxxxxxx> > > The unaligned DIO write path is more convulted than the normal path, s/convulted/convoluted/ > > and we are about to make it more complex. Keep the block aligned > fast path dio write code trim and simple by splitting out the > unaligned DIO code from it. I like this, but a few comments below: > /* > + * Handle block aligned direct IO writes > * > * Lock the inode appropriately to prepare for and issue a direct IO write. > * By separating it from the buffered write path we remove all the tricky to > @@ -518,35 +518,88 @@ static const struct iomap_dio_ops xfs_dio_write_ops = { > * until we're sure the bytes at the new EOF have been zeroed and/or the cached > * pages are flushed out. > * > + * Returns with locks held indicated by @iolock and errors indicated by > + * negative return values. As far as I can tell no locks are held when returning from xfs_file_dio_write_aligned. > + */ > +STATIC ssize_t > +xfs_file_dio_write_aligned( I thought we got rid of STATIC for new code? > + /* > + * No fallback to buffered IO after short writes for XFS, direct I/O > + * will either complete fully or return an error. > + */ > + ASSERT(ret < 0 || ret == count); Maybe it is time to drop this assert rather than duplicating it given that iomap direct I/O has behaved sane in this regard from the start? > + * To provide the same serialisation for AIO, we also need to wait for > * outstanding IOs to complete so that unwritten extent conversion is completed > * before we try to map the overlapping block. This is currently implemented by > * hitting it with a big hammer (i.e. inode_dio_wait()). > * > + * This means that unaligned dio writes alwys block. There is no "nowait" fast s/alwys/always/ > + /* > + * We can't properly handle unaligned direct I/O to reflink > + * files yet, as we can't unshare a partial block. > + */ FYI, this could use up the whole 80 chars. > + /* > + * Don't take the exclusive iolock here unless the I/O is unaligned to > + * the file system block size. We don't need to consider the EOF > + * extension case here because xfs_file_aio_write_checks() will relock > + * the inode as necessary for EOF zeroing cases and fill out the new > + * inode size as appropriate. > + */ > + if ((iocb->ki_pos | count) & mp->m_blockmask) > + return xfs_file_dio_write_unaligned(ip, iocb, from); > + return xfs_file_dio_write_aligned(ip, iocb, from); I don't think that whole comment makes much sense here, the locking documentation belongs into the aligned/unaligned helpers now. > - ret = xfs_file_dio_aio_write(iocb, from); > + ret = xfs_file_dio_write(iocb, from); If we change this naming it would be nice to throw in a patch to also change the read side as well.