Re: [PATCH 5/6] xfs: split unaligned DIO write code out

Christoph Hellwig <hch@xxxxxxxxxxxxx> · Tue, 12 Jan 2021 11:37:45 +0100

On Tue, Jan 12, 2021 at 12:07:45PM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@xxxxxxxxxx>
> 
> The unaligned DIO write path is more convulted than the normal path,

s/convulted/convoluted/

>
> and we are about to make it more complex. Keep the block aligned
> fast path dio write code trim and simple by splitting out the
> unaligned DIO code from it.

I like this, but a few comments below:

>  /*
> + * Handle block aligned direct IO writes
>   *
>   * Lock the inode appropriately to prepare for and issue a direct IO write.
>   * By separating it from the buffered write path we remove all the tricky to
> @@ -518,35 +518,88 @@ static const struct iomap_dio_ops xfs_dio_write_ops = {
>   * until we're sure the bytes at the new EOF have been zeroed and/or the cached
>   * pages are flushed out.
>   *
> + * Returns with locks held indicated by @iolock and errors indicated by
> + * negative return values.

As far as I can tell no locks are held when returning from
xfs_file_dio_write_aligned.

> + */
> +STATIC ssize_t
> +xfs_file_dio_write_aligned(

I thought we got rid of STATIC for new code?

> +	/*
> +	 * No fallback to buffered IO after short writes for XFS, direct I/O
> +	 * will either complete fully or return an error.
> +	 */
> +	ASSERT(ret < 0 || ret == count);

Maybe it is time to drop this assert rather than duplicating it given that
iomap direct I/O has behaved sane in this regard from the start?

> + * To provide the same serialisation for AIO, we also need to wait for
>   * outstanding IOs to complete so that unwritten extent conversion is completed
>   * before we try to map the overlapping block. This is currently implemented by
>   * hitting it with a big hammer (i.e. inode_dio_wait()).
>   *
> + * This means that unaligned dio writes alwys block. There is no "nowait" fast

s/alwys/always/

> +	/*
> +	 * We can't properly handle unaligned direct I/O to reflink
> +	 * files yet, as we can't unshare a partial block.
> +	 */

FYI, this could use up the whole 80 chars.

> +	/*
> +	 * Don't take the exclusive iolock here unless the I/O is unaligned to
> +	 * the file system block size.  We don't need to consider the EOF
> +	 * extension case here because xfs_file_aio_write_checks() will relock
> +	 * the inode as necessary for EOF zeroing cases and fill out the new
> +	 * inode size as appropriate.
> +	 */
> +	if ((iocb->ki_pos | count) & mp->m_blockmask)
> +		return xfs_file_dio_write_unaligned(ip, iocb, from);
> +	return xfs_file_dio_write_aligned(ip, iocb, from);

I don't think that whole comment makes much sense here, the locking
documentation belongs into the aligned/unaligned helpers now.

> -		ret = xfs_file_dio_aio_write(iocb, from);
> +		ret = xfs_file_dio_write(iocb, from);

If we change this naming it would be nice to throw in a patch to
also change the read side as well.