Re: [PATCH 07/10] xfs: handle extent size hints in xfs_file_iomap_begin_delay

Brian Foster <bfoster@xxxxxxxxxx> · Wed, 26 Sep 2018 11:17:24 -0400

On Mon, Sep 17, 2018 at 10:53:51PM +0200, Christoph Hellwig wrote:
> For files with extent size hints we always allocate unwritten extents
> first and then convert them to real extents later to avoid exposing
> stale data outside the blocks actually written.  But there is no
> reason we can't do unwritten extent allocations from delalloc blocks,
> in fact we already do that for COW operations.
> 
> Note that this does not handle files on the RT subvolume (yet), as
> we removed code handling RT allocations from the delalloc path this
> would be a slightly bigger project, and it doesn't really help with
> simplifying the COW path either.
> 
> Signed-off-by: Christoph Hellwig <hch@xxxxxx>
> ---
>  fs/xfs/xfs_iomap.c | 17 ++++-------------
>  1 file changed, 4 insertions(+), 13 deletions(-)
> 
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index e12ff5e9a8ec..2d08ace09e25 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -533,7 +533,6 @@ xfs_file_iomap_begin_delay(
>  	xfs_fsblock_t		prealloc_blocks = 0;
>  
>  	ASSERT(!XFS_IS_REALTIME_INODE(ip));
> -	ASSERT(!xfs_get_extsz_hint(ip));
>  
>  	xfs_ilock(ip, XFS_ILOCK_EXCL);
>  
> @@ -1035,7 +1034,7 @@ xfs_file_iomap_begin(
>  		return -EIO;
>  
>  	if ((flags & (IOMAP_WRITE | IOMAP_ZERO)) && !(flags & IOMAP_DIRECT) &&
> -			!IS_DAX(inode) && !xfs_get_extsz_hint(ip)) {
> +			!IS_DAX(inode) && !XFS_IS_REALTIME_INODE(ip)) {

AFAICT this fails to honor the extent size hint for buffered writes:

# xfs_io -fc "truncate 1m" -c "extsize 32k" -c "pwrite 0 4k" -c "fiemap -v" /mnt/file
wrote 4096/4096 bytes at offset 0
4 KiB, 1 ops; 0.0001 sec (35.511 MiB/sec and 9090.9091 ops/sec)
/mnt/file:
 EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
   0: [0..7]:          80..87               8   0x1
   1: [8..2047]:       hole              2040

... and with an unpatched kernel:

# xfs_io -fc "truncate 1m" -c "extsize 32k" -c "pwrite 0 4k" -c "fiemap -v" /mnt/file                                                                                                                                       
wrote 4096/4096 bytes at offset 0
4 KiB, 1 ops; 0.0017 sec (2.283 MiB/sec and 584.4535 ops/sec)
/mnt/file:
 EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
   0: [0..7]:          192..199             8   0x0
   1: [8..63]:         200..255            56 0x801
   2: [64..2047]:      hole              1984

I'd guess this is due to the extent overlap check in
xfs_bmap_extsize_align(), but I haven't verified.

Brian

>  		/* Reserve delalloc blocks for regular writeback. */
>  		return xfs_file_iomap_begin_delay(inode, offset, length, flags,
>  				iomap);
> @@ -1088,17 +1087,9 @@ xfs_file_iomap_begin(
>  	 * been done up front, so we don't need to do them here.
>  	 */
>  	if (xfs_is_reflink_inode(ip)) {
> -		if (flags & IOMAP_DIRECT) {
> -			/* may drop and re-acquire the ilock */
> -			error = xfs_reflink_allocate_cow(ip, &imap, &shared,
> -					&lockmode);
> -			if (error)
> -				goto out_unlock;
> -		} else {
> -			error = xfs_reflink_reserve_cow(ip, &imap, &shared);
> -			if (error)
> -				goto out_unlock;
> -		}
> +		error = xfs_reflink_allocate_cow(ip, &imap, &shared, &lockmode);
> +		if (error)
> +			goto out_unlock;
>  
>  		end_fsb = imap.br_startoff + imap.br_blockcount;
>  		length = XFS_FSB_TO_B(mp, end_fsb) - offset;
> -- 
> 2.18.0
>