Re: [PATCH] test race when checking i_size on direct i/o read

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Sep 20, 2017 at 07:05:04PM +0800, Eryu Guan wrote:
> On Tue, Sep 19, 2017 at 07:34:06AM -0700, Christoph Hellwig wrote:
> > On Tue, Sep 19, 2017 at 10:13:52AM -0400, Brian Foster wrote:
> > > Can we pass a boolean or flag to xfs_iomap_write_unwritten() to have it
> > > update the incore i_size after unwritten extent conversion? Then move
> > > (or remove) the associated update from xfs_dio_write_end_io().
> > 
> > I don't think we even need a flag - all three callers of
> > xfs_iomap_write_unwritten want to update the file size.
> 
> I tried this approach, but seems there's some problem in the buffered
> aio path, generic/112 (aio fsx) failed quickly. But I haven't digged
> into the reason (maybe I screwed it up, not the method is wrong..).
> 

>From the generic/112 results:

Size error: expected 0x3785b stat 0x38000 seek 0x38000

I suspect the problem is that the offset+size from buffered I/O
completion is not based on the inode size. Rather, it is buffer head
granularity size of the ioend. Given that, it probably does make sense
to skip the update from this path.

> Then I tried Brian's suggestion, pass a boolean to
> xfs_iomap_write_unwritten() to tell if we want it to update in-core
> isize after unwritten extent conversion, and skip the in-core isize
> update in xfs_dio_write_end_io() accordingly. This approach seems to
> work, it passed the test Eric posted here, and fstests 'aio' group
> tests, a run of 'quick' group didn't find any new failure as well.
> 
> I attached the WIP patch (without proper comments) I was testing, if
> this looks fine I can format a formal patch and do more testings.
>

Thanks. This mostly looks reasonable to me...
 
> Thanks,
> Eryu
> 
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 29172609f2a3..288da47e9ac5 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -343,7 +343,7 @@ xfs_end_io(
>  		error = xfs_reflink_end_cow(ip, offset, size);
>  		break;
>  	case XFS_IO_UNWRITTEN:
> -		error = xfs_iomap_write_unwritten(ip, offset, size);
> +		error = xfs_iomap_write_unwritten(ip, offset, size, false);

Maybe add a single line comment here wrt to the above (why we don't
update isize).

>  		break;
>  	default:
>  		ASSERT(!xfs_ioend_is_append(ioend) || ioend->io_append_trans);
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index 350b6d43ba23..f3ad024573e7 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -435,6 +435,7 @@ xfs_dio_write_end_io(
>  	struct xfs_inode	*ip = XFS_I(inode);
>  	loff_t			offset = iocb->ki_pos;
>  	bool			update_size = false;
> +	bool			write_unwritten = (flags & IOMAP_DIO_UNWRITTEN);
>  	int			error = 0;
>  
>  	trace_xfs_end_io_direct_write(ip, offset, size);
> @@ -458,7 +459,8 @@ xfs_dio_write_end_io(
>  	 */
>  	spin_lock(&ip->i_flags_lock);
>  	if (offset + size > i_size_read(inode)) {
> -		i_size_write(inode, offset + size);
> +		if (!write_unwritten)
> +			i_size_write(inode, offset + size);
>  		update_size = true;

I find the logic a little confusing here. For !write_unwritten,
update_size means to update the on-disk size. Otherwise, it instructs
iomap_write_unwritten() to also update the in-core size. The latter also
checks for appends, however, so it might as well always be true from
here. It seems that for such a small function, we should be able to make
this a bit easier to follow. ;P

Should we ever see an isize update at all on a IOMAP_DIO_COW completion
(wouldn't reflinked blocks have to be within eof)? If not, then we can
presumably rule out isize updates in that case. I think that just leaves
the case where a dio write occurs on a pre-existing block. Hmm, could we
just move this whole hunk down to after the iomap_write_unwritten() call
and eliminate the need for the flag entirely? E.g., something like:

	...
	if (IOMAP_DIO_COW) {
		...
	}

	/* unwritten conversion updates isize */
	if (IOMAP_DIO_UNWRITTEN)
		return xfs_iomap_write_unwritten(ip, offset, size, true);

	if (offset + size > i_size_read(inode)) {
		i_size_write(inode, offset + size);
		error = xfs_setfilesize(...);
	}

	return error;

>  	}
>  	spin_unlock(&ip->i_flags_lock);
> @@ -469,8 +471,8 @@ xfs_dio_write_end_io(
>  			return error;
>  	}
>  
> -	if (flags & IOMAP_DIO_UNWRITTEN)
> -		error = xfs_iomap_write_unwritten(ip, offset, size);
> +	if (write_unwritten)
> +		error = xfs_iomap_write_unwritten(ip, offset, size, update_size);
>  	else if (update_size)
>  		error = xfs_setfilesize(ip, offset, size);
>  
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index a1909bc064e9..0a088586371e 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -829,7 +829,8 @@ int
>  xfs_iomap_write_unwritten(
>  	xfs_inode_t	*ip,
>  	xfs_off_t	offset,
> -	xfs_off_t	count)
> +	xfs_off_t	count,
> +	bool		update_size)
>  {
>  	xfs_mount_t	*mp = ip->i_mount;
>  	xfs_fileoff_t	offset_fsb;
> @@ -840,6 +841,7 @@ xfs_iomap_write_unwritten(
>  	xfs_trans_t	*tp;
>  	xfs_bmbt_irec_t imap;
>  	struct xfs_defer_ops dfops;
> +	struct inode	*inode = VFS_I(ip);
>  	xfs_fsize_t	i_size;
>  	uint		resblks;
>  	int		error;
> @@ -900,6 +902,13 @@ xfs_iomap_write_unwritten(
>  		if (i_size > offset + count)
>  			i_size = offset + count;
>  
> +		if (update_size) {
> +			spin_lock(&ip->i_flags_lock);
> +			if (i_size > i_size_read(inode))
> +				i_size_write(inode, i_size);
> +			spin_unlock(&ip->i_flags_lock);

We have XFS_ILOCK_EXCL here so I don't think the spinlocks are
necessary any longer. That means this could probably be condensed to
something like

		if (update_size && i_size > i_size_read(inode))
			i_size_write(inode, i_size)

> +		}
> +
>  		i_size = xfs_new_eof(ip, i_size);
>  		if (i_size) {
>  			ip->i_d.di_size = i_size;
> diff --git a/fs/xfs/xfs_iomap.h b/fs/xfs/xfs_iomap.h
> index 00db3ecea084..ee535065c5d0 100644
> --- a/fs/xfs/xfs_iomap.h
> +++ b/fs/xfs/xfs_iomap.h
> @@ -27,7 +27,7 @@ int xfs_iomap_write_direct(struct xfs_inode *, xfs_off_t, size_t,
>  			struct xfs_bmbt_irec *, int);
>  int xfs_iomap_write_allocate(struct xfs_inode *, int, xfs_off_t,
>  			struct xfs_bmbt_irec *);
> -int xfs_iomap_write_unwritten(struct xfs_inode *, xfs_off_t, xfs_off_t);
> +int xfs_iomap_write_unwritten(struct xfs_inode *, xfs_off_t, xfs_off_t, bool);
>  
>  void xfs_bmbt_to_iomap(struct xfs_inode *, struct iomap *,
>  		struct xfs_bmbt_irec *);
> diff --git a/fs/xfs/xfs_pnfs.c b/fs/xfs/xfs_pnfs.c
> index 2f2dc3c09ad0..4246876df7b7 100644
> --- a/fs/xfs/xfs_pnfs.c
> +++ b/fs/xfs/xfs_pnfs.c
> @@ -274,7 +274,7 @@ xfs_fs_commit_blocks(
>  					(end - 1) >> PAGE_SHIFT);
>  		WARN_ON_ONCE(error);
>  
> -		error = xfs_iomap_write_unwritten(ip, start, length);
> +		error = xfs_iomap_write_unwritten(ip, start, length, false);

Note that this path does update isize. It runs another transaction to
get around the fact that write_unwritten() wouldn't log a new on disk
size. There is some validation thing here though, so we might need to
check whether it is Ok to run that earlier and whether it should
continue to return an error on failure. Christoph?

That said, maybe a follow on patch would be better since this one might
be stable fodder.

Brian

>  		if (error)
>  			goto out_drop_iolock;
>  	}
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux