Re: [PATCH 1/2] cephfs: ignore error from invalidate_inode_pages2_range() in direct write.

Jeff Layton <jlayton@xxxxxxxxxx> · Wed, 31 Aug 2016 09:47:37 -0400

On Wed, 2016-08-31 at 12:58 +1000, NeilBrown wrote:
> This call can fail if there are dirty pages.  The preceding call to
> filemap_write_and_wait_range() will normally remove dirty pages, but
> as inode_lock() is not held over calls to ceph_direct_read_write(), it
> could race with non-direct writes and pages could be dirtied
> immediately after filemap_write_and_wait_range() returns
> 
> If there are dirty pages, they will be removed by the subsequent call
> to truncate_inode_pages_range(), so having them here is not a problem.
> 
> If the 'ret' value is left holding an error, then in the async IO case
> (aio_req is not NULL) the loop that would normally call
> ceph_osdc_start_request() will see the error in 'ret' and abort all
> requests.  This doesn't seem like correct behaviour.
> 
> So clear 'ret' and ignore the error (other than the dout() message).
> 
> > Signed-off-by: NeilBrown <neilb@xxxxxxxx>
> ---
>  fs/ceph/file.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> index 0f5375d8e030..1ca6e29edcc9 100644
> --- a/fs/ceph/file.c
> +++ b/fs/ceph/file.c
> @@ -905,8 +905,14 @@ ceph_direct_read_write(struct kiocb *iocb, struct iov_iter *iter,
> >  		ret = invalidate_inode_pages2_range(inode->i_mapping,
> >  					pos >> PAGE_SHIFT,
> >  					(pos + count) >> PAGE_SHIFT);
> > -		if (ret < 0)
> > +		if (ret < 0) {
> >  			dout("invalidate_inode_pages2_range returned %d\n", ret);
> > +			/*
> > +			 * Error is not fatal as we truncate_inode_pages_range()
> > +			 * below.
> > +			 */
> > +			ret = 0;
> > +		}
>  
> >  		flags = CEPH_OSD_FLAG_ORDERSNAP |
> >  			CEPH_OSD_FLAG_ONDISK |

Good catch. Even better might be to just declare a int ret2 and not
clobber "ret" at all.

Clearly, mixing buffered and direct I/O is gross, but I suppose you
could hit the occasional problem here with a real workload
occasionally.

Should this go to stable? The patch seems safe enough.

Reviewed-by: Jeff Layton <jlayton@xxxxxxxxxx>
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html