Re: [PATCH v4] ceph: fix blindly expanding the readahead windows

Jeff Layton <jlayton@xxxxxxxxxx> · Tue, 09 May 2023 07:18:17 -0700

On Tue, 2023-05-09 at 08:57 +0800, xiubli@xxxxxxxxxx wrote:
> From: Xiubo Li <xiubli@xxxxxxxxxx>
> 
> Blindly expanding the readahead windows will cause unneccessary
> pagecache thrashing and also will introdue the network workload.
> We should disable expanding the windows if the readahead is disabled
> and also shouldn't expand the windows too much.
> 
> Expanding forward firstly instead of expanding backward for possible
> sequential reads.
> 
> Bound `rreq->len` to the actual file size to restore the previous page
> cache usage.
> 
> Cc: stable@xxxxxxxxxxxxxxx
> Fixes: 49870056005c ("ceph: convert ceph_readpages to ceph_readahead")
> URL: https://lore.kernel.org/ceph-devel/20230504082510.247-1-sehuww@xxxxxxxxxxxxxxxx
> URL: https://www.spinics.net/lists/ceph-users/msg76183.html
> Cc: Hu Weiwen <sehuww@xxxxxxxxxxxxxxxx>
> Signed-off-by: Xiubo Li <xiubli@xxxxxxxxxx>
> ---
> 
> V4:
> - two small cleanup from Ilya's comments. Thanks
> 
> 

(cc'ing Steve French since he was asking me about ceph readahead
yesterday)

FWIW, the original idea here was to try to read whole OSD objects when
we can. I can see that that may have been overzealous though, so
ramping up the size more slowly makes sense.

>  fs/ceph/addr.c | 28 +++++++++++++++++++++-------
>  1 file changed, 21 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
> index ca4dc6450887..683ba9fbd590 100644
> --- a/fs/ceph/addr.c
> +++ b/fs/ceph/addr.c
> @@ -188,16 +188,30 @@ static void ceph_netfs_expand_readahead(struct netfs_io_request *rreq)
>  	struct inode *inode = rreq->inode;
>  	struct ceph_inode_info *ci = ceph_inode(inode);
>  	struct ceph_file_layout *lo = &ci->i_layout;
> +	unsigned long max_pages = inode->i_sb->s_bdi->ra_pages;
> +	unsigned long max_len = max_pages << PAGE_SHIFT;
> +	loff_t end = rreq->start + rreq->len, new_end;
>  	u32 blockoff;
> -	u64 blockno;
>  
> -	/* Expand the start downward */
> -	blockno = div_u64_rem(rreq->start, lo->stripe_unit, &blockoff);
> -	rreq->start = blockno * lo->stripe_unit;
> -	rreq->len += blockoff;
> +	/* Readahead is disabled */
> +	if (!max_pages)
> +		return;
>  
> -	/* Now, round up the length to the next block */
> -	rreq->len = roundup(rreq->len, lo->stripe_unit);
> +	/*
> +	 * Try to expand the length forward by rounding  up it to the next
> +	 * block, but do not exceed the file size, unless the original
> +	 * request already exceeds it.
> +	 */

Do you really need to clamp this to the i_size? Is it ever possible for
the client to be out of date as to the real file size? If so then you
might end up with a short read when there is actually more data.

I guess by doing this, you're trying to avoid having to call the OSD
back after a short read and get a zero-length read when the file is
shorter than the requested read?

> +	new_end = min(round_up(end, lo->stripe_unit), rreq->i_size);
> +	if (new_end > end && new_end <= rreq->start + max_len)
> +		rreq->len = new_end - rreq->start;
> +
> +	/* Try to expand the start downward */
> +	div_u64_rem(rreq->start, lo->stripe_unit, &blockoff);
> +	if (rreq->len + blockoff <= max_len) {
> +		rreq->start -= blockoff;
> +		rreq->len += blockoff;
> +	}
>  }
>  
>  static bool ceph_netfs_clamp_length(struct netfs_io_subrequest *subreq)

-- 
Jeff Layton <jlayton@xxxxxxxxxx>