Re: [PATCH v6 2/2] ceph: fix blindly expanding the readahead windows

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, May 15, 2023 at 09:20:44AM +0800, xiubli@xxxxxxxxxx wrote:
> From: Xiubo Li <xiubli@xxxxxxxxxx>
> 
> Blindly expanding the readahead windows will cause unneccessary
> pagecache thrashing and also will introdue the network workload.

s/introdue/introduce/

> We should disable expanding the windows if the readahead is disabled
> and also shouldn't expand the windows too much.
> 
> Expanding forward firstly instead of expanding backward for possible
> sequential reads.
> 
> Bound `rreq->len` to the actual file size to restore the previous page
> cache usage.
> 
> The posix_fadvise may change the maximum size of a file readahead.
> 
> Cc: stable@xxxxxxxxxxxxxxx
> Fixes: 49870056005c ("ceph: convert ceph_readpages to ceph_readahead")
> URL: https://lore.kernel.org/ceph-devel/20230504082510.247-1-sehuww@xxxxxxxxxxxxxxxx
> URL: https://www.spinics.net/lists/ceph-users/msg76183.html
> Cc: Hu Weiwen <sehuww@xxxxxxxxxxxxxxxx>
> Signed-off-by: Xiubo Li <xiubli@xxxxxxxxxx>
> ---
>  fs/ceph/addr.c | 40 +++++++++++++++++++++++++++++++++-------
>  1 file changed, 33 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
> index 93fff1a7373f..4b29777c01d7 100644
> --- a/fs/ceph/addr.c
> +++ b/fs/ceph/addr.c
> @@ -188,16 +188,42 @@ static void ceph_netfs_expand_readahead(struct netfs_io_request *rreq)
>  	struct inode *inode = rreq->inode;
>  	struct ceph_inode_info *ci = ceph_inode(inode);
>  	struct ceph_file_layout *lo = &ci->i_layout;
> +	unsigned long max_pages = inode->i_sb->s_bdi->ra_pages;
> +	loff_t end = rreq->start + rreq->len, new_end;
> +	struct ceph_netfs_request_data *priv = rreq->netfs_priv;
> +	unsigned long max_len;
>  	u32 blockoff;
> -	u64 blockno;
>  
> -	/* Expand the start downward */
> -	blockno = div_u64_rem(rreq->start, lo->stripe_unit, &blockoff);
> -	rreq->start = blockno * lo->stripe_unit;
> -	rreq->len += blockoff;
> +	if (priv) {
> +		/* Readahead is disabled by posix_fadvise POSIX_FADV_RANDOM */
> +		if (priv->file_ra_disabled)
> +			max_pages = 0;
> +		else
> +			max_pages = priv->file_ra_pages;
> +
> +	}
> +
> +	/* Readahead is disabled */
> +	if (!max_pages)
> +		return;
>  
> -	/* Now, round up the length to the next block */
> -	rreq->len = roundup(rreq->len, lo->stripe_unit);
> +	max_len = max_pages << PAGE_SHIFT;
> +
> +	/*
> +	 * Try to expand the length forward by rounding  up it to the next

An extra space between "rounding  up".

Apart from above two typo, LGTM.

Reviewed-by: Hu Weiwen <sehuww@xxxxxxxxxxxxxxxx>

I also tested this patch with our workload. Reading the first 16k images
from ImageNet dataset (1.69GiB) takes about 1.8Gi page cache (as
reported by `free -h'). This is expected.

For the fadvise use-case, I use `fio' to do the test:
$ fio --name=rand --size=32M --fadvise_hint=1 --ioengine=libaio --iodepth=128 --rw=randread --bs=4k --filesize=2G

after the test, page cache increased by about 35Mi, which is expected.
So if appropriate:

Tested-by: Hu Weiwen <sehuww@xxxxxxxxxxxxxxxx>

However, also note random reading to a large file without fadvise still
suffers from degradation. e.g., this test:
$ fio --name=rand --size=32M --fadvise_hint=0 --ioengine=libaio --iodepth=128 --rw=randread --bs=4k --filesize=2G

will load nearly every page of the 2Gi test file into page cache,
although I only need 32Mi of them.

> +	 * block, but do not exceed the file size, unless the original
> +	 * request already exceeds it.
> +	 */
> +	new_end = min(round_up(end, lo->stripe_unit), rreq->i_size);
> +	if (new_end > end && new_end <= rreq->start + max_len)
> +		rreq->len = new_end - rreq->start;
> +
> +	/* Try to expand the start downward */
> +	div_u64_rem(rreq->start, lo->stripe_unit, &blockoff);
> +	if (rreq->len + blockoff <= max_len) {
> +		rreq->start -= blockoff;
> +		rreq->len += blockoff;
> +	}
>  }
>  
>  static bool ceph_netfs_clamp_length(struct netfs_io_subrequest *subreq)
> -- 
> 2.40.1
> 



[Index of Archives]     [CEPH Users]     [Ceph Large]     [Ceph Dev]     [Information on CEPH]     [Linux BTRFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux