On Mon, May 15, 2023 at 09:20:44AM +0800, xiubli@xxxxxxxxxx wrote:
> From: Xiubo Li <xiubli@xxxxxxxxxx>
> 
> Blindly expanding the readahead windows will cause unneccessary
> pagecache thrashing and also will introdue the network workload.

s/introdue/introduce/

> We should disable expanding the windows if the readahead is disabled
> and also shouldn't expand the windows too much.
> 
> Expanding forward firstly instead of expanding backward for possible
> sequential reads.
> 
> Bound `rreq->len` to the actual file size to restore the previous page
> cache usage.
> 
> The posix_fadvise may change the maximum size of a file readahead.
> 
> Cc: stable@xxxxxxxxxxxxxxx
> Fixes: 49870056005c ("ceph: convert ceph_readpages to ceph_readahead")
> URL: https://lore.kernel.org/ceph-devel/20230504082510.247-1-sehuww@xxxxxxxxxxxxxxxx
> URL: https://www.spinics.net/lists/ceph-users/msg76183.html
> Cc: Hu Weiwen <sehuww@xxxxxxxxxxxxxxxx>
> Signed-off-by: Xiubo Li <xiubli@xxxxxxxxxx>
> ---
>  fs/ceph/addr.c | 40 +++++++++++++++++++++++++++++++++-------
>  1 file changed, 33 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
> index 93fff1a7373f..4b29777c01d7 100644
> --- a/fs/ceph/addr.c
> +++ b/fs/ceph/addr.c
> @@ -188,16 +188,42 @@ static void ceph_netfs_expand_readahead(struct netfs_io_request *rreq)
>  	struct inode *inode = rreq->inode;
>  	struct ceph_inode_info *ci = ceph_inode(inode);
>  	struct ceph_file_layout *lo = &ci->i_layout;
> +	unsigned long max_pages = inode->i_sb->s_bdi->ra_pages;
> +	loff_t end = rreq->start + rreq->len, new_end;
> +	struct ceph_netfs_request_data *priv = rreq->netfs_priv;
> +	unsigned long max_len;
>  	u32 blockoff;
> -	u64 blockno;
>  
> -	/* Expand the start downward */
> -	blockno = div_u64_rem(rreq->start, lo->stripe_unit, &blockoff);
> -	rreq->start = blockno * lo->stripe_unit;
> -	rreq->len += blockoff;
> +	if (priv) {
> +		/* Readahead is disabled by posix_fadvise POSIX_FADV_RANDOM */
> +		if (priv->file_ra_disabled)
> +			max_pages = 0;
> +		else
> +			max_pages = priv->file_ra_pages;
> +	}
> +
> +	/* Readahead is disabled */
> +	if (!max_pages)
> +		return;
>  
> -	/* Now, round up the length to the next block */
> -	rreq->len = roundup(rreq->len, lo->stripe_unit);
> +	max_len = max_pages << PAGE_SHIFT;
> +
> +	/*
> +	 * Try to expand the length forward by rounding  up it to the next

An extra space between "rounding up".

Apart from the above two typos, LGTM.

Reviewed-by: Hu Weiwen <sehuww@xxxxxxxxxxxxxxxx>

I also tested this patch with our workload. Reading the first 16k images
from the ImageNet dataset (1.69 GiB) takes about 1.8 GiB of page cache
(as reported by `free -h'). This is expected.

For the fadvise use case, I used `fio' to do the test:

$ fio --name=rand --size=32M --fadvise_hint=1 --ioengine=libaio --iodepth=128 --rw=randread --bs=4k --filesize=2G

After the test, the page cache increased by about 35 MiB, which is
expected. So, if appropriate:

Tested-by: Hu Weiwen <sehuww@xxxxxxxxxxxxxxxx>

However, also note that random reads of a large file without fadvise
still suffer from degradation, e.g. this test:

$ fio --name=rand --size=32M --fadvise_hint=0 --ioengine=libaio --iodepth=128 --rw=randread --bs=4k --filesize=2G

will load nearly every page of the 2 GiB test file into the page cache,
although I only need 32 MiB of it.

> +	 * block, but do not exceed the file size, unless the original
> +	 * request already exceeds it.
> +	 */
> +	new_end = min(round_up(end, lo->stripe_unit), rreq->i_size);
> +	if (new_end > end && new_end <= rreq->start + max_len)
> +		rreq->len = new_end - rreq->start;
> +
> +	/* Try to expand the start downward */
> +	div_u64_rem(rreq->start, lo->stripe_unit, &blockoff);
> +	if (rreq->len + blockoff <= max_len) {
> +		rreq->start -= blockoff;
> +		rreq->len += blockoff;
> +	}
>  }
>  
>  static bool ceph_netfs_clamp_length(struct netfs_io_subrequest *subreq)
> -- 
> 2.40.1
> 