On Tue, Feb 20, 2024 at 09:49:55PM +0100, Greg Kroah-Hartman wrote:
> 6.1-stable review patch.  If anyone has any objections, please let me know.

Maybe hold off on this one?  The kernel test robot noticed a -21.4%
regression of vm-scalability.throughput on:

  commit: ab4443fe3ca6298663a55c4a70efc6c3ce913ca6 ("readahead: avoid multiple marked readahead pages")

https://lore.kernel.org/linux-fsdevel/202402201642.c8d6bbc3-oliver.sang@xxxxxxxxx/

Not a definite no just yet; nobody has dug into it, but some caution seems
warranted.

> ------------------
>
> From: Jan Kara <jack@xxxxxxx>
>
> commit ab4443fe3ca6298663a55c4a70efc6c3ce913ca6 upstream.
>
> ra_alloc_folio() marks the page that should trigger the next round of
> async readahead.  However, it rounds the computed index up to the order
> of the page being allocated.  This can lead to multiple consecutive pages
> being marked with the readahead flag.  Consider the situation with
> index == 1, mark == 1, order == 0.  We insert an order-0 page at index 1
> and mark it.  Then we bump order to 1 and index to 2; mark (still == 1)
> is rounded up to 2, so the page at index 2 is marked as well.  Then we
> bump order to 2 and index to 4; mark gets rounded up to 4, so the page at
> index 4 is marked as well.  The fact that multiple pages get marked
> within a single readahead window confuses the readahead logic and results
> in the readahead window being trimmed back to 1.  This situation is
> triggered in particular when the maximum readahead window size is not a
> power of two (in the observed case it was 768 KB), and as a result
> sequential read throughput suffers.
>
> Fix the problem by rounding 'mark' down instead of up.  Because the index
> is naturally aligned to 'order', we are guaranteed 'rounded mark' == index
> iff 'mark' is within the page we are allocating at 'index', and thus
> exactly one page is marked with the readahead flag as required by the
> readahead code, and sequential read performance is restored.
>
> This effectively reverts part of commit b9ff43dd2743 ("mm/readahead: Fix
> readahead with large folios").  That commit changed the rounding with the
> rationale:
>
> "... we were setting the readahead flag on the folio which contains the
> last byte read from the block.  This is wrong because we will trigger
> readahead at the end of the read without waiting to see if a subsequent
> read is going to use the pages we just read."
>
> Although this is true, the fact is this was always the case with read
> sizes not aligned to folio boundaries, and large folios in the page cache
> just make the situation more obvious (and frequent).  Also, for
> sequential read workloads it is better to trigger the readahead earlier
> rather than later.  It is true that the difference in rounding, and thus
> earlier triggering of the readahead, can result in reading more for
> semi-random workloads.  However, workloads really suffering from this
> seem to be rare.  In particular, I have verified that the workload
> described in commit b9ff43dd2743 ("mm/readahead: Fix readahead with large
> folios") of reading random 100k blocks from a file like:
>
> [reader]
> bs=100k
> rw=randread
> numjobs=1
> size=64g
> runtime=60s
>
> is not impacted by the rounding change and achieves ~70MB/s in both cases.
>
> [jack@xxxxxxx: fix one more place where mark rounding was done as well]
> Link: https://lkml.kernel.org/r/20240123153254.5206-1-jack@xxxxxxx
> Link: https://lkml.kernel.org/r/20240104085839.21029-1-jack@xxxxxxx
> Fixes: b9ff43dd2743 ("mm/readahead: Fix readahead with large folios")
> Signed-off-by: Jan Kara <jack@xxxxxxx>
> Cc: Matthew Wilcox <willy@xxxxxxxxxxxxx>
> Cc: Guo Xuenan <guoxuenan@xxxxxxxxxx>
> Cc: <stable@xxxxxxxxxxxxxxx>
> Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> Signed-off-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
> ---
>  mm/readahead.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> --- a/mm/readahead.c
> +++ b/mm/readahead.c
> @@ -483,7 +483,7 @@ static inline int ra_alloc_folio(struct
>  
>  	if (!folio)
>  		return -ENOMEM;
> -	mark = round_up(mark, 1UL << order);
> +	mark = round_down(mark, 1UL << order);
>  	if (index == mark)
>  		folio_set_readahead(folio);
>  	err = filemap_add_folio(ractl->mapping, folio, index, gfp);
> @@ -591,7 +591,7 @@ static void ondemand_readahead(struct re
>  	 * It's the expected callback index, assume sequential access.
>  	 * Ramp up sizes, and push forward the readahead window.
>  	 */
> -	expected = round_up(ra->start + ra->size - ra->async_size,
> +	expected = round_down(ra->start + ra->size - ra->async_size,
>  			1UL << order);
>  	if (index == expected || index == (ra->start + ra->size)) {
>  		ra->start += ra->size;
>
>
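For whoever ends up looking at the regression: here is a small stand-alone
sketch (my own user-space approximation, not the kernel code) of the
index/mark walk-through from the commit message above.  It only shows how
many folios end up with the readahead flag under round_up() vs round_down()
for mark == 1 and folios of increasing order at index 1, 2, 4; the macros
below mimic the kernel helpers for power-of-two alignments.

/* gcc -Wall sketch.c && ./a.out */
#include <stdio.h>

/* same result as the kernel's round_up()/round_down() for power-of-2 'y' */
#define round_up(x, y)   ((((x) + (y) - 1) / (y)) * (y))
#define round_down(x, y) (((x) / (y)) * (y))

int main(void)
{
	unsigned long mark = 1;		/* page that should trigger async readahead */
	unsigned long index = 1;	/* naturally aligned to 1UL << order */
	unsigned long order;

	for (order = 0; order < 3; order++) {
		unsigned long nr = 1UL << order;
		unsigned long up = round_up(mark, nr);
		unsigned long down = round_down(mark, nr);

		printf("order=%lu index=%lu: round_up(mark)=%lu (%smarked), "
		       "round_down(mark)=%lu (%smarked)\n",
		       order, index,
		       up, up == index ? "" : "not ",
		       down, down == index ? "" : "not ");

		index += nr;	/* next folio starts right after this one */
	}
	return 0;
}

With round_up() the mark coincides with the index for all three folios, so
three pages in one window carry the readahead flag; with round_down() only
the folio at index 1 is marked, which is what the readahead code expects.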