On Mon, Nov 11, 2024 at 11:26 PM David Hildenbrand <david@xxxxxxxxxx> wrote: > > On 11.11.24 16:05, David Hildenbrand wrote: > > On 11.11.24 15:28, Yafang Shao wrote: > >> On Mon, Nov 11, 2024 at 6:33 PM David Hildenbrand <david@xxxxxxxxxx> wrote: > >>> > >>> On 08.11.24 15:17, Yafang Shao wrote: > >>>> When testing large folio support with XFS on our servers, we observed that > >>>> only a few large folios are mapped when reading large files via mmap. > >>>> After a thorough analysis, I identified it was caused by the > >>>> `/sys/block/*/queue/read_ahead_kb` setting. On our test servers, this > >>>> parameter is set to 128KB. After I tune it to 2MB, the large folio can > >>>> work as expected. However, I believe the large folio behavior should not be > >>>> dependent on the value of read_ahead_kb. It would be more robust if the > >>>> kernel can automatically adopt to it. > >>> > >>> Now I am extremely confused. > >>> > >>> Documentation/ABI/stable/sysfs-block: > >>> > >>> "[RW] Maximum number of kilobytes to read-ahead for filesystems on this > >>> block device." > >>> > >>> > >>> So, with your patch, will we also be changing the readahead size to > >>> exceed that, or simply allocate larger folios and not exceeding the > >>> readahead size (e.g., leaving them partially non-filled)? > >> > >> Exceeding the readahead size for the MADV_HUGEPAGE case is > >> straightforward; this is what the current patch accomplishes. > >> > > > > Okay, so this only applies with MADV_HUGEPAGE I assume. Likely we should > > also make that clearer in the subject. > > > > mm/readahead: allow exceeding configured read_ahead_kb with MADV_HUGEPAGE > > > > > > If this is really a fix, especially one that deserves CC-stable, I > > cannot tell. Willy is the obvious expert :) > > > >>> > >>> If you're also changing the readahead behavior to exceed the > >>> configuration parameter it would sound to me like "I am pushing the > >>> brake pedal and my care brakes; fix the brakes to adopt whether to brake > >>> automatically" :) > >>> > >>> Likely I am missing something here, and how the read_ahead_kb parameter > >>> is used after your patch. > >> > >> The read_ahead_kb parameter continues to function for > >> non-MADV_HUGEPAGE scenarios, whereas special handling is required for > >> the MADV_HUGEPAGE case. It appears that we ought to update the > >> Documentation/ABI/stable/sysfs-block to reflect the changes related to > >> large folios, correct? > > > > Yes, how it related to MADV_HUGEPAGE. I would assume that it would get > > ignored, but ... > > > > ... staring at get_next_ra_size(), it's not quite ignored, because we > > still us it as a baseline to detect how much we want to bump up the > > limit when the requested size is small? (*2 vs *4 etc) :/ > > > > So the semantics are really starting to get weird, unless I am missing > > something important. > Likely what I am missing is that the value of get_next_ra_size() will never be relevant > in that case. I assume the following would end up doing the same: > > iff --git a/mm/readahead.c b/mm/readahead.c > index 475d2940a1edb..cc7f883f83d86 100644 > --- a/mm/readahead.c > +++ b/mm/readahead.c > @@ -668,7 +668,12 @@ void page_cache_async_ra(struct readahead_control *ractl, > ra->start = start; > ra->size = start - index; /* old async_size */ > ra->size += req_count; > - ra->size = get_next_ra_size(ra, max_pages); > + /* > + * Allow the actual size to exceed the readahead window for > + * MADV_HUGEPAGE. > + */ > + if (ra->size < max_pages) > + ra->size = get_next_ra_size(ra, max_pages); This change doesn’t apply to MADV_HUGEPAGE because, in cases where `ra->size > max_pages`, ra->size has already been set to max_pages. This can be easily verified with the example provided in the previous version[1]. [1]. https://lore.kernel.org/linux-mm/20241106092114.8408-1-laoar.shao@xxxxxxxxx/ > ra->async_size = ra->size; > readit: > ractl->_index = ra->start; > > > So maybe it should just be in get_next_ra_size() where we clarify what "max_pages" > means and why we simply decide to ignore the value ... -- Regards Yafang