On Thu, Nov 7, 2024 at 12:52 PM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
>
> On Wed, Nov 06, 2024 at 05:21:14PM +0800, Yafang Shao wrote:
> > When testing large folio support with XFS on our servers, we observed that
> > only a few large folios are mapped when reading large files via mmap.
> > After a thorough analysis, I identified that it was caused by the
> > `/sys/block/*/queue/read_ahead_kb` setting. On our test servers, this
> > parameter is set to 128KB. After I tuned it to 2MB, large folios worked
> > as expected. However, I believe the large folio behavior should not
> > depend on the value of read_ahead_kb. It would be more robust if the
> > kernel could automatically adapt to it.
> >
> > With `/sys/block/*/queue/read_ahead_kb` set to a non-2MB-aligned size,
> > this issue can be verified with a simple test case, as shown below:
>
> I don't like to see these programs in commit messages.  If it's a
> valuable program, it should go into tools/testing.  If not, it shouldn't
> be in the commit message.

I just want to make it easy for others to verify this change. I'm okay
with dropping this program.

>
> > When large folio support is enabled and read_ahead_kb is set to a smaller
> > value, ra->size (4MB) may exceed the maximum allowed size (e.g., 128KB). To
> > address this, we need to add a conditional check for such cases. However,
> > this alone is insufficient, as users might set read_ahead_kb to a larger,
> > non-hugepage-aligned value (e.g., 4MB + 128KB). In these instances, it is
> > essential to explicitly align ra->size with the hugepage size.
>
> I wish you'd discussed this in the earlier thread instead of just
> smashing it into this patch.  Because your solution is wrong.
>
> > @@ -642,7 +644,7 @@ void page_cache_async_ra(struct readahead_control *ractl,
> >  				1UL << order);
> >  	if (index == expected) {
> >  		ra->start += ra->size;
> > -		ra->size = get_next_ra_size(ra, max_pages);
> > +		ra->size = ALIGN(get_next_ra_size(ra, max_pages), 1 << order);
>
> Let's suppose that someone sets read_ahead_kb to 192kB.  If the previous
> readahead did 128kB, we now try to align that to 128kB, so we'll readahead
> 256kB, which is larger than max.  We were only intending to breach the
> 'max' for the MADV_HUGE case, not for all cases.

In the non-MADV_HUGE case, the order can only be 0, correct?

>
> Honestly, I don't know if we should try to defend a stupid sysadmin
> against the consequences of their misconfiguration like this.  I'd be in
> favour of getting rid of the configuration knob entirely (or just ignoring
> what the sysadmin set it to), but if we do that, we need to replace it
> with something that can automatically figure out what the correct setting
> for the readahead_max_kb should be (which is probably a function of the
> bandwidth, latency and seek time of the underlying device).
>
> But that's obviously not part of this patch.  I'd be in favour of just
> dropping this ALIGN and leaving just the first hunk of the patch.

I'm okay with removing this ALIGN, since we won't be setting a large
read_ahead_kb value in our production environment. ;)

--
Regards
Yafang