Re: [PATCH v2] mm/readahead: Fix large folio support in async readahead

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Nov 12, 2024 at 11:19 PM David Hildenbrand <david@xxxxxxxxxx> wrote:
>
> >> Sorry, but this code is getting quite confusing, especially with such
> >> misleading "large folio" comments.
> >>
> >> Even without MADV_HUGEPAGE we will be allocating large folios, as
> >> emphasized by Willy [1]. So the only thing MADV_HUGEPAGE controls is
> >> *which* large folios we allocate. .. as Willy says [2]: "We were only
> >> intending to breach the 'max' for the MADV_HUGE case, not for all cases."
> >>
> >> I have no idea how *anybody* should derive from the code here that we
> >> treat MADV_HUGEPAGE in a special way.
> >>
> >> Simply completely confusing.
> >>
> >> My interpretation of "I don't know if we should try to defend a stupid
> >> sysadmin against the consequences of their misconfiguration like this"
> >> means" would be "drop this patch and don't change anything".
> >
> > Without this change, large folios won’t function as expected.
> > Currently, to support MADV_HUGEPAGE, you’d need to set readahead_kb to
> > 2MB, 4MB, or more. However, many applications run without
>  > MADV_HUGEPAGE, and a larger readahead_kb might not be optimal for> them.
>
> Someone configured: "Don't readahead more than 128KiB"
>
> And then we complain why we "don't readahead more than 128KiB".

That is just bikeshielding.

So, what’s your suggestion? Simply setting readahead_kb to 2MB? That
would almost certainly cause issues elsewhere.

>
> :)
>
> "mm/filemap: Support VM_HUGEPAGE for file mappings" talks about "even if
> we have no history of readahead being successful".
>
> So not about exceeding the configured limit, but exceeding the
> "readahead history".
>
> So I consider VM_HUGEPAGE the sign here to "ignore readahead history"
> and not to "violate the config".

MADV_HUGEPAGE is definitely a new addition to readahead, and its
behavior isn’t yet defined in the documentation. All we need to do is
clarify its behavior there. The documentation isn’t set in stone—we
can update it as long as it doesn’t disrupt existing applications.

>
> But that's just my opinion.
>
> >
> >>
> >> No changes to API, no confusing code.
> >
> > New features like large folios can often create confusion with
> > existing rules or APIs, correct?
>
> We should not try making it even more confusing, if possible.

A quick tip for you: the readahead size already exceeds readahead_kb
even without MADV_HUGEPAGE. You might want to spend some time tracing
that behavior.

In summary, it’s really the readahead code itself that’s causing the
confusion—not MADV_HUGEPAGE.

>
> >
> >>
> >> Maybe pr_info_once() when someone uses MADV_HUGEPAGE with such backends
> >> to tell the sysadmin that something stupid is happening ...
> >
> > It's not a flawed setup; it's just that this new feature doesn’t work
> > well with the existing settings, and updating those settings to
> > accommodate it isn't always feasible.
> I don't agree. But it really is Willy's take.
>
> The code, as it stands is confusing and nobody will be able to figure
> out how MADV_HUGEPAGE comes into play here and why we suddenly exceed
> "max/config" simply because "cur" is larger than max.
>
> For example, in the code
>
> ra->size = start - index;       /* old async_size */
> ra->size += req_count;
> ra->size = get_next_ra_size(ra, max_pages);
>
> What happens if ra->size was at max, then we add "req_count" and
> suddenly we exceed "max" and say "well, sure that's fine now". Even if
> MADV_HUGEPAGE was never involved? Maybe it cannot happen, but it sure is
> confusing.
>
>
> Not to mention that "It's worth noting that if read_ahead_kb is set to a
> larger value that isn't aligned with huge page sizes (e.g., 4MB +
> 128KB), it may still fail to map to hugepages." sounds very odd :(

This will be beneficial if you're open to using MADV_HUGEPAGE in a
production environment.

-- 
Regards
Yafang





[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux