Re: [PATCH v2] mm/readahead: Fix large folio support in async readahead

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 13.11.24 09:28, David Hildenbrand wrote:
On 13.11.24 03:16, Yafang Shao wrote:
On Tue, Nov 12, 2024 at 11:19 PM David Hildenbrand <david@xxxxxxxxxx> wrote:

Sorry, but this code is getting quite confusing, especially with such
misleading "large folio" comments.

Even without MADV_HUGEPAGE we will be allocating large folios, as
emphasized by Willy [1]. So the only thing MADV_HUGEPAGE controls is
*which* large folios we allocate. .. as Willy says [2]: "We were only
intending to breach the 'max' for the MADV_HUGE case, not for all cases."

I have no idea how *anybody* should derive from the code here that we
treat MADV_HUGEPAGE in a special way.

Simply completely confusing.

My interpretation of "I don't know if we should try to defend a stupid
sysadmin against the consequences of their misconfiguration like this"
means" would be "drop this patch and don't change anything".

Without this change, large folios won’t function as expected.
Currently, to support MADV_HUGEPAGE, you’d need to set readahead_kb to
2MB, 4MB, or more. However, many applications run without
   > MADV_HUGEPAGE, and a larger readahead_kb might not be optimal for> them.

Someone configured: "Don't readahead more than 128KiB"

And then we complain why we "don't readahead more than 128KiB".

That is just bikeshielding.

It's called "reading the documentation and trying to make sense of a
patch". ;)


So, what’s your suggestion? Simply setting readahead_kb to 2MB? That
would almost certainly cause issues elsewhere.

I'm not 100% sure. I'm trying to make sense of it all.

And I assume there is a relevant difference now between "readahead 2M
using all 4k pages" and "readahead 2M using a single large folio".

I agree that likely readahead using many 4k pages is a worse idea than
just using a single large folio ... if we manage to allocate one. And
it's all not that clear in the code ...

FWIW, I looked at "read_ahead_kb" values on my Fedora40 notebook and
they are all set to 128KiB. I'm not so sure if they really should be
that small ... or if large folio readahead code should just be able to
exceed it.

"mm/filemap: Support VM_HUGEPAGE for file mappings" talks about "even if
we have no history of readahead being successful".

So not about exceeding the configured limit, but exceeding the
"readahead history".

So I consider VM_HUGEPAGE the sign here to "ignore readahead history"
and not to "violate the config".

MADV_HUGEPAGE is definitely a new addition to readahead, and its
behavior isn’t yet defined in the documentation. All we need to do is
clarify its behavior there. The documentation isn’t set in stone—we
can update it as long as it doesn’t disrupt existing applications.

If Willy thinks this is the way to go, then we should document that
MADV_HUGEPAGE may ignore the parameter, agreed.

I still don't understand your one comment:

"It's worth noting that if read_ahead_kb is set to a larger value that
isn't aligned with huge page sizes (e.g., 4MB + 128KB), it may still
fail to map to hugepages."

I just played with your patch and your reproducer.

Setting read_ahead_kb to 0/128/4096 will give us PMDs with MADV_HUGEPAGE. So we're ignoring the parameter.

But with 4224 we are not ignoring the parameter with MADV_HUGEPAGE. :(

root@localhost:~#  cat /proc/`pgrep readhahead`/smaps_rollup
00400000-7ffe3a206000 ---p 00000000 00:00 0 [rollup]
Rss:             1049728 kB
Pss:             1048700 kB
Pss_Dirty:        968576 kB
Pss_Anon:             84 kB
Pss_File:        1048616 kB
Pss_Shmem:             0 kB
Shared_Clean:       1056 kB
Shared_Dirty:          0 kB
Private_Clean:     80096 kB
Private_Dirty:    968576 kB
Referenced:      1049728 kB
Anonymous:            84 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:    135168 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB


--
Cheers,

David / dhildenb





[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux