On 20.01.25 16:29, Zi Yan wrote:
On Mon Jan 20, 2025 at 3:14 AM EST, David Hildenbrand wrote:
On 20.01.25 01:39, Zi Yan wrote:
On Sun Jan 19, 2025 at 6:55 PM EST, Barry Song wrote:
<snip>
However, with this workaround, we can't use transparent huge pages.
Is the CMA_MIN_ALIGNMENT_BYTES alignment requirement only there to support huge pages?
No. CMA_MIN_ALIGNMENT_BYTES is limited by CMA_MIN_ALIGNMENT_PAGES, which
is equal to pageblock size. Enabling THP just bumps the pageblock size.
Currently, THP might be mTHP, which can have a significantly smaller
size than 32MB. For example, on arm64 systems with a 16KiB page size,
a 2MB CONT-PTE mTHP is possible.
Additionally, mTHP relies on the CONFIG_TRANSPARENT_HUGEPAGE configuration.
I wonder if it's possible to enable CONFIG_TRANSPARENT_HUGEPAGE without
necessarily using 32MiB THP. If we use other sizes, such as 64KiB,
perhaps a large pageblock size wouldn't be necessary?
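To make the 2MB CONT-PTE figure above concrete, here is a rough sketch of the arithmetic. The contiguous-PTE run lengths (16, 128 and 32 entries for 4KiB, 16KiB and 64KiB granules) are an assumption taken from my recollection of the arm64 defaults, not something stated in this thread, and the helper name is hypothetical.

```python
# Assumed arm64 contiguous-PTE run lengths, keyed by base page size in KiB.
# These values are an assumption for illustration, not taken from the thread.
CONT_PTES = {4: 16, 16: 128, 64: 32}


def cont_pte_size_kib(page_kib):
    """Size in KiB covered by one contiguous-PTE run (hypothetical helper)."""
    return CONT_PTES[page_kib] * page_kib


if __name__ == "__main__":
    for page_kib in sorted(CONT_PTES):
        print(f"{page_kib}KiB pages: CONT-PTE span = {cont_pte_size_kib(page_kib)}KiB")
```

Under these assumptions a 16KiB base page gives a 2048KiB (2MB) CONT-PTE span, far below the 32MB PMD size.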
I think this should work by reducing MAX_PAGE_ORDER like Juan did for
the experiment. But MAX_PAGE_ORDER is a macro right now, so Kconfig needs
to be changed and the kernel needs to be recompiled. Not sure if that is
OK for Juan's use case.
IIRC, we set pageblock size == THP size because this is the granularity
we want to optimize defragmentation for. ("try to keep pageblock
granularity of the same memory type: movable vs. unmovable")
Right. In the past, it was optimized for PMD THP. Now we have mTHP. If the
user does not care about PMD THP (32MB in the ARM64 16KB base page case) and
mTHP (2MB mTHP here) is good enough, reducing the pageblock size works.
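For reference, the PMD THP sizes quoted in this thread (32MB with 16KB pages, 512MB with 64KB pages) fall out of the page table geometry. A minimal sketch, assuming 8-byte page table entries and a PTE table that fills exactly one page; the function names are hypothetical:

```python
PTE_BYTES = 8  # assumed size of one page table entry on a 64-bit system


def pmd_thp_bytes(page_size):
    """Bytes mapped by one PMD entry: (entries per PTE table) * page size."""
    entries = page_size // PTE_BYTES
    return entries * page_size


def pmd_order(page_size):
    """Buddy order of a PMD-sized block, i.e. log2(number of base pages)."""
    return (pmd_thp_bytes(page_size) // page_size).bit_length() - 1


if __name__ == "__main__":
    for kib in (4, 16, 64):
        size = pmd_thp_bytes(kib * 1024)
        print(f"{kib}KiB pages: PMD THP = {size >> 20}MiB, order {pmd_order(kib * 1024)}")
```

With pageblock size == PMD THP size, this is also the pageblock size, and therefore the CMA minimum alignment, for each base page size.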
However, the buddy already supports having different pagetypes for large
allocations.
Right. To be clear, only MIGRATE_UNMOVABLE, MIGRATE_RECLAIMABLE, and
MIGRATE_MOVABLE can be merged.
Yes! And a THP cannot span partial MIGRATE_CMA, which would be fine.
So we could leave MAX_ORDER alone and try adjusting the pageblock size
in these setups. pageblock size is already variable on some
architectures IIRC.
Making pageblock size a boot time variable? We might want to warn the
sysadmin/user that >pageblock_order THP/mTHP creation will suffer.
Yes, some way to configure it.
We'd only have to check if all of the THP logic can deal with pageblock
size < THP size.
Probably yes, pageblock should be independent of THP logic, although
compaction (used to create THPs) logic is based on pageblock.
Right. As raised in the past, we need a higher level mechanism that
tries to group pageblocks together during compaction/conversion to limit
fragmentation on a higher level.
I assume that many use cases would be fine with not using 32MB/512MB
THPs at all for now -- and instead using 2 MB ones. Of course, for very
large installations it might be different.
This issue is even more severe on arm64 with 64k base pages (pageblock = 512MiB).
This is also good for virtio-mem, since the offline memory block size
can also be reduced. I remember you complained about it before.
Yes, yes, yes! :)
--
Cheers,
David / dhildenb