On Tue, Jan 21, 2025 at 6:24 PM Zi Yan <ziy@xxxxxxxxxx> wrote:
>
> On Tue Jan 21, 2025 at 9:08 PM EST, Juan Yescas wrote:
> > On Mon, Jan 20, 2025 at 9:59 AM David Hildenbrand <david@xxxxxxxxxx> wrote:
> > >
> > > On 20.01.25 16:29, Zi Yan wrote:
> > > > On Mon Jan 20, 2025 at 3:14 AM EST, David Hildenbrand wrote:
> > > >> On 20.01.25 01:39, Zi Yan wrote:
> > > >>> On Sun Jan 19, 2025 at 6:55 PM EST, Barry Song wrote:
> > > >>> <snip>
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>> However, with this workaround, we can't use transparent huge pages.
> > > >>>>>>>>
> > > >>>>>>>> Is the CMA_MIN_ALIGNMENT_BYTES requirement alignment only to support huge pages?
> > > >>>>> No. CMA_MIN_ALIGNMENT_BYTES is limited by CMA_MIN_ALIGNMENT_PAGES, which
> > > >>>>> is equal to pageblock size. Enabling THP just bumps the pageblock size.
> > > >>>>
> >
> > Thanks, I can see the initialization in include/linux/pageblock-flags.h
> >
> > #define pageblock_order MIN_T(unsigned int, HUGETLB_PAGE_ORDER, MAX_PAGE_ORDER)
> >
> > > >>>> Currently, THP might be mTHP, which can have a significantly smaller
> > > >>>> size than 32MB. For example, on arm64 systems with a 16KiB page size,
> > > >>>> a 2MB CONT-PTE mTHP is possible.
> > > >>>> Additionally, mTHP relies on the CONFIG_TRANSPARENT_HUGEPAGE configuration.
> > > >>>>
> > > >>>> I wonder if it's possible to enable CONFIG_TRANSPARENT_HUGEPAGE
> > > >>>> without necessarily using 32MiB THP. If we use other sizes, such as
> > > >>>> 64KiB, perhaps a large pageblock size wouldn't be necessary?
> >
> > Do you mean with mTHP? We haven't explored that option.
>
> Yes. Unless your applications have special demands for PMD THPs. 2MB
> mTHP should work.
>
> >
> > > >>>
> > > >>> I think this should work by reducing MAX_PAGE_ORDER like Juan did for
> > > >>> the experiment. But MAX_PAGE_ORDER is a macro right now, Kconfig needs
> > > >>> to be changed and kernel needs to be recompiled.
> > > >>> Not sure if it is OK for Juan's use case.
> > > >>
> >
> > The main goal is to reserve only the necessary CMA memory for the
> > drivers, which is usually the same for 4KiB and 16KiB page size kernels.
>
> Got it. Based on your experiment, you changed MAX_PAGE_ORDER to get the
> minimal CMA alignment size. Can you deploy that kernel to production?

We can't deploy that because many Android partners are using PMD THP
instead of mTHP.

> If yes, you can use mTHP instead of PMD THP and still get the CMA
> alignment you want.
>
> >
> > > >>
> > > >> IIRC, we set pageblock size == THP size because this is the granularity
> > > >> we want to optimize defragmentation for. ("try keep pageblock
> > > >> granularity of the same memory type: movable vs. unmovable")
> > > >
> > > > Right. In the past, it was optimized for PMD THP. Now we have mTHP. If the
> > > > user does not care about PMD THP (32MB in ARM64 16KB base page case) and mTHP
> > > > (2MB mTHP here) is good enough, reducing pageblock size works.
> > > >
> > > >>
> > > >> However, the buddy already supports having different pagetypes for large
> > > >> allocations.
> > > >
> > > > Right. To be clear, only MIGRATE_UNMOVABLE, MIGRATE_RECLAIMABLE, and
> > > > MIGRATE_MOVABLE can be merged.
> > >
> > > Yes! And a THP cannot span partial MIGRATE_CMA, which would be fine.
> > >
> > > >
> > > >>
> > > >> So we could leave MAX_ORDER alone and try adjusting the pageblock size
> > > >> in these setups. pageblock size is already variable on some
> > > >> architectures IIRC.
> > > >
> >
> > Which values would work for the CMA_MIN_ALIGNMENT_BYTES macro? In the
> > 16KiB page size kernel, I tried these 2 configurations:
> >
> > #define CMA_MIN_ALIGNMENT_BYTES (2048 * CMA_MIN_ALIGNMENT_PAGES)
> >
> > and
> >
> > #define CMA_MIN_ALIGNMENT_BYTES (4096 * CMA_MIN_ALIGNMENT_PAGES)
> >
> > With both of them, the kernel failed to boot.
>
> CMA_MIN_ALIGNMENT_BYTES needs to be PAGE_SIZE * CMA_MIN_ALIGNMENT_PAGES.
> So you need to adjust CMA_MIN_ALIGNMENT_PAGES, which is set by pageblock
> size. pageblock size is determined by pageblock order, which is
> affected by MAX_PAGE_ORDER.
>
> >
> > > > Making pageblock size a boot time variable? We might want to warn
> > > > sysadmin/user that >pageblock_order THP/mTHP creation will suffer.
> > >
> > > Yes, some way to configure it.
> > >
> > > >
> > > >>
> > > >> We'd only have to check if all of the THP logic can deal with pageblock
> > > >> size < THP size.
> > > >
> >
> > The reason that THP was disabled in my experiment is that this
> > assertion failed
> >
> > mm/huge_memory.c
> > /*
> >  * hugepages can't be allocated by the buddy allocator
> >  */
> > MAYBE_BUILD_BUG_ON(HPAGE_PMD_ORDER > MAX_PAGE_ORDER);
> >
> > when
> >
> > config ARCH_FORCE_MAX_ORDER
> >     int
> >     .....
> >     default "8" if ARM64_16K_PAGES
> >
>
> You can remove that BUILD_BUG_ON and turn on mTHP and see if mTHP works.
>

We'll do that and post the results.

> >
> > > > Probably yes, pageblock should be independent of THP logic, although
> > > > compaction (used to create THPs) logic is based on pageblock.
> > >
> > > Right. As raised in the past, we need a higher level mechanism that
> > > tries to group pageblocks together during compaction/conversion to limit
> > > fragmentation on a higher level.
> > >
> > > I assume that many use cases would be fine with not using 32MB/512MB
> > > THPs at all for now -- and instead using 2 MB ones. Of course, for very
> > > large installations it might be different.
> > >
> > > >>
> > > >> This issue is even more severe on arm64 with 64k (pageblock = 512MiB).
> > > >
> >
> > I agree, and if ARCH_FORCE_MAX_ORDER is configured to the max value we get:
> >
> > PAGE_SIZE | max MAX_PAGE_ORDER | CMA_MIN_ALIGNMENT_BYTES
> > 4KiB      | 15                 | 4KiB  * 32768 = 128MiB
> > 16KiB     | 13                 | 16KiB * 8192  = 128MiB
> > 64KiB     | 13                 | 64KiB * 8192  = 512MiB
> >
> > > > This is also good for virtio-mem, since the offline memory block size
> > > > can also be reduced. I remember you complained about it before.
> > >
> > > Yes, yes, yes! :)
> > >
>
> David's proposal should work in general, but might take a non-trivial
> amount of work:
>
> 1. keep pageblock size always at 4MB for all arch.
> 2. adjust existing pageblock users, like compaction, to work on a
>    different range, independent of pageblock.
>    a. for anti-fragmentation mechanism, multiple pageblocks might have
>       different migratetypes but would be compacted to generate huge
>       pages, but how to align their migratetypes is TBD.
> 3. other corner case handlings.
>
>
> The final question is that Barry mentioned that over-reserved CMA areas
> can be used for movable page allocations. Why does it not work for you?

I need to run more experiments to see which type of page allocation is
dominant in the system (unmovable or movable). If it is movable,
over-reserved CMA areas should be fine.

>
> --
> Best Regards,
> Yan, Zi
>
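
For reference, the alignment arithmetic discussed in this thread can be
sketched in userspace C. This is only an illustration, not kernel code: the
sketch_* function names are hypothetical, the page shifts (12, 14, 16) and the
PMD order of 11 for 16KiB pages are assumptions derived from the sizes quoted
above, and the table rows assume pageblock_order ends up equal to the maximum
MAX_PAGE_ORDER.

```c
/* Userspace sketch only; not kernel code. All sketch_* names are hypothetical. */
#include <assert.h>
#include <stdint.h>

/* Mirrors the MIN_T() in the pageblock_order definition quoted in the thread. */
static unsigned int sketch_pageblock_order(unsigned int hugetlb_page_order,
                                           unsigned int max_page_order)
{
    return hugetlb_page_order < max_page_order ? hugetlb_page_order
                                               : max_page_order;
}

/*
 * PAGE_SIZE * CMA_MIN_ALIGNMENT_PAGES, taking CMA_MIN_ALIGNMENT_PAGES to be
 * the pageblock size in pages, i.e. 1 << pageblock_order.
 */
static uint64_t sketch_cma_min_alignment_bytes(unsigned int page_shift,
                                               unsigned int pageblock_order)
{
    return (UINT64_C(1) << page_shift) << pageblock_order;
}

/* Re-checks the numbers quoted in the thread. */
static void sketch_check_thread_numbers(void)
{
    const uint64_t MiB = UINT64_C(1) << 20;

    /* Table rows: pageblock_order assumed equal to the max MAX_PAGE_ORDER. */
    assert(sketch_cma_min_alignment_bytes(12, 15) == 128 * MiB); /* 4KiB pages  */
    assert(sketch_cma_min_alignment_bytes(14, 13) == 128 * MiB); /* 16KiB pages */
    assert(sketch_cma_min_alignment_bytes(16, 13) == 512 * MiB); /* 64KiB pages */

    /* 16KiB pages, PMD order 11 (assumed), MAX_PAGE_ORDER 11: 32MiB pageblock. */
    assert(sketch_cma_min_alignment_bytes(14, sketch_pageblock_order(11, 11))
           == 32 * MiB);

    /* Same, with MAX_PAGE_ORDER lowered to 8 as in the Kconfig snippet: 4MiB. */
    assert(sketch_cma_min_alignment_bytes(14, sketch_pageblock_order(11, 8))
           == 4 * MiB);
}
```

Calling sketch_check_thread_numbers() from a small test program re-checks the
table and the 32MiB vs. 4MiB figures for the 16KiB page size case. It also
shows why hard-coding 2048 or 4096 in place of PAGE_SIZE changes the result:
the alignment is tied to the page size and the pageblock order together.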