On Mon, Jan 20, 2025 at 9:59 AM David Hildenbrand <david@xxxxxxxxxx> wrote:
>
> On 20.01.25 16:29, Zi Yan wrote:
> > On Mon Jan 20, 2025 at 3:14 AM EST, David Hildenbrand wrote:
> >> On 20.01.25 01:39, Zi Yan wrote:
> >>> On Sun Jan 19, 2025 at 6:55 PM EST, Barry Song wrote:
> >>> <snip>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> However, with this workaround, we can't use transparent huge pages.
> >>>>>>>>
> >>>>>>>> Is the CMA_MIN_ALIGNMENT_BYTES alignment requirement only to support huge pages?
> >>>>> No. CMA_MIN_ALIGNMENT_BYTES is limited by CMA_MIN_ALIGNMENT_PAGES, which
> >>>>> is equal to pageblock size. Enabling THP just bumps the pageblock size.
> >>>>

Thanks, I can see the initialization in include/linux/pageblock-flags.h:

    #define pageblock_order MIN_T(unsigned int, HUGETLB_PAGE_ORDER, MAX_PAGE_ORDER)

> >>>> Currently, THP might be mTHP, which can have a significantly smaller
> >>>> size than 32MB. For example, on arm64 systems with a 16KiB page size,
> >>>> a 2MB CONT-PTE mTHP is possible.
> >>>> Additionally, mTHP relies on the CONFIG_TRANSPARENT_HUGEPAGE configuration.
> >>>>
> >>>> I wonder if it's possible to enable CONFIG_TRANSPARENT_HUGEPAGE
> >>>> without necessarily using 32MiB THP. If we use other sizes, such as
> >>>> 64KiB, perhaps a large pageblock size wouldn't be necessary?

Do you mean with mTHP? We haven't explored that option.

> >>>
> >>> I think this should work by reducing MAX_PAGE_ORDER like Juan did for
> >>> the experiment. But MAX_PAGE_ORDER is a macro right now, Kconfig needs
> >>> to be changed and kernel needs to be recompiled. Not sure if it is OK
> >>> for Juan's use case.
> >>

The main goal is to reserve only the necessary CMA memory for the
drivers, which is usually the same for 4KiB and 16KiB page size kernels.

> >>
> >> IIRC, we set pageblock size == THP size because this is the granularity
> >> we want to optimize defragmentation for. ("try keep pageblock
> >> granularity of the same memory type: movable vs.
> >> unmovable")
> >
> > Right. In the past, it was optimized for PMD THP. Now we have mTHP. If user
> > does not care about PMD THP (32MB in ARM64 16KB base page case) and mTHP
> > (2MB mTHP here) is good enough, reducing pageblock size works.
> >
> >>
> >> However, the buddy already supports having different pagetypes for large
> >> allocations.
> >
> > Right. To be clear, only MIGRATE_UNMOVABLE, MIGRATE_RECLAIMABLE, and
> > MIGRATE_MOVABLE can be merged.
>
> Yes! And a THP cannot span partial MIGRATE_CMA, which would be fine.
>
> >
> >>
> >> So we could leave MAX_ORDER alone and try adjusting the pageblock size
> >> in these setups. pageblock size is already variable on some
> >> architectures IIRC.
> >

Which values would work for the CMA_MIN_ALIGNMENT_BYTES macro? In the
16KiB page size kernel, I tried these 2 configurations:

    #define CMA_MIN_ALIGNMENT_BYTES (2048 * CMA_MIN_ALIGNMENT_PAGES)

and

    #define CMA_MIN_ALIGNMENT_BYTES (4096 * CMA_MIN_ALIGNMENT_PAGES)

With both of them, the kernel failed to boot.

> > Making pageblock size a boot time variable? We might want to warn
> > sysadmin/user that >pageblock_order THP/mTHP creation will suffer.
>
> Yes, some way to configure it.
>
> >
> >>
> >> We'd only have to check if all of the THP logic can deal with pageblock
> >> size < THP size.
> >

THP was disabled in my experiment because this assertion in
mm/huge_memory.c failed:

    /*
     * hugepages can't be allocated by the buddy allocator
     */
    MAYBE_BUILD_BUG_ON(HPAGE_PMD_ORDER > MAX_PAGE_ORDER);

when ARCH_FORCE_MAX_ORDER is configured as:

    config ARCH_FORCE_MAX_ORDER
            int .....
            default "8" if ARM64_16K_PAGES

> > Probably yes, pageblock should be independent of THP logic, although
> > compaction (used to create THPs) logic is based on pageblock.
>
> Right. As raised in the past, we need a higher level mechanism that
> tries to group pageblocks together during compaction/conversion to limit
> fragmentation on a higher level.
>
> I assume that many use cases would be fine with not using 32MB/512MB
> THPs at all for now -- and instead using 2 MB ones. Of course, for very
> large installations it might be different.
>
> >>
> >> This issue is even more severe on arm64 with 64k (pageblock = 512MiB).
> >

I agree, and if ARCH_FORCE_MAX_ORDER is configured to the max value we get:

    PAGE_SIZE | max MAX_PAGE_ORDER | CMA_MIN_ALIGNMENT_BYTES
    4KiB      | 15                 | 4KiB  * 32Ki pages = 128MiB
    16KiB     | 13                 | 16KiB *  8Ki pages = 128MiB
    64KiB     | 13                 | 64KiB *  8Ki pages = 512MiB

> > This is also good for virtio-mem, since the offline memory block size
> > can also be reduced. I remember you complained about it before.
>
> Yes, yes, yes! :)
>
> --
> Cheers,
>
> David / dhildenb
>