On Wed, Apr 19, 2023 at 03:01:05AM +0300, Kirill A. Shutemov wrote:
> On Tue, Apr 18, 2023 at 03:12:50PM -0400, Johannes Weiner wrote:
> > pageblock_order can be of various sizes, depending on configuration,
> > but the default is MAX_ORDER-1.
>
> Note that MAX_ORDER got redefined in -mm tree recently.
>
> > Given 4k pages, that comes out to
> > 4M. This is a large chunk for the allocator/reclaim/compaction to try
> > to keep grouped per migratetype. It's also unnecessary as the majority
> > of higher order allocations - THP and slab - are smaller than that.
>
> This seems way to x86-specific.

Hey, that's the machines I have access to ;)

> Other arches have larger THP sizes. I believe 16M is common.
>
> Maybe define it as min(MAX_ORDER, PMD_ORDER)?

Hm, let me play around with larger pageblocks.

The thing that gives me pause is that this seems quite aggressive as a
default block size for the allocator and reclaim/compaction - if you
consider the implications for internal fragmentation and the amount of
ongoing defragmentation work it would require.

IOW, it's not just a function of physical page size supported by the
CPU. It's also a function of overall memory capacity. Independent of
architecture, 2MB seems like a more reasonable step up than 16M.

16M is great for TLB coverage, and in our DCs we're getting a lot of
use out of 1G hugetlb pages as well. The question is if those archs
are willing to pay the cost of serving such page sizes quickly and
reliably during runtime; or if that's something better left to setups
with explicit preallocations and stuff like hugetlb_cma reservations.
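
For reference, the size arithmetic behind the numbers in this thread can be sketched in plain userspace C. This is only an illustration, not kernel code: block_bytes() and capped_order() are hypothetical helper names, and the page shifts and orders in the note below are assumed example values rather than anything read from a real config.

```c
/*
 * Illustrative userspace sketch, not kernel code.
 * Both helpers are hypothetical names made up for this example.
 */

/* Bytes covered by one block of 2^order pages of size 2^page_shift. */
unsigned long long block_bytes(unsigned int page_shift, unsigned int order)
{
	return 1ULL << (page_shift + order);
}

/* The smaller of two orders, mirroring the min(MAX_ORDER, PMD_ORDER) idea. */
unsigned int capped_order(unsigned int max_order, unsigned int pmd_order)
{
	return max_order < pmd_order ? max_order : pmd_order;
}
```

With 4k pages (page_shift 12), block_bytes(12, 10) gives the 4M pageblock mentioned above and block_bytes(12, 9) gives 2M. Assuming 64k base pages and an order-8 huge page, block_bytes(16, 8) gives 16M. capped_order(10, 9) picks order 9, i.e. the 2M case with 4k pages.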