.:: Intro

This code illustrates the idea I'm proposing at LSF/MM/BPF [0].

Sorry it's so close to the conference. I was initially quite ambitious in what I wanted to show here and tried to implement a more complete patch series. Now I've run out of time, so I've had to reduce the scope and just hack some minimal stuff together.

This series is _only_ supposed to be about page_alloc.c; everything else is just scaffolding so that the allocator code can be discussed. I've marked the most incomplete patches with [RFC HACKS] in the title to indicate which aspects are less worthy of attention.

See [0] and also [1] for broader context on the ASI/page_alloc topic. See [2] for context about ASI itself.

For this RFC the most important fact is: ASI requires creating another kernel address space (the "restricted address space") that is a subset of the normal one (i.e. the "unrestricted address space"). That is, an address space just like the normal one, but with holes in it. Pages that are unmapped from the restricted address space are called "sensitive".

.:: The Idea

What is sensitive (i.e. where the holes are) is decided at allocation time. This series illustrates an initial implementation of that capability for the direct map.

The basic idea of this implementation is to operate at pageblock granularity and use migratetypes to track sensitivity. The key advantages of this approach are:

- Migratetypes exist to avoid fragmentation. Using them to index pages by sensitivity takes advantage of this, so that the physmap doesn't get fragmented with respect to sensitivity. This means we can use large TLB entries for the restricted physmap.
- Since pageblocks are never smaller than a PMD mapping, if the restricted physmap is always made of PMDs, we never have to break down mappings while changing sensitivity. This means we don't have to deal with allocating pagetables in the middle of the allocator.
- Migratetypes already offer indexing capability; that is, there are separate freelists for each migratetype. This means that when the user allocates a page with a given sensitivity, all the infrastructure is already in place to look up a page that is already mapped/unmapped as needed (if one exists). This minimizes unnecessary TLB flushes.

This differs from Mike Rapoport's work on __GFP_UNMAPPED [3] in that, instead of having a totally separate free area for the pages that are unmapped, it aims to integrate sensitivity throughout the allocator. If it turns out that, for all nonsensitive pages (or all sensitive pages, which seems highly unlikely), access to the full feature set of the page allocator is not needed for a performant system, we could certainly do something like Mike's patchset. But we don't have any reason to expect a correlation between sensitivity and performance needs.

.:: Patchset overview

- Patch 1 adds a minimal subset of the base ASI framework that was introduced by the RFCv2 [2].
- Patches 2-5 add the necessary framework for creating and manipulating the ASI physmap. This is the area where I've had to reduce the scope of this series: I had hoped to present a proper integration here, but instead I've had to just hack something together that kinda works. You can probably skip over this section.
- Patches 6-8 are preparatory hacks and changes to the generic mm code.
- Patches 9-11 are the important bit. First, the new migratetypes are created. Then logic is added to create nonsensitive pageblocks when needed. Finally, logic is added to change them back to sensitive pageblocks when needed.

.:: TODOs

- This doesn't let you allocate from MIGRATE_HIGHATOMIC pageblocks unless you have __GFP_SENSITIVE. We probably need to make the pageblock type and per-freelist logic more advanced to be able to account for this.
- When pages transition from sensitive to nonsensitive, they need to be zeroed to prevent any leftover data from being leaked. This series doesn't address that requirement at all.
- Although I think the abstract design is OK, the actual implementation of calling asi_map()/asi_unmap() from page_alloc.c is pretty confusing: asi_map() is implicit when calling set_pageblock_migratetype(), but asi_unmap() is up to the caller. This requires some refactoring.
- Changes to the unrestricted physmap (page protection changes, memory hotplug) are not properly mirrored into the restricted physmap.
- There's no integration with CMA. The branch at [4] has some minimal integration into alloc_contig_range().

.:: References

[0] https://lore.kernel.org/linux-mm/CA+i-1C169s8pyqZDx+iSnFmftmGfssdQA29+pYm-gqySAYWgpg@xxxxxxxxxxxxxx/
[1] Some slides I presented in an earlier discussion of this topic: https://docs.google.com/presentation/d/1Ozuan7E4z2YWm4V6uE_fe7YoF2BdS3m5jXjDKO7DVy0/edit#slide=id.g32d28ea451a_0_43
[2] https://lore.kernel.org/linux-mm/20250110-asi-rfc-v2-v2-0-8419288bc805@xxxxxxxxxx/
[3] https://lore.kernel.org/all/20230308094106.227365-1-rppt@xxxxxxxxxx/
[5] https://lore.kernel.org/linux-mm/20250129144320.2675822-1-jackmanb@xxxxxxxxxx/

This series is available as a branch with some additional testing here:

[4] https://github.com/bjackman/linux/tree/asi/page-alloc-lsfmmbpf25

This applies to mm-unstable.
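To make the migratetype idea from ".:: The Idea" a bit more concrete, here is a minimal userspace sketch. It is a toy model, not code from this series: the names (MODEL_GFP_SENSITIVE, MT_UNMOVABLE_SENSITIVE, model_alloc_page() and so on) are hypothetical, and it assumes 2 MiB pageblocks of 4 KiB pages and treats each freelist as a simple page counter. It shows why per-migratetype freelists help: an allocation first takes a page whose sensitivity already matches, and only converts a pageblock (a physmap change, plus a TLB flush when unmapping) when that freelist is empty.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for __GFP_SENSITIVE. */
#define MODEL_GFP_SENSITIVE   (1u << 0)
/* Assumption: 2 MiB pageblocks made of 4 KiB pages. */
#define MODEL_PAGES_PER_BLOCK 512UL

static_assert(MODEL_PAGES_PER_BLOCK * 4096UL == (2UL << 20),
	      "model assumes 2 MiB pageblocks of 4 KiB pages");

/* Toy version of MIGRATE_UNMOVABLE split by sensitivity. */
enum model_mt {
	MT_UNMOVABLE_SENSITIVE,    /* unmapped from the restricted physmap */
	MT_UNMOVABLE_NONSENSITIVE, /* mapped in the restricted physmap */
	MT_NR,
};

struct model_zone {
	unsigned long nr_free[MT_NR]; /* free pages on each freelist */
	unsigned long nr_conversions; /* pageblocks whose sensitivity changed */
};

/* Map allocation flags to the freelist (migratetype) to search. */
static enum model_mt model_gfp_migratetype(unsigned int gfp)
{
	return (gfp & MODEL_GFP_SENSITIVE) ? MT_UNMOVABLE_SENSITIVE
					   : MT_UNMOVABLE_NONSENSITIVE;
}

/*
 * Allocate one page. Prefer a freelist whose pages already have the
 * requested sensitivity. Only if it is empty, move a pageblock's worth of
 * pages over from the other freelist; in a real allocator that is where
 * the physmap would be changed (and, when unmapping, the TLB flushed).
 * Simplification: the other freelist's pages are assumed to form whole
 * free pageblocks.
 */
static bool model_alloc_page(struct model_zone *z, unsigned int gfp)
{
	enum model_mt want = model_gfp_migratetype(gfp);
	enum model_mt other = (want == MT_UNMOVABLE_SENSITIVE)
				      ? MT_UNMOVABLE_NONSENSITIVE
				      : MT_UNMOVABLE_SENSITIVE;

	if (z->nr_free[want] == 0) {
		if (z->nr_free[other] < MODEL_PAGES_PER_BLOCK)
			return false; /* nothing left to convert */
		z->nr_free[other] -= MODEL_PAGES_PER_BLOCK;
		z->nr_free[want] += MODEL_PAGES_PER_BLOCK;
		z->nr_conversions++;
	}
	z->nr_free[want]--;
	return true;
}
```

For example, starting with 1024 free sensitive pages, the first nonsensitive allocation converts one pageblock, and subsequent nonsensitive allocations are served from it without any further conversions.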
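The PMD argument from ".:: The Idea" and the asi_map()/asi_unmap() TODO can likewise be sketched in a toy model. Again, this is hypothetical: the constants assume x86-64 with 4 KiB pages, 2 MiB PMDs and a pageblock order equal to the PMD order, and model_set_block_sensitive() is an illustrative name, not an API from the series. Because a pageblock covers whole PMDs, changing its sensitivity only flips whole PMD entries and never splits a mapping or allocates a page table. Routing both directions through one helper is one possible shape for the refactoring mentioned in the TODOs.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed x86-64 constants: 4 KiB pages, 2 MiB PMD mappings. */
#define MODEL_PAGE_SHIFT      12
#define MODEL_PMD_SHIFT       21
#define MODEL_PAGEBLOCK_ORDER 9 /* assumption: equals the PMD order */
#define MODEL_NR_PMDS         16 /* tiny "physmap" for illustration */
#define MODEL_PMDS_PER_BLOCK \
	(1UL << (MODEL_PAGE_SHIFT + MODEL_PAGEBLOCK_ORDER - MODEL_PMD_SHIFT))

static_assert(MODEL_PAGE_SHIFT + MODEL_PAGEBLOCK_ORDER >= MODEL_PMD_SHIFT,
	      "a pageblock must cover at least one whole PMD");

/* One present/not-present flag per PMD of the restricted physmap. */
struct model_restricted_physmap {
	bool pmd_present[MODEL_NR_PMDS];
};

/*
 * Hypothetical helper: set the sensitivity of the pageblock starting at
 * @pfn. Sensitive means unmapped from the restricted physmap. Only whole
 * PMD entries are touched, so no huge mapping is ever split. Returns true
 * if a live translation was removed, i.e. a TLB flush would be needed.
 */
static bool model_set_block_sensitive(struct model_restricted_physmap *pm,
				      unsigned long pfn, bool sensitive)
{
	unsigned long first_pmd = (pfn << MODEL_PAGE_SHIFT) >> MODEL_PMD_SHIFT;
	bool need_flush = false;

	/* Caller must pass a pageblock-aligned pfn. */
	assert((pfn & ((1UL << MODEL_PAGEBLOCK_ORDER) - 1)) == 0);

	for (unsigned long i = 0; i < MODEL_PMDS_PER_BLOCK; i++) {
		unsigned long idx = first_pmd + i;

		assert(idx < MODEL_NR_PMDS);
		if (sensitive && pm->pmd_present[idx])
			need_flush = true; /* unmapping a present entry */
		pm->pmd_present[idx] = !sensitive;
	}
	return need_flush;
}
```

In this model only unmapping reports that a flush is needed, since x86 does not cache not-present translations. Zeroing pages on the sensitive-to-nonsensitive transition (also listed in the TODOs) is deliberately left out.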
Signed-off-by: Brendan Jackman <jackmanb@xxxxxxxxxx>
---
Brendan Jackman (11):
      x86/mm: Bare minimum ASI API for page_alloc integration
      x86/mm: Factor out phys_pgd_init()
      x86/mm: Add lookup_pgtable_in_pgd()
      x86/mm/asi: Sync physmap into ASI_GLOBAL_NONSENSITIVE
      [RFC HACKS] Add asi_map() and asi_unmap()
      mm/page_alloc: Add __GFP_SENSITIVE and always set it
      [RFC HACKS] mm/slub: Set __GFP_SENSITIVE for reclaimable slabs
      [RFC HACKS] mm/page_alloc: Simplify gfp_migratetype()
      mm/page_alloc: Split MIGRATE_UNMOVABLE by sensitivity
      mm/page_alloc: Add support for nonsensitive allocations
      mm/page_alloc: Add support for ASI-unmapping pages

 arch/Kconfig                         |  14 ++++
 arch/x86/Kconfig                     |   1 +
 arch/x86/include/asm/asi.h           |  36 ++++++++
 arch/x86/include/asm/pgtable_types.h |   2 +
 arch/x86/mm/Makefile                 |   1 +
 arch/x86/mm/asi.c                    |  85 +++++++++++++++++++
 arch/x86/mm/init.c                   |   3 +-
 arch/x86/mm/init_64.c                |  53 ++++++++++--
 arch/x86/mm/pat/set_memory.c         |  34 ++++++++
 include/linux/asi.h                  |  20 +++++
 include/linux/gfp.h                  |  30 ++++---
 include/linux/gfp_types.h            |  15 +++-
 include/linux/mmzone.h               |  19 ++++-
 include/linux/vmalloc.h              |   4 +
 mm/internal.h                        |   5 ++
 mm/memory_hotplug.c                  |   2 +-
 mm/page_alloc.c                      | 158 +++++++++++++++++++++++++++++++----
 mm/show_mem.c                        |  13 +--
 mm/slub.c                            |   6 +-
 mm/vmalloc.c                         |  32 ++++---
 20 files changed, 475 insertions(+), 58 deletions(-)
---
base-commit: 5ee93e1a769230377c3d44edd4917e8df77be566
change-id: 20250310-asi-page-alloc-80ea1f8307d0

Best regards,
-- 
Brendan Jackman <jackmanb@xxxxxxxxxx>