From: Björn Töpel <bjorn@xxxxxxxxxxxx>

Memory Hot(Un)Plug support for the RISC-V port
==============================================

Introduction
------------

To quote "Documentation/admin-guide/mm/memory-hotplug.rst": "Memory
hot(un)plug allows for increasing and decreasing the size of physical
memory available to a machine at runtime."

This series attempts to add memory hot(un)plug support for the RISC-V
Linux port.

I'm sending the series as a v1, but it's borderline RFC. It definitely
needs more testing time, but it would be nice to get some early input.

Implementation
--------------

From an arch perspective, a couple of callbacks need to be implemented
to support hot plugging:

arch_add_memory()
  This callback is responsible for updating the linear/direct map, and
  calls into the memory hot plugging generic code via __add_pages().

arch_remove_memory()
  In this callback the linear/direct map is torn down.

vmemmap_free()
  The function tears down the vmemmap mappings (if
  CONFIG_SPARSEMEM_VMEMMAP is in use), and also deallocates the backing
  vmemmap pages. Note that for persistent memory, an alternative
  allocator for the backing pages can be used -- the vmem_altmap. This
  means that when the backing pages are freed, extra care is needed so
  that the correct deallocation method is used. Note that RISC-V
  populates the vmemmap using vmemmap_populate_basepages(), so
  currently no hugepages are used for the backing store.

The page table unmap/teardown functions are heavily based on (copied
from!) the x86 tree. The same remove_pgd_mapping() is used in both
vmemmap_free() and arch_remove_memory(), but in the latter function
the backing pages are not removed.

On RISC-V, the PGD-level kernel mappings need to be synchronized with
all page tables (e.g. via sync_kernel_mappings()). Synchronization
involves special care, like locking.
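
As a toy illustration of the synchronization problem described above,
here is a small user-space model. It is only a sketch and not code from
this series: every toy_* name, the table size, and the kernel-half
split are assumptions made up for the example. It models a reference
kernel table plus per-process copies of its top-level (PGD) slots, and
shows that a slot populated after processes exist has to be copied into
every process table, while a slot populated before any process exists
is simply inherited.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PGD_ENTRIES  512  /* slots per top-level table (toy value) */
#define TOY_KERNEL_START 256  /* first kernel-half slot (toy value) */
#define TOY_MAX_MM       4    /* max number of modelled processes */

typedef uint64_t toy_pgd_t;   /* 0 means "slot not populated" */

struct toy_mm {
	toy_pgd_t pgd[TOY_PGD_ENTRIES];
};

static struct toy_mm toy_init_mm;          /* reference kernel table */
static struct toy_mm toy_mms[TOY_MAX_MM];  /* per-process tables */
static size_t toy_nr_mms;

/* Create a process table by copying the kernel half of the reference. */
struct toy_mm *toy_mm_create(void)
{
	struct toy_mm *mm;
	size_t i;

	assert(toy_nr_mms < TOY_MAX_MM);
	mm = &toy_mms[toy_nr_mms++];
	for (i = TOY_KERNEL_START; i < TOY_PGD_ENTRIES; i++)
		mm->pgd[i] = toy_init_mm.pgd[i];
	return mm;
}

/* Fill empty slots [first, last] in the reference table at init time. */
void toy_preallocate(size_t first, size_t last)
{
	size_t i;

	assert(first >= TOY_KERNEL_START && last < TOY_PGD_ENTRIES);
	for (i = first; i <= last; i++)
		if (!toy_init_mm.pgd[i])
			toy_init_mm.pgd[i] = 0x1000 + i; /* fake leaf table id */
}

/*
 * Populate one slot in the reference table after processes exist.
 * With sync == true the new entry is also copied into every process
 * table. Returns the number of process tables that were written.
 */
size_t toy_hot_add(size_t idx, toy_pgd_t leaf, bool sync)
{
	size_t i, touched = 0;

	assert(idx >= TOY_KERNEL_START && idx < TOY_PGD_ENTRIES);
	toy_init_mm.pgd[idx] = leaf;
	if (!sync)
		return 0;
	for (i = 0; i < toy_nr_mms; i++) {
		toy_mms[i].pgd[idx] = leaf;
		touched++;
	}
	return touched;
}

/* True if every process table agrees with the reference at idx. */
bool toy_in_sync(size_t idx)
{
	size_t i;

	for (i = 0; i < toy_nr_mms; i++)
		if (toy_mms[i].pgd[idx] != toy_init_mm.pgd[idx])
			return false;
	return true;
}
```

In this model, slots filled by toy_preallocate() before the first
toy_mm_create() are inherited by every process table, so later changes
below such a slot never need the copy loop. Calling toy_hot_add() with
sync == false leaves the process tables stale, which is the state an
explicit sync (with its locking) would otherwise have to repair.
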
Instead, this patch series takes a different approach (introduced by
Jörg Rödel in the x86 tree): pre-allocate the PGD leaves (P4D, PUD, or
PMD depending on the paging setup) at mem_init(), for vmemmap and the
direct map.

Pre-allocating the PGD leaves wastes some memory, but is only enabled
for CONFIG_MEMORY_HOTPLUG. The number of potentially unused pages is
~128 * 4K (about 512 KiB).

Patch 1: Preparation for hotplugging support, by pre-allocating the
         PGD leaves.

Patch 2: Changes the __init attribute to __meminit, so that the
         functions are not discarded after init. __meminit keeps the
         functions after init if memory hotplugging is enabled for the
         build.

Patch 3: Refactor the direct map setup, so it can be used for hot add.

Patch 4: The actual add/remove code. Mostly a page-table-walk
         exercise.

Patch 5: Turn on the arch support in Kconfig.

Patch 6: Now that memory hotplugging is enabled, make virtio-mem
         usable for RISC-V.

Patch 7: Pre-allocate vmalloc PGD leaves as well, which removes the
         need for vmalloc faulting.

RFC
---

* TLB flushes. The current series uses Big Hammer flush-it-all.
* Pre-allocation vs explicit syncs

Testing
-------

ACPI support is still in the making for RISC-V, so tests that involve
CXL and similar fanciness are currently not possible. Virtio-mem,
however, works without proper ACPI support. In order to try this out
in Qemu, some additional patches for Qemu are needed:

* Enable virtio-mem for RISC-V
* Add proper hotplug support for virtio-mem

The patch for Qemu is commit 5d90a7ef1bc0 ("hw/riscv/virt: Support for
virtio-mem-pci"), and can be found here:

  https://github.com/bjoto/qemu/tree/riscv-virtio-mem

I will try to upstream that work in parallel with this.

Thanks to David Hildenbrand for valuable input for the Qemu side of
things.

The series is based on the RISC-V fixes tree:

  https://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux.git/log/?h=fixes


Thanks,
Björn


Björn Töpel (7):
  riscv: mm: Pre-allocate PGD leaves to avoid synchronization
  riscv: mm: Change attribute from __init to __meminit for page
    functions
  riscv: mm: Refactor create_linear_mapping_range() for hot add
  riscv: mm: Add memory hot add/remove support
  riscv: Enable memory hot add/remove arch kbuild support
  virtio-mem: Enable virtio-mem for RISC-V
  riscv: mm: Pre-allocate vmalloc PGD leaves

 arch/riscv/Kconfig               |   2 +
 arch/riscv/include/asm/kasan.h   |   4 +-
 arch/riscv/include/asm/mmu.h     |   2 +-
 arch/riscv/include/asm/pgtable.h |   2 +-
 arch/riscv/mm/fault.c            |   7 +-
 arch/riscv/mm/init.c             | 387 ++++++++++++++++++++++++++++---
 drivers/virtio/Kconfig           |   2 +-
 7 files changed, 364 insertions(+), 42 deletions(-)


base-commit: 3b90b09af5be42491a8a74a549318cfa265b3029
-- 
2.39.2