MOTIVATION:
Some Broadcom devices (e.g. 7445, 7278) contain multiple memory
controllers, each mapped to a different address range within a
Uniform Memory Architecture. Some users of these systems have
expressed the desire to locate ZONE_MOVABLE memory on each memory
controller so that user space intensive processing can make better
use of the additional memory bandwidth.

Unfortunately, the historical monotonic layout of zones would mean
that if the lowest addressed memory controller contains ZONE_MOVABLE
memory then all of the memory available from memory controllers at
higher addresses must also be in the ZONE_MOVABLE zone. This would
force all kernel memory accesses onto the lowest addressed memory
controller and significantly reduce the amount of memory available
for non-movable allocations.

The main objective of this patch set is therefore to allow a block
of memory to be designated as part of the ZONE_MOVABLE zone, where
it will only ever be used by the kernel page allocator to satisfy
requests for movable pages. The term Designated Movable Block is
introduced here to represent such a block. The favored
implementation modifies the 'movablecore' kernel parameter to allow
specification of a base address and to support multiple blocks. The
existing 'movablecore' mechanisms are retained.

BACKGROUND:
NUMA architectures support distributing movablecore memory across
each node, but it is undesirable to introduce the overhead and
complexities of NUMA on systems that don't have a Non-Uniform
Memory Architecture.

Commit 342332e6a925 ("mm/page_alloc.c: introduce kernelcore=mirror
option") also depends on zone overlap to support systems with
multiple mirrored ranges.

Commit c6f03e2903c9 ("mm, memory_hotplug: remove zone restrictions")
embraced overlapped zones for memory hotplug.

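As a rough illustration of the layout problem described under
MOTIVATION, the following user-space sketch (not kernel code) compares
how much of a memory controller ends up movable under a monotonic
layout versus a layout where only designated blocks are movable. The
names used here (struct range, overlap(), movable_monotonic(),
movable_designated()) and any address ranges fed to them are
assumptions made up for the example; none of them come from this
commit set.

```c
#include <assert.h>
#include <stdint.h>

#define MiB (1024ULL * 1024ULL)

/* Half-open physical address range [start, end), in bytes. */
struct range {
	uint64_t start;
	uint64_t end;
};

/* Number of bytes shared by two ranges (0 if they do not overlap). */
static uint64_t overlap(struct range a, struct range b)
{
	uint64_t s = a.start > b.start ? a.start : b.start;
	uint64_t e = a.end < b.end ? a.end : b.end;

	return e > s ? e - s : 0;
}

/*
 * Monotonic layout: the movable zone starts at the lowest requested
 * movable address and covers everything above it, so this returns the
 * bytes of 'memc' that lie at or above that address.
 */
static uint64_t movable_monotonic(struct range memc,
				  const struct range *blocks, int n)
{
	struct range zone = { UINT64_MAX, UINT64_MAX };
	int i;

	for (i = 0; i < n; i++)
		if (blocks[i].start < zone.start)
			zone.start = blocks[i].start;

	return overlap(memc, zone);
}

/*
 * Designated blocks: only the blocks themselves are movable.
 * Assumes the blocks do not overlap one another.
 */
static uint64_t movable_designated(struct range memc,
				   const struct range *blocks, int n)
{
	uint64_t total = 0;
	int i;

	for (i = 0; i < n; i++) {
		assert(blocks[i].start <= blocks[i].end);
		total += overlap(memc, blocks[i]);
	}

	return total;
}
```

For two hypothetical 1 GiB controllers at [0x40000000, 0x80000000) and
[0x300000000, 0x340000000), with the top 256 MiB of each requested as
movable, movable_monotonic() reports 256 MiB on the first controller
and the full 1024 MiB on the second, leaving 768 MiB for non-movable
allocations, all of it on the first controller. movable_designated()
reports 256 MiB on each, leaving 768 MiB non-movable on each controller.
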
This commit set follows their lead to allow the ZONE_MOVABLE zone to
overlap other zones while spanning the pages from the lowest
Designated Movable Block to the end of the node. Designated Movable
Blocks are made absent from overlapping zones and present within the
ZONE_MOVABLE zone.

I initially investigated an implementation using a Designated
Movable migrate type in line with comments[1] made by Mel Gorman
regarding a "sticky" MIGRATE_MOVABLE type to avoid using
ZONE_MOVABLE. However, this approach was riskier since it was much
more intrusive on the allocation paths. Ultimately, the progress
made by the memory hotplug folks to expand the ZONE_MOVABLE
functionality convinced me to follow this approach.

OTHER OPPORTUNITIES:
CMA introduced a paradigm where multiple allocators could operate on
the same region of memory, and that paradigm can be extended to
Designated Movable Blocks as well. I was interested in using kernel
resource management as a mechanism for exposing Designated Movable
Block resources (e.g. /proc/iomem) that would be used by the kernel
page allocator like any other ZONE_MOVABLE memory, but could be
claimed by an alternative allocator (e.g. CMA). Unfortunately, this
becomes complicated because the kernel resource implementation
varies materially across different architectures, and since I do not
require this capability I have deferred it.

The Devicetree Specification includes support for specifying
reserved memory regions with a 'reusable' property to allow sharing
of the reserved memory between device drivers and the OS. This is in
line with the paradigm introduced by CMA, but is currently only used
by 'shared-dma-pool' compatible reserved memory regions. Linux could
choose to use Designated Movable Blocks as the default mechanism for
other 'reusable' reserved memory.

Device drivers that own 'reusable' reserved memory could use the
dmb_intersects() function introduced here to determine whether
memory requires reclamation from the OS before use and could use the
alloc/free_contig_range() functions to perform the reclamation and
release of memory needed by the device. The CMA allocator API could
be another candidate for device driver reclamation, but it is not
currently exposed for use by device drivers in modules.

There have been attempts to modify the behavior of the kernel page
allocator's use of CMA regions (e.g. [1] & [2]). This implementation
of Designated Movable Blocks creates an opportunity to allow the CMA
allocator to operate on ZONE_MOVABLE memory that the kernel page
allocator can use more aggressively, without affecting users of the
existing CMA implementation. This would be beneficial when memory
reuse is more valuable than the cost of increased latency of CMA
allocations (e.g. hugetlb_cma).

These other opportunities are dependent on the Designated Movable
Block concept introduced here, so I will hold off submitting any
such follow-on proposals until there is movement on this commit set.

NOTES:
The MEMBLOCK_MOVABLE and MEMBLOCK_HOTPLUG flags have a lot in common
and could potentially be consolidated, but I chose to avoid that
here to reduce controversy.

The CMA and DMB alignment constraints are currently the same so the
logic could be simplified, but this implementation keeps them
distinct to facilitate independent evolution of the implementations
if necessary.

Changes in v2:
  - first three commits upstreamed separately [3], [4], and [5].
  - commits 04-06 submitted separately [6].
  - Corrected errors "Reported-by: kernel test robot <lkp@xxxxxxxxx>"
  - Deferred commits after 15 to simplify review of the base
    functionality.
  - minor reorganization of commit 13.

v1: https://lore.kernel.org/linux-mm/20220913195508.3511038-1-opendmb@xxxxxxxxx/

[1] https://lore.kernel.org/lkml/20160428103927.GM2858@xxxxxxxxxxxxxxxxxxx/
[2] https://lore.kernel.org/lkml/1401260672-28339-1-git-send-email-iamjoonsoo.kim@xxxxxxx
[3] https://lore.kernel.org/linux-mm/20220914023913.1855924-1-zi.yan@xxxxxxxx
[4] https://lore.kernel.org/linux-mm/20220823030209.57434-2-linmiaohe@xxxxxxxxxx
[5] https://lore.kernel.org/linux-mm/20220914190917.3517663-1-opendmb@xxxxxxxxx
[6] https://lore.kernel.org/linux-mm/20220921223639.1152392-1-opendmb@xxxxxxxxx/

Doug Berger (9):
  lib/show_mem.c: display MovableOnly
  mm/vmstat: show start_pfn when zone spans pages
  mm/page_alloc: calculate node_spanned_pages from pfns
  mm/page_alloc.c: allow oversized movablecore
  mm/page_alloc: introduce init_reserved_pageblock()
  memblock: introduce MEMBLOCK_MOVABLE flag
  mm/dmb: Introduce Designated Movable Blocks
  mm/page_alloc: make alloc_contig_pages DMB aware
  mm/page_alloc: allow base for movablecore

 .../admin-guide/kernel-parameters.txt |  14 +-
 include/linux/dmb.h                   |  29 ++++
 include/linux/gfp.h                   |   5 +-
 include/linux/memblock.h              |   8 +
 lib/show_mem.c                        |   2 +-
 mm/Kconfig                            |  12 ++
 mm/Makefile                           |   1 +
 mm/cma.c                              |  15 +-
 mm/dmb.c                              |  91 ++++++++++
 mm/memblock.c                         |  30 +++-
 mm/page_alloc.c                       | 155 ++++++++++++++----
 mm/vmstat.c                           |   5 +
 12 files changed, 321 insertions(+), 46 deletions(-)
 create mode 100644 include/linux/dmb.h
 create mode 100644 mm/dmb.c

--
2.25.1