Hi Tang,

The patch works well on my x86_64 box.
I confirmed that the hotpluggable node is set up as ZONE_MOVABLE.
So feel free to add:

Tested-by: Yasuaki Ishimatsu <isimatu.yasuaki@xxxxxxxxxxxxxx>

Nitpick below.

2013/04/05 18:39, Tang Chen wrote:
> Before this patch-set, we introduced movablemem_map boot option which allowed
> users to specify physical address ranges to set memory as movable. This is not
> user friendly enough for normal users.
>
> So now, we introduce just movablemem_map=acpi to allow users to enable/disable
> the kernel to use Hot Pluggable bit in SRAT to determine which memory ranges are
> hotpluggable, and set them as ZONE_MOVABLE.
>
> This patch-set is based on Yinghai's patch-set:
> v1: https://lkml.org/lkml/2013/3/7/642
> v2: https://lkml.org/lkml/2013/3/10/47
>
> So it supports allocating pagetable pages on local nodes.
>
> We also split the large patch-set into smaller ones, and it seems easier to review.
>
>
> ========================================================================
> [What we are doing]
> This patchset introduces a boot option for users to specify ZONE_MOVABLE
> memory map for each node in the system. Users can use it in two ways:
>
> 1. movablemem_map=acpi
>    In this way, the kernel will use Hot Pluggable bit in SRAT to determine
>    ZONE_MOVABLE for each node. All the ranges user has specified will be
>    ignored.
>
>
> [Why we do this]
> If we hot remove a memory device, it cannot have kernel memory,
> because Linux cannot migrate kernel memory currently. Therefore,
> we have to guarantee that the hot removed memory has only movable
> memory.
> (Here is an exception: When we implement the node hotplug functionality,
> for kernel memory whose life cycle is the same as the node, such as
> pagetables, vmemmap and so on, although the kernel cannot migrate it,
> we can still put it on the local node because we can free it before we
> hot-remove the node. This is not completely implemented yet.)
>
> Linux has two boot options, kernelcore= and movablecore=, for
> creating movable memory. These boot options can specify the amount
> of memory to use as kernel or movable memory. Using them, we can
> create ZONE_MOVABLE which has only movable memory.
> (NOTE: doing this will hurt NUMA performance because the kernel won't
> be able to distribute kernel memory evenly to each node.)
>
> But it does not fulfill a requirement of memory hot remove, because
> even if we specify the boot options, movable memory is distributed
> evenly across the nodes. So when we want to hot remove memory whose
> range is 0x80000000-0xc0000000, we have no way to specify
> that memory as movable memory.
>
> Furthermore, even if we can use SRAT, users still need an interface
> to enable/disable this functionality if they don't want to lose their
> NUMA performance. So I think a user interface is always needed.
>
> So we propose this new feature, which lets users enable/disable the kernel
> setting hotpluggable memory as ZONE_MOVABLE.
>
>
> [Ways to do this]
> There may be 2 ways to specify movable memory.
>  1. use firmware information
>  2. use boot option
>
> 1. use firmware information
>    According to ACPI spec 5.0, SRAT table has memory affinity structure
>    and the structure has Hot Pluggable Field. See "5.2.16.2 Memory
>    Affinity Structure". If we use the information, we might be able to
>    specify movable memory by firmware. For example, if Hot Pluggable
>    Field is enabled, Linux sets the memory as movable memory.
>
> 2. use boot option
>    This is our proposal. New boot option can specify memory range to use
>    as movable memory.
>
>
> [How we do this]
> We now propose a boot option, but support the first way above. A boot option
> is always needed because setting memory as movable will degrade NUMA
> performance. So at least, we need an interface to enable/disable it so that users
> who don't want to use memory hotplug functionality will also be happy.
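
For readers following the "use firmware information" way quoted above, here is a
minimal sketch of how the Hot Pluggable flag could gate the decision. It is not
code from the patch-set. The struct srat_mem_entry layout, the macro names and
entry_is_movable() are hypothetical and only model the idea; the bit positions
(bit 0 Enabled, bit 1 Hot Pluggable) follow the Memory Affinity Structure flags
in the ACPI spec. The acpi_opt_in argument stands in for movablemem_map=acpi
having been given on the command line.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Flag bits of the SRAT Memory Affinity Structure (ACPI 5.0, 5.2.16.2). */
#define SRAT_MEM_ENABLED       (1u << 0)
#define SRAT_MEM_HOT_PLUGGABLE (1u << 1)

/* Hypothetical, reduced view of one SRAT memory entry (illustration only). */
struct srat_mem_entry {
	uint64_t base;    /* physical start address of the range */
	uint64_t length;  /* size of the range in bytes */
	uint32_t node;    /* proximity domain, i.e. NUMA node id */
	uint32_t flags;   /* SRAT_MEM_* bits */
};

/*
 * Return true if the range should be treated as movable memory.
 * acpi_opt_in models whether the user asked for movablemem_map=acpi;
 * without it the Hot Pluggable bit is ignored, so NUMA placement of
 * kernel memory is left untouched.
 */
static bool entry_is_movable(const struct srat_mem_entry *e, bool acpi_opt_in)
{
	assert(e != NULL);

	if (!acpi_opt_in)
		return false;
	if (!(e->flags & SRAT_MEM_ENABLED))
		return false;
	return (e->flags & SRAT_MEM_HOT_PLUGGABLE) != 0;
}
```

The opt-in check comes first on purpose: it mirrors the argument above that users
who do not need memory hotplug must be able to keep their NUMA performance.
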
>
>
> [How to use]
> Specify movablemem_map=acpi in kernel commandline:
> *
> * SRAT:         |_____| |_____| |_________| |_________| ......
> * node id:         0       1         1           2
> * hotpluggable:    n       y         y           n
> * ZONE_MOVABLE:         |_____| |_________|
> *
> NOTE: 1) Before parsing SRAT, memblock has already reserved some memory ranges
>          for other purposes, such as for kernel image. We cannot prevent
>          kernel from using this memory, so we need to exclude it
>          even if it is hotpluggable.
>          Furthermore, to ensure the kernel has enough memory to boot, we make
>          all the memory on the node which the kernel resides in
>          un-hotpluggable.
>       2) In this case, all the user specified memory ranges will be ignored.
>
> We also need to consider the following points:
> 1) Using this boot option could degrade NUMA performance because the kernel
>    memory will not be distributed on each node evenly. So for users who don't
>    want to lose their NUMA performance, just don't use it.
> 2) If kernelcore or movablecore is also specified, movablemem_map will have
>    higher priority to be satisfied.
> 3) This option has no conflict with memmap option.
>
> Tang Chen (10):
>   acpi: Print hotplug info in SRAT.
>   numa, acpi, memory-hotplug: Add movablemem_map=acpi boot option.
>   x86, numa, acpi, memory-hotplug: Introduce hotplug info into struct
>     numa_meminfo.
>   x86, numa, acpi, memory-hotplug: Consider hotplug info when cleanup
>     numa_meminfo.
>   X86, numa, acpi, memory-hotplug: Add hotpluggable ranges to
>     movablemem_map.

It has a whitespace error.

>   x86, numa, acpi, memory-hotplug: Make any node which the kernel
>     resides in un-hotpluggable.
>   x86, numa, acpi, memory-hotplug: Introduce zone_movable_limit[] to
>     store start pfn of ZONE_MOVABLE.

It has a whitespace error.

>   x86, numa, acpi, memory-hotplug: Sanitize zone_movable_limit[].
>   x86, numa, acpi, memory-hotplug: make movablemem_map have higher
>     priority
>   x86, numa, acpi, memory-hotplug: Memblock limit with movablemem_map

Thanks,
Yasuaki Ishimatsu

>
> Yasuaki Ishimatsu (1):
>   x86: get pg_data_t's memory from other node
>
>  Documentation/kernel-parameters.txt |   11 ++
>  arch/x86/include/asm/numa.h         |    3 +-
>  arch/x86/kernel/apic/numaq_32.c     |    2 +-
>  arch/x86/mm/amdtopology.c           |    3 +-
>  arch/x86/mm/numa.c                  |   92 ++++++++++++++--
>  arch/x86/mm/numa_internal.h         |    1 +
>  arch/x86/mm/srat.c                  |   28 ++++-
>  include/linux/memblock.h            |    2 +
>  include/linux/mm.h                  |   19 +++
>  mm/memblock.c                       |   50 ++++++++
>  mm/page_alloc.c                     |  210 ++++++++++++++++++++++++++++++++++-
>  11 files changed, 399 insertions(+), 22 deletions(-)
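
P.S. To make the rule in the quoted [How to use] section concrete, here is a small
sketch of how I read it: a range ends up in ZONE_MOVABLE only if SRAT marks it
hotpluggable and it is not on the node the kernel resides in. This is not code
from the patch-set. struct mem_range, mark_movable() and the kernel_node
parameter are hypothetical names for illustration, and the sketch leaves out the
exclusion of ranges that memblock has already reserved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-range record modelled on the [How to use] diagram. */
struct mem_range {
	int  node;          /* NUMA node id of the range */
	bool hotpluggable;  /* Hot Pluggable bit reported by SRAT */
	bool movable;       /* result: range goes to ZONE_MOVABLE */
};

/*
 * Mark each range as movable if it is hotpluggable and does not belong
 * to the node holding the kernel (kernel_node). Returns the number of
 * ranges marked movable.
 */
static size_t mark_movable(struct mem_range *r, size_t n, int kernel_node)
{
	size_t count = 0;

	assert(r != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		r[i].movable = r[i].hotpluggable && r[i].node != kernel_node;
		if (r[i].movable)
			count++;
	}
	return count;
}
```

With the four ranges from the diagram (nodes 0, 1, 1, 2 with hotpluggable n, y, y, n)
and the kernel on node 0, only the two node 1 ranges are marked movable. If node 0
itself were reported as hotpluggable, it would still stay non-movable.
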