Let's add Andi

On Fri 10-03-17 16:53:33, Michal Hocko wrote:
> On Fri 10-03-17 14:58:07, Michal Hocko wrote:
> [...]
> > This would explain why onlining from the last block actually works but
> > to me this sounds like a completely crappy behavior. All we need to
> > guarantee AFAICS is that Normal and Movable zones do not overlap. I
> > believe there is even no real requirement about ordering of the physical
> > memory in Normal vs. Movable zones as long as they do not overlap. But
> > let's keep it simple for the start and always enforce the current status
> > quo that Normal zone is physically preceding Movable zone.
> > Can somebody explain why we cannot have a simple rule for Normal vs.
> > Movable which would be:
> >  - block [pfn, pfn+block_size] can be Normal if
> >    !zone_populated(MOVABLE) || pfn+block_size < ZONE_MOVABLE->zone_start_pfn
> >  - block [pfn, pfn+block_size] can be Movable if
> >    !zone_populated(NORMAL) || ZONE_NORMAL->zone_end_pfn < pfn
>
> OK, so while I was playing with this setup some more I probably got why
> this is done this way. All new memblocks are added to the zone Normal
> where they are accounted as spanned but not present. When we do
> online_movable we just cut from the end of the Normal zone and move it
> to Movable zone. This sounds really awkward. What was the reason to go
> this way? Why cannot we simply add those pages to the zone at the online
> time?

Answering myself. So the reason seems to be 9d99aaa31f59 ("[PATCH] x86_64:
Support memory hotadd without sparsemem"), which no longer applies because

config MEMORY_HOTPLUG
	bool "Allow for memory hot-add"
	depends on SPARSEMEM || X86_64_ACPI_NUMA
	depends on ARCH_ENABLE_MEMORY_HOTPLUG
	depends on COMPILE_TEST || !KASAN

so either SPARSEMEM or X86_64_ACPI_NUMA would have to be enabled.
config X86_64_ACPI_NUMA
	def_bool y
	prompt "ACPI NUMA detection"
	depends on X86_64 && NUMA && ACPI && PCI
	select ACPI_NUMA

But I do not see any way to enable anything but SPARSEMEM for x86_64:

choice
	prompt "Memory model"
	depends on SELECT_MEMORY_MODEL
	default DISCONTIGMEM_MANUAL if ARCH_DISCONTIGMEM_DEFAULT
	default SPARSEMEM_MANUAL if ARCH_SPARSEMEM_DEFAULT
	default FLATMEM_MANUAL

ARCH_DISCONTIGMEM_DEFAULT is 32b only

config ARCH_DISCONTIGMEM_DEFAULT
	def_bool y
	depends on NUMA && X86_32

and ARCH_SPARSEMEM_DEFAULT is enabled on 64b. So I guess whatever the
reason for adding this code back in 2006 was, it no longer holds.

So I am really wondering. Do we absolutely need to assign pages which are
not onlined yet to ZONE_NORMAL unconditionally? Why can't we keep them out
of any zone and wait for the memory online operation to put them where
requested?
-- 
Michal Hocko
SUSE Labs