On Thu, 2013-08-01 at 15:06 +0800, Tang Chen wrote:
> From: Yasuaki Ishimatsu <isimatu.yasuaki@xxxxxxxxxxxxxx>
>
> If system can create movable node which all memory of the node is allocated
> as ZONE_MOVABLE, setup_node_data() cannot allocate memory for the node's
> pg_data_t. So, use memblock_alloc_try_nid() instead of memblock_alloc_nid()
> to retry when the first allocation fails. Otherwise, the system could failed
> to boot.
>
> The node_data could be on hotpluggable node. And so could pagetable and
> vmemmap. But for now, doing so will break memory hot-remove path.
>
> A node could have several memory devices. And the device who holds node
> data should be hot-removed in the last place. But in NUAM level, we don't

NUAM -> NUMA

> know which memory_block (/sys/devices/system/node/nodeX/memoryXXX) belongs
> to which memory device. We only have node. So we can only do node hotplug.
>
> But in virtualization, developers are now developing memory hotplug in qemu,
> which support a single memory device hotplug. So a whole node hotplug will
> not satisfy virtualization users.
>
> So at last, we concluded that we'd better do memory hotplug and local node
> things (local node node data, pagetable, vmemmap, ...) in two steps.
> Please refer to https://lkml.org/lkml/2013/6/19/73
>
> For now, we put node_data of movable node to another node, and then improve
> it in the future.
>
> In the later patches, a boot option will be introduced to enable/disable this
> functionality. If users disable it, the node_data will still be put on the
> local node.
>
> Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@xxxxxxxxxxxxxx>
> Signed-off-by: Lai Jiangshan <laijs@xxxxxxxxxxxxxx>
> Signed-off-by: Tang Chen <tangchen@xxxxxxxxxxxxxx>
> Signed-off-by: Jiang Liu <jiang.liu@xxxxxxxxxx>
> Reviewed-by: Wanpeng Li <liwanp@xxxxxxxxxxxxxxxxxx>
> Reviewed-by: Zhang Yanfei <zhangyanfei@xxxxxxxxxxxxxx>

Acked-by: Toshi Kani <toshi.kani@xxxxxx>

Thanks,
-Toshi

> ---
>  arch/x86/mm/numa.c |    5 ++---
>  1 files changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
> index a71c4e2..5013583 100644
> --- a/arch/x86/mm/numa.c
> +++ b/arch/x86/mm/numa.c
> @@ -209,10 +209,9 @@ static void __init setup_node_data(int nid, u64 start, u64 end)
>  	 * Allocate node data. Try node-local memory and then any node.
>  	 * Never allocate in DMA zone.
>  	 */
> -	nd_pa = memblock_alloc_nid(nd_size, SMP_CACHE_BYTES, nid);
> +	nd_pa = memblock_alloc_try_nid(nd_size, SMP_CACHE_BYTES, nid);
>  	if (!nd_pa) {
> -		pr_err("Cannot find %zu bytes in node %d\n",
> -		       nd_size, nid);
> +		pr_err("Cannot find %zu bytes in any node\n", nd_size);
>  		return;
>  	}
>  	nd = __va(nd_pa);
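
For anyone following the thread, the behavioural difference between the two
calls in the hunk above can be sketched in user space. This is only a toy
model, not the kernel's memblock code: struct toy_node, toy_alloc_nid(),
toy_alloc_try_nid(), TOY_MAX_NODES and TOY_NO_NODE are hypothetical names made
up for illustration, and a node with free_bytes == 0 stands in for a movable
node that has no memory usable for early allocations.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_NODES 4
#define TOY_NO_NODE   (-1)

/* Hypothetical per-node pool: bytes still usable for early allocations. */
struct toy_node {
	size_t free_bytes;
};

/*
 * Strict variant: allocate from node 'nid' only.
 * Returns the node that served the request, or TOY_NO_NODE on failure.
 */
static int toy_alloc_nid(struct toy_node *nodes, size_t size, int nid)
{
	assert(nid >= 0 && nid < TOY_MAX_NODES);

	if (nodes[nid].free_bytes < size)
		return TOY_NO_NODE;

	nodes[nid].free_bytes -= size;
	return nid;
}

/*
 * Fallback variant: prefer node 'nid', then try every other node in
 * index order. Returns the serving node, or TOY_NO_NODE if none fits.
 */
static int toy_alloc_try_nid(struct toy_node *nodes, size_t size, int nid)
{
	int got = toy_alloc_nid(nodes, size, nid);

	if (got != TOY_NO_NODE)
		return got;

	for (int i = 0; i < TOY_MAX_NODES; i++) {
		if (i == nid)
			continue;
		got = toy_alloc_nid(nodes, size, i);
		if (got != TOY_NO_NODE)
			return got;
	}
	return TOY_NO_NODE;
}
```

In this model, a request for node 0 with nothing usable fails under the strict
variant but is served from another node under the fallback variant, which is
the case the patch description calls out for a movable node's pg_data_t.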