Sorry, I am new.
>But,
>1) in cpu_up(), it will try to online a node, and it doesn't check if
>the node has memory.
>2) in try_offline_node(), it offlines CPUs first, and then the memory.
>This behavior looks a little weird, or let's say it is ambiguous. It
>seems that a NUMA node consists of CPUs and memory. So if the CPUs are
>online, the node should be online.
I suggest you try the patch offered by Liu Jiang:
https://lkml.org/lkml/2014/9/11/1087
I have tried it, and it is OK.
>Unfortunately, since I don't have a machine with a memory-less node, I
>cannot reproduce the problem right now.
If it is not urgent, I can test your patches in our environment on weekends.
gongzhaogang@xxxxxxxxxx
From: Tang Chen
Date: 2015-08-04 11:36
To: Tejun Heo
CC: mingo@xxxxxxxxxx; akpm@xxxxxxxxxxxxxxxxxxxx; rjw@xxxxxxxxxxxxx; hpa@xxxxxxxxx; laijs@xxxxxxxxxxxxxx; yasu.isimatu@xxxxxxxxx; isimatu.yasuaki@xxxxxxxxxxxxxx; kamezawa.hiroyu@xxxxxxxxxxxxxx; izumi.taku@xxxxxxxxxxxxxx; gongzhaogang@xxxxxxxxxx; qiaonuohan@xxxxxxxxxxxxxx; x86@xxxxxxxxxx; linux-acpi@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; linux-mm@xxxxxxxxx; tangchen@xxxxxxxxxxxxxx
Subject: Re: [PATCH 1/5] x86, gfp: Cache best near node for memory allocation.

Hi TJ,

Sorry for the late reply.

On 07/16/2015 05:48 AM, Tejun Heo wrote:
> ......
> so in initialization phase makes no sense any more. The best near online
> node for each cpu should be cached somewhere.
> I'm not really following. Is this because the now offline node can
> later come online and we'd have to break the constant mapping
> invariant if we update the mapping later? If so, it'd be nice to
> spell that out.

Yes. Will document this in the next version.

>> ......
>>
>> +int get_near_online_node(int node)
>> +{
>> +	return per_cpu(x86_cpu_to_near_online_node,
>> +		       cpumask_first(&node_to_cpuid_mask_map[node]));
>> +}
>> +EXPORT_SYMBOL(get_near_online_node);
> Umm... this function is sitting on a fairly hot path and scanning a
> cpumask each time. Why not just build a numa node -> numa node array?

Indeed. Will avoid scanning a cpumask.

> ......
>>
>>  static inline struct page *alloc_pages_exact_node(int nid, gfp_t gfp_mask,
>>  						unsigned int order)
>>  {
>> -	VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES || !node_online(nid));
>> +	VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES);
>> +
>> +#if IS_ENABLED(CONFIG_X86) && IS_ENABLED(CONFIG_NUMA)
>> +	if (!node_online(nid))
>> +		nid = get_near_online_node(nid);
>> +#endif
>>
>>  	return __alloc_pages(gfp_mask, order, node_zonelist(nid, gfp_mask));
>>  }
> Ditto. Also, what's the synchronization rules for NUMA node
> on/offlining. If you end up updating the mapping later, how would
> that be synchronized against the above usages?

I think the near online node map should be updated when node online/offline
happens. But about this, I think the current numa code has a little problem.

As you know, firmware info binds a set of CPUs and memory to a node. But
at boot time, if the node has no memory (a memory-less node), it won't
be online. But the CPUs on that node are available, and bound to the near
online node. (Here, I mean numa_set_node(cpu, node).)

Why does the kernel do this? I think it is used to ensure that we can
allocate memory successfully by calling functions like alloc_pages_node()
and alloc_pages_exact_node(). For these two functions, any CPU should be
bound to a node that has memory so that memory allocation can be successful.

That means, for a memory-less node at boot time, CPUs on the node are
online, but the node is not online.

That also means, "the node is online" is equal to "the node has memory".
Actually, a lot of code in the kernel relies on this rule.

But,
1) in cpu_up(), it will try to online a node, and it doesn't check if
the node has memory.
2) in try_offline_node(), it offlines CPUs first, and then the memory.

This behavior looks a little weird, or let's say it is ambiguous. It
seems that a NUMA node consists of CPUs and memory. So if the CPUs are
online, the node should be online.

And also,
The main purpose of this patch-set is to make the cpuid <-> nodeid
mapping persistent. After this patch-set, alloc_pages_node() and
alloc_pages_exact_node() won't depend on the cpuid <-> nodeid mapping
any more. So the node should be online if the CPUs on it are online.
Otherwise, we cannot set up interfaces of CPUs under /sys.

Unfortunately, since I don't have a machine with a memory-less node, I
cannot reproduce the problem right now.

How do you think the node online behavior should be changed?

Thanks.
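
For reference, the "numa node -> numa node array" idea from the review could look roughly like the user-space sketch below. It is only an illustration under assumptions, not the code from the patch: NR_NODES, node_is_online[], node_dist[][], near_online_node[], rebuild_near_online_map(), set_node_online() and get_near_online_node_sketch() are all hypothetical names, and the distance table is made-up data standing in for what firmware would report. The lookup becomes a single array read, and the array is rebuilt whenever a node goes online or offline.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define NR_NODES 4	/* hypothetical node count for this sketch */

/* Hypothetical stand-ins for kernel state: online flags and a distance table. */
static bool node_is_online[NR_NODES] = { true, false, true, false };
static const int node_dist[NR_NODES][NR_NODES] = {
	{ 10, 20, 30, 40 },
	{ 20, 10, 20, 30 },
	{ 30, 20, 10, 20 },
	{ 40, 30, 20, 10 },
};

/* node -> nearest online node, or -1 when no node is online. */
static int near_online_node[NR_NODES];

/* Recompute the whole map; on a distance tie the lowest node id wins. */
static void rebuild_near_online_map(void)
{
	for (int n = 0; n < NR_NODES; n++) {
		int best = -1;
		int best_dist = INT_MAX;

		for (int m = 0; m < NR_NODES; m++) {
			if (node_is_online[m] && node_dist[n][m] < best_dist) {
				best = m;
				best_dist = node_dist[n][m];
			}
		}
		near_online_node[n] = best;
	}
}

/* Flip a node's state and refresh the map (no locking in this sketch). */
static void set_node_online(int node, bool online)
{
	assert(node >= 0 && node < NR_NODES);
	node_is_online[node] = online;
	rebuild_near_online_map();
}

/* Hot-path lookup: one array read instead of a cpumask scan. */
static int get_near_online_node_sketch(int node)
{
	assert(node >= 0 && node < NR_NODES);
	return near_online_node[node];
}
```

A caller would run rebuild_near_online_map() once at init and then use get_near_online_node_sketch(nid) wherever nid may be offline. The sketch deliberately leaves out synchronization between set_node_online() and concurrent readers, which is exactly the open question raised in the review.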