On Tue 17-03-20 14:44:45, Vlastimil Babka wrote:
> On 3/16/20 10:06 AM, Michal Hocko wrote:
> > On Thu 12-03-20 17:41:58, Vlastimil Babka wrote:
> > [...]
> >> with nid present in:
> >> N_POSSIBLE - pgdat might not exist, node_to_mem_node() must return some online
> >
> > I would rather have a dummy pgdat for those. Have a look at
> > $ git grep "NODE_DATA.*->" | wc -l
> > 63
> >
> > Who knows how many else we have there. I haven't looked more closely.
> > Besides that what is a real reason to not have pgdat there and force all
> > users of a $random node from those that the platform considers possible
> > for special casing? Is that a memory overhead? Is that really a thing?
>
> I guess we can ignore memory overhead. I guess there only might be some concern
> that for nodes that are initially offline, we will allocate the pgdat on a
> different node, and after they are online, it will stay on a different node with
> more access latency from local cpus. If we only allocate for online nodes, it
> can always be local? But I guess it doesn't matter that much.

This is not the case even now because of a chicken-and-egg problem. You
need memory to allocate from, and that memory has to be managed
somewhere per node (pgdat). Keep in mind we do not have the bootmem
allocator for hotplug. Have a look at hotadd_new_pgdat and when it is
called. There are some attempts to allocate the memmap from the
hotplugged memory, but I am not sure we can do the whole thing without
a pgdat in place. If we can, then we can come up with some "replace the
pgdat" magic. But still, I am not even sure this is something we really
have to optimize for.
-- 
Michal Hocko
SUSE Labs