On Wed 05-04-17 18:34:39, Michal Hocko wrote: > On Wed 05-04-17 10:48:52, Reza Arbab wrote: > > On Wed, Apr 05, 2017 at 08:42:39AM +0200, Michal Hocko wrote: > > >On Tue 04-04-17 16:43:39, Reza Arbab wrote: > > >>Okay, getting further. With this I can again repeatedly add and remove, > > >>but now I'm seeing a weird variation of that earlier issue: > > >> > > >>1. add_memory(), online_movable > > >> /sys/devices/system/node/nodeX/memoryY symlinks are created. > > >> > > >>2. offline, remove_memory() > > >> The node is offlined, since all memory has been removed, so all of > > >> /sys/devices/system/node/nodeX is gone. This is normal. > > >> > > >>3. add_memory(), online_movable > > >> The node is onlined, so /sys/devices/system/node/nodeX is recreated, > > >> and the memory is added, but just like earlier in this email thread, > > >> the memoryY links are not there. > > > > > >Could you add some printks to see why the sysfs creation failed please? > > > > Ah, simple enough. It's this, right at the top of > > register_mem_sect_under_node(): > > > > if (!node_online(nid)) > > return 0; > > > > That being the case, I really don't understand why your patches make any > > difference. Is node_set_online() being called later than before somehow? > > This is really interesting. Because add_memory_resource does the > following > /* call arch's memory hotadd */ > ret = arch_add_memory(nid, start, size); > > if (ret < 0) > goto error; > > /* we online node here. we can't roll back from here. */ > node_set_online(nid); > > so we are setting the node online _after_ arch_add_memory but the code > which adds those sysfs file is called from > > arch_add_memory > __add_pages > __add_section > register_new_memory > register_mem_sect_under_node > node_online check > > I haven't touched this part. What is the point of this check anyway? We > have already associated all the pages with a node (and with a zone prior > to my patches) so we _know_ how to create those links. The check goes > back to the initial submissions. Gary is not available anymore so we > cannot ask. But I completely fail to see how my changes could have made > any difference. I wasn't able to undestand that from the code so I've just tried to remove the check and it blown up BUG: unable to handle kernel NULL pointer dereference at 0000000000000040 IP: sysfs_create_link_nowarn+0x13/0x32 if (!kobj) parent = sysfs_root_kn; else parent = kobj->sd; ^^^^^^^^^^^^^^^^^^ when creating the link register_mem_sect_under_node: ret = sysfs_create_link_nowarn(&node_devices[nid]->dev.kobj, &mem_blk->dev.kobj, kobject_name(&mem_blk->dev.kobj)); which means that node_devices[nid]->dev.kobj is NULL. This happens later in register_one_node->register_node. This really _screems_ for a clean up! -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>