Hi David,
Hi :)
I am currently working on adding memory hotplug-related functionality to the weighted interleave feature. While discussing this with Gregory, a question came up. If you happen to know the answer to the following, I would greatly appreciate your input. I have added the following logic to call add_weight_node when a node transitions to the N_MEMORY state to create a sysfs entry. Conversely, when all memory blocks of a node go offline (!N_MEMORY), I call sysfs_wi_node_release to remove the corresponding sysfs entry.
As a spoiler: I don't like how we squeezed the status_change_nid / status_change_nid_normal stuff into the memory notifier that works on a single memory block -> single zone. But decoupling it is not as easy, because we have this status_change_nid vs. status_change_nid_normal thing.
For the basic "node going offline / node going online", a separate notifier (or separate callbacks) would make at least your use case much clearer.
The whole "status_change_nid_normal" is only used by slab. I wonder if we could get rid of it, and simply let slab check the relevant zone->present pages when notified about onlining/offlining of a singe memory block.
Then, we would only have status_change_nid and could just convert that to a NODE_MEM_ONLINE / NODE_MEM_OFFLINE notifier or sth like that.
Hmmm, if I wouldn't be on PTO, I would prototype that real quick :)
+static int wi_node_notifier(struct notifier_block *nb, + unsigned long action, void *data) +{ + int err; + struct memory_notify *arg = data; + int nid = arg->status_change_nid; + + if (nid < 0) + goto notifier_end; + + switch(action) { + case MEM_ONLINE: + err = add_weight_node(nid, wi_kobj); + if (err) { + pr_err("failed to add sysfs [node%d]\n", nid); + kobject_put(wi_kobj); + return NOTIFY_BAD; + } + break; + case MEM_OFFLINE: + sysfs_wi_node_release(node_attrs[nid], wi_kobj); + break; + } + +notifier_end: + return NOTIFY_OK; +} One question I have is whether the MEM_OFFLINE action in the code below will be triggered when a node that consists of multiple memory blocks has only one of its memory blocks transitioning to the offline state.
node_states_check_changes_offline() should be making sure that that is the case.
Only if offlining the current block will make the node (all zones) have no present pages any more, then we'll set status_change_nid to >= 0.
+ int nid = arg->status_change_nid; + + if (nid < 0) + goto notifier_end; Based on my analysis, wi_node_notifier should function as expected because arg->status_change_nid only holds a non-negative value under the following conditions: 1) !N_MEMORY -> N_MEMORY When the first memory block of a node transitions to the online state, it holds a non-negative value. In all other cases, it remains -1 (NUMA_NO_NODE). 2) N_MEMORY -> !N_MEMORY When all memory blocks of a node transition to the offline state, it holds a non-negative value. In all other cases, it remains -1 (NUMA_NO_NODE). I would truly appreciate it if you could confirm whether this analysis is correct. Below is a more detailed explanation of my findings.
Yes, that's at least how it is supposed to work (-bugs, but I am not aware of any) :)
-- Cheers, David / dhildenb