On Wed, 12 Mar 2025 12:03:01 -0400 Gregory Price <gourry@xxxxxxxxxx> wrote: > On Wed, Mar 12, 2025 at 04:56:25PM +0900, Rakie Kim wrote: > > The weighted interleave policy distributes page allocations across multiple > > NUMA nodes based on their performance weight, thereby optimizing memory > > bandwidth utilization. The weight values for each node are configured > > through sysfs. > > > > Previously, the sysfs entries for configuring weighted interleave were only > > created during initialization. This approach had several limitations: > > - Sysfs entries were generated for all possible nodes at boot time, > > including nodes without memory, leading to unnecessary sysfs creation. > > It's not that it's unnecessary, it's that it allowed for configuration > of nodes which may not have memory now but may have memory in the > future. This was not well documented. I will update the commit message to reflect your feedback. > > > - Some memory devices transition to an online state after initialization, > > but the existing implementation failed to create sysfs entries for > > these dynamically added nodes. As a result, memory hotplugged nodes > > were not properly recognized by the weighed interleave mechanism. > > > > The current system creates 1 node per N_POSSIBLE nodes, and since nodes > can't transition between possible and not-possible your claims here are > contradictory. > > I think you mean that simply switching from N_POSSIBLE to N_MEMORY is > insufficient since nodes may transition in and out of the N_MEMORY > state. Therefore this patch utilizes a hotplug callback to add and > remove sysfs entries based on whether a node is in the N_MEMORY set. I will update the commit message to reflect your feedback. > > > To resolve these issues, this patch introduces two key improvements: > > 1) At initialization, only nodes that are online and have memory are > > recognized, preventing the creation of unnecessary sysfs entries. > > 2) Nodes that become available after initialization are dynamically > > detected and integrated through the memory hotplug mechanism. > > > > With this enhancement, the weighted interleave policy now properly supports > > memory hotplug, ensuring that newly added nodes are recognized and sysfs > > entries are created accordingly. > > > > It doesn't "support memory hotplug" so much as it "Minimizes weighted > interleave to exclude memoryless nodes". I will update the commit message to reflect your feedback. > > > Signed-off-by: Rakie Kim <rakie.kim@xxxxxx> > > --- > > mm/mempolicy.c | 47 ++++++++++++++++++++++++++++++++++++++++++----- > > 1 file changed, 42 insertions(+), 5 deletions(-) > > > > diff --git a/mm/mempolicy.c b/mm/mempolicy.c > > index 1691748badb2..94efff89e0be 100644 > > --- a/mm/mempolicy.c > > +++ b/mm/mempolicy.c > > @@ -113,6 +113,7 @@ > > #include <asm/tlbflush.h> > > #include <asm/tlb.h> > > #include <linux/uaccess.h> > > +#include <linux/memory.h> > > > > #include "internal.h" > > > > @@ -3489,9 +3490,38 @@ static int add_weight_node(int nid, struct kobject *wi_kobj) > > return 0; > > } > > > > +struct kobject *wi_kobj; > > + > > +static int wi_node_notifier(struct notifier_block *nb, > > + unsigned long action, void *data) > > +{ > > + int err; > > + struct memory_notify *arg = data; > > + int nid = arg->status_change_nid; > > + > > + if (nid < 0) > > + goto notifier_end; > > + > > + switch(action) { > > + case MEM_ONLINE: > > + err = add_weight_node(nid, wi_kobj); > > + if (err) { > > + pr_err("failed to add sysfs [node%d]\n", nid); > > + kobject_put(wi_kobj); > > + return NOTIFY_BAD; > > + } > > + break; > > + case MEM_OFFLINE: > > + sysfs_wi_node_release(node_attrs[nid], wi_kobj); > > + break; > > + } > > I'm fairly certain this logic is wrong. If I add two memory blocks and > then remove one, would this logic not remove the sysfs entries despite > there being a block remaining? Regarding the assumption about node configuration: Are you assuming that a node has two memory blocks and that MEM_OFFLINE is triggered when one of them is offlined? If so, then you are correct that this logic would need modification. I performed a simple test by offlining a single memory block: # echo 0 > /sys/devices/system/node/node2/memory100/online In this case, MEM_OFFLINE was not triggered. However, I need to conduct further analysis to confirm this behavior under different conditions. I will review this in more detail and share my findings, including the test methodology and results. > > > + > > +notifier_end: > > + return NOTIFY_OK; > > +} > > + > > static int add_weighted_interleave_group(struct kobject *root_kobj) > > { > > - struct kobject *wi_kobj; > > int nid, err; > > > > wi_kobj = kzalloc(sizeof(struct kobject), GFP_KERNEL); > > @@ -3505,16 +3535,23 @@ static int add_weighted_interleave_group(struct kobject *root_kobj) > > return err; > > } > > > > - for_each_node_state(nid, N_POSSIBLE) { > > + for_each_online_node(nid) { > > + if (!node_state(nid, N_MEMORY)) > > Rather than online node, why not just add for each N_MEMORY node - > regardless of if its memory is online or not? If the memory is offline, > then it will be excluded from the weighted interleave mechanism by > nature of the node being invalid for allocations anyway. Regarding the decision to check both N_MEMORY and N_ONLINE: This was done to ensure consistency with the conditions under which `wi_node_notifier` is triggered. Specifically, `MEM_ONLINE` is called only when a node is in both the N_MEMORY and N_ONLINE states. I will review this logic further. If my understanding is correct, keeping the current implementation is the appropriate approach. However, I will conduct additional testing to validate this and provide further updates accordingly. > > > + continue; > > + > > err = add_weight_node(nid, wi_kobj); > > if (err) { > > pr_err("failed to add sysfs [node%d]\n", nid); > > - break; > > + goto err_out; > > } > > } > > - if (err) > > - kobject_put(wi_kobj); > > + > > + hotplug_memory_notifier(wi_node_notifier, DEFAULT_CALLBACK_PRI); > > return 0; > > + > > +err_out: > > + kobject_put(wi_kobj); > > + return err; > > } > > > > static void mempolicy_kobj_release(struct kobject *kobj) > > -- > > 2.34.1 > >