On 2/27/2025 11:32 AM, Honggyu Kim wrote:
Hi Joshua,
On 2/27/2025 6:35 AM, Joshua Hahn wrote:
We should never try to allocate memory from a memoryless node. Creating a
sysfs knob to control its weighted interleave weight does not make sense,
and can be unsafe.
Only create weighted interleave weight knobs for nodes with memory.
Signed-off-by: Joshua Hahn <joshua.hahnjy@xxxxxxxxx>
---
mm/mempolicy.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 4cc04ff8f12c..50cbb7c047fa 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -3721,7 +3721,7 @@ static int add_weighted_interleave_group(struct
kobject *root_kobj)
return err;
}
- for_each_node_state(nid, N_POSSIBLE) {
Actually, we're aware of this issue and currently trying to fix this.
In our system, we've attached 4ch of CXL memory for each socket as
follows.
node0 node1
+-------+ UPI +-------+
| CPU 0 |-+-----+-| CPU 1 |
+-------+ +-------+
| DRAM0 | | DRAM1 |
+---+---+ +---+---+
| |
+---+---+ +---+---+
| CXL 0 | | CXL 4 |
+---+---+ +---+---+
| CXL 1 | | CXL 5 |
+---+---+ +---+---+
| CXL 2 | | CXL 6 |
+---+---+ +---+---+
| CXL 3 | | CXL 7 |
+---+---+ +---+---+
node2 node3
The 4ch of CXL memory are detected as a single NUMA node in each socket,
but it shows as follows with the current N_POSSIBLE loop.
$ ls /sys/kernel/mm/mempolicy/weighted_interleave/
node0 node1 node2 node3 node4 node5
node6 node7 node8 node9 node10 node11
+ for_each_node_state(nid, N_MEMORY) {
Thinking it again, we can leave it as a separate patch but add our patch
on top of it.
The only concern I have is having only N_MEMORY patch hides weight
setting knobs for CXL memory and it makes there is no way to set weight
values to CXL memory in my system.
IMHO, this and our patch is better to be submitted together.
Thanks,
Honggyu
But using N_MEMORY doesn't fix this problem and it hides the entire CXL
memory nodes in our system because the CXL memory isn't detected at this
point of creating node*. Maybe there is some difference when multiple
CXL memory is detected as a single node.
We have to create more nodes when CXL memory is detected later. In
addition, this part can be changed to "for_each_online_node(nid)"
although N_MEMORY is also fine here.
We've internally fixed it using a memory hotpluging callback so we can
upload another working version later.
Do you mind if we continue fixing this work?
Thanks,
Honggyu
err = add_weight_node(nid, wi_kobj);
if (err) {
pr_err("failed to add sysfs [node%d]\n", nid);