On Thu, Feb 27, 2025 at 11:32:26AM +0900, Honggyu Kim wrote:
>
> But using N_MEMORY doesn't fix this problem and it hides the entire CXL
> memory nodes in our system because the CXL memory isn't detected at this
> point of creating node*. Maybe there is some difference when multiple
> CXL memory is detected as a single node.
>

Hm, well, the node is "created" during early boot when ACPI tables are read
and the CFMW are discovered - but they aren't necessarily "online" at the
time they're created.

There is no true concept of a "Hotplug NUMA Node" - as the node must be
created at boot time. (tl;dr: N_POSSIBLE will never change).

This patch may have been a bit overzealous of us, I forgot to ask whether
N_MEMORY is set for nodes created but not onlined at boot. So this is a
good observation.

It also doesn't help that this may introduce a subtle race condition.

If a node exists (N_POSSIBLE) but hasn't been onlined (!N_MEMORY) and
bandwidth information is reported - then we store the bandwidth info
but don't include the node in the reduction. Then if the node comes
online later, we don't re-trigger reduction.

Joshua we should just drop this patch for now and work with Honggyu and
friends separately on this issue. In the meantime we can stick with
N_POSSIBLE.

There are more problems in this space - namely how to handle a system
whereby 8 CXL nodes are "possible" but the user only configures 2 (as
described by Hyonggye here). We will probably need to introduce
hotplug/node on/offline callbacks to re-configure weights.

~Gregory
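
To make the race described above concrete, here is a rough userspace model of
it. This is only a sketch, not the kernel code: every name below
(node_possible, node_has_memory, reduce_weights, report_bandwidth,
online_node) is a hypothetical stand-in, and the GCD-based reduction is a
simplification chosen for illustration rather than the reduction the patch
actually performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_NODES 8

/* Hypothetical stand-ins for the N_POSSIBLE and N_MEMORY node states. */
static bool node_possible[MAX_NODES];
static bool node_has_memory[MAX_NODES];

static uint64_t node_bw[MAX_NODES];      /* reported bandwidth, arbitrary units */
static unsigned int node_weight[MAX_NODES]; /* 0 means "not part of the reduction" */

static uint64_t gcd_u64(uint64_t a, uint64_t b)
{
	while (b) {
		uint64_t t = a % b;
		a = b;
		b = t;
	}
	return a;
}

/*
 * Simplified reduction (an assumption for this sketch): divide each
 * bandwidth by the common GCD, considering only nodes that currently
 * have memory. Nodes that are possible but not online are skipped.
 */
static void reduce_weights(void)
{
	uint64_t g = 0;
	int nid;

	for (nid = 0; nid < MAX_NODES; nid++)
		if (node_has_memory[nid] && node_bw[nid])
			g = gcd_u64(g, node_bw[nid]);

	for (nid = 0; nid < MAX_NODES; nid++) {
		if (node_has_memory[nid] && node_bw[nid])
			node_weight[nid] = (unsigned int)(node_bw[nid] / g);
		else
			node_weight[nid] = 0;
	}
}

/* Bandwidth is stored for any possible node, then the reduction runs. */
static void report_bandwidth(int nid, uint64_t bw)
{
	assert(nid >= 0 && nid < MAX_NODES && node_possible[nid]);
	node_bw[nid] = bw;
	reduce_weights();
}

/*
 * Onlining a node. With rerun_reduction == false this models the race:
 * the stored bandwidth is never folded into the weights. With
 * rerun_reduction == true it models a hypothetical online callback.
 */
static void online_node(int nid, bool rerun_reduction)
{
	assert(nid >= 0 && nid < MAX_NODES && node_possible[nid]);
	node_has_memory[nid] = true;
	if (rerun_reduction)
		reduce_weights();
}
```

In this model, if node 1 is possible but offline when its bandwidth is
reported, its weight stays 0 after a plain online, and only becomes non-zero
once the reduction is re-run from the online path. That is the gap an
on/offline callback would be meant to close.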