Re: [PATCH 2/2 v6] mm/mempolicy: Don't create weight sysfs for memoryless nodes

Honggyu Kim <honggyu.kim@xxxxxx> · Tue, 4 Mar 2025 22:03:22 +0900

Hi Gregory,

On 3/4/2025 1:19 AM, Gregory Price wrote:
On Thu, Feb 27, 2025 at 11:32:26AM +0900, Honggyu Kim wrote:

But using N_MEMORY doesn't fix this problem; it hides all of the CXL
memory nodes in our system, because the CXL memory hasn't been detected
yet at the point where node* is created.  Maybe the behavior differs
when multiple CXL memory devices are detected as a single node.

Hm, well, the node is "created" during early boot when ACPI tables are
read and the CFMW are discovered - but they aren't necessarily "online"
at the time they're created.

There is no true concept of a "Hotplug NUMA Node" - as the node must be
created at boot time. (tl;dr: N_POSSIBLE will never change).

This patch may have been a bit overzealous of us; I forgot to ask
whether N_MEMORY is set for nodes created but not onlined at boot. So
this is a good observation.

I didn't want to make more noise, but we found many issues again after
getting a new machine and starting to use it with multiple CXL memory
devices.

It also doesn't help that this may introduce a subtle race condition.

If a node exists (N_POSSIBLE) but hasn't been onlined (!N_MEMORY) and
bandwidth information is reported - then we store the bandwidth info
but don't include the node in the reduction.  Then if the node comes
online later, we don't re-trigger reduction.
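
The sequence above can be modeled in userspace as a rough sketch. This
is not the kernel code: the arrays stand in for the N_POSSIBLE and
N_MEMORY node masks, a plain GCD stands in for the real weight
reduction, and every name here (reduce_weights, report_bandwidth,
online_node_no_reduce) is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NODES 4	/* hypothetical size for this sketch */

static bool node_possible[MAX_NODES];		/* models N_POSSIBLE */
static bool node_has_memory[MAX_NODES];		/* models N_MEMORY */
static unsigned int node_bw[MAX_NODES];		/* reported bandwidth */
static unsigned int node_weight[MAX_NODES];	/* derived weight */

static unsigned int gcd(unsigned int a, unsigned int b)
{
	while (b) {
		unsigned int t = a % b;

		a = b;
		b = t;
	}
	return a;
}

/* Stand-in reduction: only nodes that currently have memory take part. */
static void reduce_weights(void)
{
	unsigned int g = 0;
	int nid;

	for (nid = 0; nid < MAX_NODES; nid++)
		if (node_has_memory[nid])
			g = gcd(g, node_bw[nid]);

	for (nid = 0; nid < MAX_NODES; nid++)
		node_weight[nid] = (node_has_memory[nid] && g) ?
				   node_bw[nid] / g : 0;
}

/* Bandwidth is always stored, even for a node that is not online yet. */
static void report_bandwidth(int nid, unsigned int bw)
{
	assert(nid >= 0 && nid < MAX_NODES && node_possible[nid]);
	node_bw[nid] = bw;
	reduce_weights();
}

/* The gap: the node gains memory, but nothing re-runs the reduction. */
static void online_node_no_reduce(int nid)
{
	assert(nid >= 0 && nid < MAX_NODES && node_possible[nid]);
	node_has_memory[nid] = true;
}
```

In this model, a node whose bandwidth was reported while it was offline
keeps a weight of 0 after online_node_no_reduce() until something calls
reduce_weights() again, even though its bandwidth is already stored.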

Joshua we should just drop this patch for now and work with Honggyu and
friends separately on this issue.  In the meantime we can stick with
N_POSSIBLE.

There are more problems in this space - namely how to handle a system
whereby 8 CXL nodes are "possible" but the user only configures 2 (as
described by Hyonggye here).  We will probably need to introduce
hotplug/node on/offline callbacks to re-configure weights.
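
As a rough illustration of that callback idea, here is a userspace
model in which every online/offline event recomputes the weights over
the nodes that currently have memory. It is only a sketch under
assumptions: the event enum, node_hotplug_cb() and recompute_weights()
are hypothetical names, and the weight rule (bandwidth divided by the
slowest online node, rounded down) is a placeholder for the real
reduction.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NODES 8	/* e.g. 8 "possible" CXL nodes */

enum node_event { NODE_ONLINE, NODE_OFFLINE };	/* hypothetical events */

static bool has_memory[MAX_NODES];	/* models N_MEMORY */
static unsigned int bw[MAX_NODES];	/* stored bandwidth per node */
static unsigned int weight[MAX_NODES];	/* derived weight per node */

/* Placeholder rule: weight relative to the slowest online node. */
static void recompute_weights(void)
{
	unsigned int min_bw = 0;
	int nid;

	for (nid = 0; nid < MAX_NODES; nid++)
		if (has_memory[nid] && bw[nid] &&
		    (!min_bw || bw[nid] < min_bw))
			min_bw = bw[nid];

	for (nid = 0; nid < MAX_NODES; nid++)
		weight[nid] = (has_memory[nid] && min_bw) ?
			      bw[nid] / min_bw : 0;
}

/* Called on every node online/offline transition. */
static void node_hotplug_cb(int nid, enum node_event ev)
{
	assert(nid >= 0 && nid < MAX_NODES);
	has_memory[nid] = (ev == NODE_ONLINE);
	recompute_weights();
}
```

With this shape, nodes that are possible but never configured stay at
weight 0, and a node that comes online later is folded into the weights
as soon as its event fires.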

~Gregory

This work won't take long, so I think we can submit a patch within
a few days.

Thanks,
Honggyu