On Tue, Mar 04, 2025 at 10:03:22PM +0900, Honggyu Kim wrote:
> Hi Gregory,
>
> > This patch may have been a bit overzealous of us, I forgot to ask
> > whether N_MEMORY is set for nodes created but not onlined at boot. So
> > this is a good observation.
>
> I didn't want to make more noise but we found many issues again after
> getting a new machine and started using it with multiple CXL memory.
>

I spent yesterday looking into how nodes are created and marked N_MEMORY
and I think now that this patch is just not correct.

N_MEMORY for a given nid is toggled:
  1) during mm_init if any page is associated with that node (DRAM)
  2) memory_hotplug when a memory block is onlined/offlined (CXL)

This means a CXL node which is deferred to the driver will come up as
memoryless at boot (mm_init) but has N_MEMORY toggled on when the first
hotplug memory block is onlined. However, its access_coordinate data is
reported during cxl driver probe - well prior to memory hotplug.

This means we must expose a node entry for every possible node, always,
because we can't predict what nodes will have hotplug memory.

We COULD try to react to hotplug memory blocks, but this increase in
complexity just doesn't seem worth the hassle - the hotplug callback has
timing restrictions (callback must occur AFTER N_MEMORY is toggled).

It seems better to include all nodes with reported data in the
reduction.

This has two downsides:
  1) stale data may be used if hotplug occurs and the new device does
     not have CDAT/HMAT/access_coordinate data.
  2) any device without CDAT/HMAT/access_coordinate data will not be
     included in the reduction by default.

I think we can work around #2 by detecting this (on reduction, if data
is missing but N_MEMORY is set, fire a warning).

We can't do much about #1 unless we field physical device hot-un-plug
callbacks - and that seems like a bit much.

~Gregory
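
For illustration only, the "include every node with reported data, warn if N_MEMORY is set but data is missing" idea could be sketched in userspace C roughly as below. This is not the patch: struct node_info, its fields, and reduce_bandwidth() are hypothetical names, the two booleans stand in for the N_MEMORY state and for reported access_coordinate data, and a plain sum stands in for whatever the real reduction computes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical per-node record; not a kernel structure. */
struct node_info {
	bool has_memory;        /* stand-in for N_MEMORY being set for this nid */
	bool has_perf_data;     /* stand-in for reported access_coordinate data */
	unsigned int bandwidth; /* reported bandwidth, arbitrary units */
};

/*
 * Walk every possible node. Nodes with reported data always take part in
 * the reduction, whether or not their memory is online yet. Nodes that
 * have memory but no data are skipped and counted in *missing, with a
 * warning printed for each.
 */
static unsigned int reduce_bandwidth(const struct node_info *nodes, int n,
				     int *missing)
{
	unsigned int total = 0;

	assert(nodes != NULL && missing != NULL);
	*missing = 0;

	for (int nid = 0; nid < n; nid++) {
		if (nodes[nid].has_perf_data) {
			total += nodes[nid].bandwidth;
			continue;
		}
		if (nodes[nid].has_memory) {
			fprintf(stderr,
				"warning: node %d has memory but no bandwidth data\n",
				nid);
			(*missing)++;
		}
	}
	return total;
}
```

In this sketch a CXL node whose data arrived at driver probe is counted even while it is still memoryless, which matches downside #1 above: nothing here notices if that data later goes stale.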