>On 05.04.23 21:42, Dan Williams wrote:
>> Matthew Wilcox wrote:
>>> On Tue, Apr 04, 2023 at 09:48:41PM -0700, Dan Williams wrote:
>>>> Kyungsan Kim wrote:
>>>>> We know the situation. When a CXL DRAM channel is located under ZONE_NORMAL,
>>>>> a random allocation of a kernel object by calling kmalloc() siblings makes the entire CXL DRAM unremovable.
>>>>> Also, not all kernel objects can be allocated from ZONE_MOVABLE.
>>>>>
>>>>> ZONE_EXMEM does not confine a movability attribute(movable or unmovable), rather it allows a calling context can decide it.
>>>>> In that aspect, it is the same with ZONE_NORMAL but ZONE_EXMEM works for extended memory device.
>>>>> It does not mean ZONE_EXMEM support both movability and kernel object allocation at the same time.
>>>>> In case multiple CXL DRAM channels are connected, we think a memory consumer possibly dedicate a channel for movable or unmovable purpose.
>>>>>
>>>>
>>>> I want to clarify that I expect the number of people doing physical CXL
>>>> hotplug of whole devices to be small compared to dynamic capacity
>>>> devices (DCD). DCD is a new feature of the CXL 3.0 specification where a
>>>> device maps 1 or more thinly provisioned memory regions that have
>>>> individual extents get populated and depopulated by a fabric manager.
>>>>
>>>> In that scenario there is a semantic where the fabric manager hands out
>>>> 100G to a host and asks for it back, it is within the protocol that the
>>>> host can say "I can give 97GB back now, come back and ask again if you
>>>> need that last 3GB".
>>>
>>> Presumably it can't give back arbitrary chunks of that 100GB? There's
>>> some granularity that's preferred; maybe on 1GB boundaries or something?
>>
>> The device picks a granularity that can be tiny per spec, but it makes
>> the hardware more expensive to track in small extents, so I expect
>> something reasonable like 1GB, but time will tell once actual devices
>> start showing up.
>
>It all sounds a lot like virtio-mem using real hardware [I know, there
>are important differences, but for the dynamic aspect there are very
>similar issues to solve]
>
>For virtio-mem, the current best way to support hotplugging of large
>memory to a VM to eventually be able to unplug a big fraction again is
>using a combination of ZONE_MOVABLE and ZONE_NORMAL -- "auto-movable"
>memory onlining policy. What's onlined to ZONE_MOVABLE can get (fairly)
>reliably unplugged again. What's onlined to ZONE_NORMAL is possibly lost
>forever.
>
>Like (incrementally) hotplugging 1 TiB to a 4 GiB VM. Being able to
>unplug 1 TiB reliably again is pretty much out of scope. But the more
>memory we can reliably get back the better. And the more memory we can
>get in the common case, the better. With a ZONE_NORMAL vs. ZONE_MOVABLE
>ratio of 1:3 one could unplug ~768 GiB again reliably. The remainder
>depends on fragmentation on the actual system and the unplug granularity.
>
>The original plan was to use ZONE_PREFER_MOVABLE as a safety buffer to
>reduce ZONE_NORMAL memory without increasing ZONE_MOVABLE memory (and
>possibly harming the system). The underlying idea was that in many
>setups that memory in ZONE_PREFER_MOVABLE would not get used for
>unmovable allocations and it could, therefore, get unplugged fairly
>reliably in these setups. For all other setups, unmovable allocations
>could leak into ZONE_PREFER_MOVABLE and reduce the amount of memory we
>could unplug again. But the system would try to keep unmovable
>allocations to ZONE_NORMAL, so in most cases with some
>ZONE_PREFER_MOVABLE memory we would perform better than with only
>ZONE_NORMAL.

The memory hotplug mechanism is probably best viewed as two stages:
physical memory add/remove and logical memory online/offline [1].
We think ZONE_PREFER_MOVABLE could help with logical memory online/offline.
However, there would be a trade-off between physical add/remove and
device utilization.
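
As a side note on the 1:3 figure quoted above, the arithmetic can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the function name is made up for illustration, and it ignores the 4 GiB of boot memory and any fragmentation effects.

```python
def reliably_unpluggable_gib(hotplugged_gib, normal_parts, movable_parts):
    """Share of hotplugged memory that lands in ZONE_MOVABLE for a given
    ZONE_NORMAL:ZONE_MOVABLE split (hypothetical helper, not a kernel API)."""
    total_parts = normal_parts + movable_parts
    return hotplugged_gib * movable_parts // total_parts


hotplugged_gib = 1024  # 1 TiB expressed in GiB
movable_gib = reliably_unpluggable_gib(hotplugged_gib, 1, 3)
normal_gib = hotplugged_gib - movable_gib

print(f"ZONE_MOVABLE: {movable_gib} GiB, ZONE_NORMAL: {normal_gib} GiB")
```

With a 1:3 split, three quarters of the 1 TiB ends up in ZONE_MOVABLE, which matches the ~768 GiB mentioned in the quote; the remaining 256 GiB is the part that may or may not come back.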
In the case of ZONE_PREFER_MOVABLE allocation on switched CXL DRAM devices,
if pages are allocated evenly across the physical CXL DRAM devices, it would
not help physical memory add/remove. Conversely, if pages are allocated
sequentially across the physical CXL DRAM devices, it would be the opposite:
physical add/remove would become easier, but the devices would be utilized
less evenly.
ZONE_EXMEM provides provisioning of CXL DRAM devices [2], and we think the
ZONE_PREFER_MOVABLE idea can be applied on top of that, for example as
preferred-movable pages per CXL DRAM device within the zone.

[1] https://docs.kernel.org/admin-guide/mm/memory-hotplug.html#phases-of-memory-hotplug
[2] https://github.com/OpenMPDK/SMDK/wiki/2.-SMDK-Architecture#memory-partition

>
>--
>Thanks,
>
>David / dhildenb
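
To make the even versus sequential trade-off in the reply above more concrete, here is a toy model in Python. It is not kernel code: the function names, the policy names and the device and page counts are all assumptions chosen for illustration. A device is treated as physically removable only if no unmovable page was placed on it.

```python
def place_unmovable(num_devices, pages_per_device, num_unmovable, policy):
    """Return the number of unmovable pages placed on each device.

    policy "even":       spread pages round-robin across all devices.
    policy "sequential": fill one device completely before using the next.
    """
    if num_unmovable > num_devices * pages_per_device:
        raise ValueError("not enough capacity for the requested pages")
    counts = [0] * num_devices
    for i in range(num_unmovable):
        if policy == "even":
            dev = i % num_devices
        elif policy == "sequential":
            dev = i // pages_per_device
        else:
            raise ValueError(f"unknown policy: {policy}")
        counts[dev] += 1
    return counts


def removable_devices(counts):
    """Devices with no unmovable pages, i.e. candidates for physical removal."""
    return [dev for dev, count in enumerate(counts) if count == 0]


# Hypothetical setup: 4 devices, 1000 pages each, 8 unmovable allocations.
even = place_unmovable(4, 1000, 8, "even")
sequential = place_unmovable(4, 1000, 8, "sequential")

print("even:      ", even, "removable:", removable_devices(even))
print("sequential:", sequential, "removable:", removable_devices(sequential))
```

In this model even placement touches every device, so none of them can be physically removed, while sequential placement keeps three of the four devices free of unmovable pages at the cost of loading a single device.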