Kyungsan Kim wrote:
> >On Fri, Mar 31, 2023 at 08:37:15PM +0900, Kyungsan Kim wrote:
> >> >> We resolved the issue using ZONE_EXMEM by allowing seletively choice of the two usecases.
> >> >
> >> >This sounds dangerously confused.  Do you want the EXMEM to be removable
> >> >or not?  If you do, then allocations from it have to be movable.  If
> >> >you don't, why go to all this trouble?
> >>
> >> I'm sorry to make you confused. We will try more to clearly explain our thought.
> >> We think the CXL DRAM device should be removable along with HW pluggable nature.
> >> For MM point of view, we think a page of CXL DRAM can be both movable and unmovable.
> >> An user or kernel context should be able to determine it. Thus, we think dedication on the ZONE_NORMAL or the ZONE_MOVABLE is not enough.
> >
> >No, this is not the right approach.  If CXL is to be hot-pluggable,
> >then all CXL allocations must be movable.  If even one allocation on a
> >device is not movable, then the device cannot be removed.  ZONE_EXMEM
> >feels like a solution in search of a problem
>
> We know the situation. When a CXL DRAM channel is located under ZONE_NORMAL,
> a random allocation of a kernel object by calling kmalloc() siblings makes the entire CXL DRAM unremovable.
> Also, not all kernel objects can be allocated from ZONE_MOVABLE.
>
> ZONE_EXMEM does not confine a movability attribute(movable or unmovable), rather it allows a calling context can decide it.
> In that aspect, it is the same with ZONE_NORMAL but ZONE_EXMEM works for extended memory device.
> It does not mean ZONE_EXMEM support both movability and kernel object allocation at the same time.
> In case multiple CXL DRAM channels are connected, we think a memory consumer possibly dedicate a channel for movable or unmovable purpose.
>

I want to clarify that I expect the number of people doing physical CXL hotplug of whole devices to be small compared to dynamic capacity devices (DCD).
DCD is a new feature of the CXL 3.0 specification where a device maps one or more thinly provisioned memory regions whose individual extents are populated and depopulated by a fabric manager.

In that scenario there is a semantic where the fabric manager hands out 100GB to a host and later asks for it back. It is within the protocol for the host to say "I can give 97GB back now, come back and ask again if you need that last 3GB". In other words, even pinned pages in ZONE_MOVABLE are not fatal to the flow.

Alternatively, if a deployment needs a 100% guarantee that the host will return all the memory it was assigned when asked, there is always the option to keep that memory out of the page allocator and just access it via a device. That's the role device-dax plays for "dedicated" memory that needs to be set aside from kernel allocations.

This is to say that something like ZONE_PREFER_MOVABLE semantics can be handled within the DCD protocol, where 100% unpluggability is not necessary and 97% is good enough.
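
To make the partial-release flow concrete, here is a toy model of the 100GB / 97GB / 3GB exchange described above. It is only a sketch: it is not the CXL mailbox protocol or any kernel interface, and the names `Extent` and `handle_release_request`, the 1GB extent size, and the choice of which extents are pinned are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Extent:
    """Hypothetical stand-in for one DCD extent handed to the host."""
    size_gb: int
    pinned: bool = False  # True if some page in the extent cannot be migrated


def handle_release_request(extents, requested_gb):
    """Toy host-side policy: release every unpinned extent that fits in the
    request, and retain the rest so the fabric manager can ask again later."""
    released, retained = [], []
    remaining = requested_gb
    for ext in extents:
        if not ext.pinned and ext.size_gb <= remaining:
            released.append(ext)
            remaining -= ext.size_gb
        else:
            retained.append(ext)
    return released, retained


# Assumed setup: 100 extents of 1GB each, three of which hold pinned pages.
extents = [Extent(1, pinned=(i in (10, 42, 77))) for i in range(100)]

# First request: the fabric manager asks for all 100GB back.
released, retained = handle_release_request(extents, 100)
released_gb = sum(e.size_gb for e in released)
retained_gb = sum(e.size_gb for e in retained)
print(f"released {released_gb}GB, still holding {retained_gb}GB")

# Later the pins go away and the fabric manager asks again for the remainder.
for ext in retained:
    ext.pinned = False
released_later, retained_later = handle_release_request(retained, retained_gb)
print(f"second pass released {sum(e.size_gb for e in released_later)}GB")
```

The point of the sketch is that the release request is allowed to succeed partially, so a few pinned pages delay the return of their extents rather than failing the whole operation.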