On Thu, Apr 7, 2022 at 2:26 PM Tim Chen <tim.c.chen@xxxxxxxxxxxxxxx> wrote:
>
> On Wed, 2022-04-06 at 10:49 +0800, Huang, Ying wrote:
> >
> > > > If so,
> > > >
> > > > # echo A > memory.reclaim
> > > >
> > > > means
> > > >
> > > > a) "A" bytes memory are freed from the memcg, regardless demoting is
> > > > used or not.
> > > >
> > > > or
> > > >
> > > > b) "A" bytes memory are reclaimed from the memcg, some of them may be
> > > > freed, some of them may be just demoted from DRAM to PMEM. The total
> > > > number is "A".
> > > >
> > > > For me, a) looks more reasonable.
> > > >
> > >
> > > We can use a DEMOTE flag to control the demotion behavior for
> > > memory.reclaim. If the flag is not set (the default), then
> > > no_demotion of scan_control can be set to 1, similar to
> > > reclaim_pages().
> >
> > If we have to use a flag to control the behavior, I think it's better to
> > have a separate interface (e.g. memory.demote). But do we really need b)?
> >
> > > The question is then whether we want to rename memory.reclaim to
> > > something more general. I think this name is fine if reclaim-based
> > > demotion is an accepted concept.
> >
>
> memory.demote will work for 2 levels of memory tiers. But when we have 3 levels
> of memory (e.g. high bandwidth memory, DRAM and PMEM),
> it gets ambiguous again whether we should demote from high bandwidth memory
> or DRAM.
>
> Will something like this be more general?
>
> echo X > memory_[dram,pmem,hbm].reclaim
>
> So echo X > memory_dram.reclaim
> means that we want to free up X bytes from DRAM for the mem cgroup.
>
> echo demote > memory_dram.reclaim_policy
>
> This means that we prefer demotion for reclaim instead
> of swapping to disk.
>

(resending in plain-text, sorry).

memory.demote can work with any number of memory tiers if a nodemask
argument (or a tier argument, if there is a more explicitly defined,
userspace-visible tiering representation) is provided.
The semantics can be to demote X bytes from these nodes to their
next tier.

Naming the files memory_dram/memory_pmem bakes in an assumption about
the hardware behind a particular memory tier, which is undesirable.
For example, it is entirely possible that a slow memory tier is
implemented by a lower-cost/lower-performance DDR device connected
via CXL.mem, not by PMEM. It is better for this interface to speak in
terms of either the NUMA node abstraction or a new tier abstraction.

It is also desirable to make this interface stateless, i.e. not to
require the setting of memory_dram.reclaim_policy. Any policy can be
specified as arguments to the request itself and should only affect
that particular request.

Wei
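
To make the stateless-request idea above concrete, here is a minimal
userspace sketch in Python. It only illustrates the shape of such a
request: the syntax "<bytes> [nodes=<list>] [policy=<name>]" and every
name in the code are hypothetical, invented for this example, and do
not describe any existing kernel interface.

```python
from dataclasses import dataclass

# Hypothetical request format, invented for illustration only:
#   "<bytes> [nodes=<list>] [policy=<name>]"
# e.g. "1073741824 nodes=0-1,4 policy=demote"


@dataclass(frozen=True)
class DemoteRequest:
    nbytes: int            # amount to demote, in bytes
    nodes: frozenset       # source NUMA node ids (empty = unspecified)
    policy: str = "demote"  # applies to this request only


def parse_nodelist(text):
    """Expand a node list such as "0-1,4" into a frozenset of node ids."""
    nodes = set()
    for part in text.split(","):
        lo, sep, hi = part.partition("-")
        first = int(lo)
        last = int(hi) if sep else first
        if last < first:
            raise ValueError(f"bad node range: {part!r}")
        nodes.update(range(first, last + 1))
    return frozenset(nodes)


def parse_request(line):
    """Parse one self-contained request; no state is kept between calls."""
    fields = line.split()
    if not fields:
        raise ValueError("empty request")
    nbytes = int(fields[0])
    nodes = frozenset()
    policy = "demote"
    for arg in fields[1:]:
        key, sep, value = arg.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {arg!r}")
        if key == "nodes":
            nodes = parse_nodelist(value)
        elif key == "policy":
            policy = value
        else:
            raise ValueError(f"unknown argument: {key!r}")
    return DemoteRequest(nbytes, nodes, policy)
```

Because everything a request needs travels in its own arguments, two
consecutive requests cannot influence each other, which is the
property a separate reclaim_policy file would give up.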