On Tue 13-12-22 14:30:57, Huang, Ying wrote:
> Mina Almasry <almasrymina@xxxxxxxxxx> writes:
[...]
> After these discussion, I think the solution maybe use different
> interfaces for "proactive demote" and "proactive reclaim". That is,
> reconsider "memory.demote". In this way, we will always uncharge the
> cgroup for "memory.reclaim". This avoid the possible confusion there.
> And, because demotion is considered aging, we don't need to disable
> demotion for "memory.reclaim", just don't count it.

As already pointed out in my previous email, we should really think more
about future requirements. Do we add a memory.promote interface when
there is a request to implement NUMA balancing in userspace? Maybe yes,
but maybe the node balancing should be more generic than something bound
to memory tiering, and apply to a more fine-grained nodemask control.

Fundamentally we already have APIs to age (MADV_COLD, MADV_FREE), to
reclaim (MADV_PAGEOUT, MADV_DONTNEED) and to prioritize (MADV_WILLNEED,
i.e. swap in or read ahead), all of which are per mm/file. Their primary
usability issue is that they are process centric, which requires a very
deep understanding of the process mm layout, so they are not really
usable for larger scale orchestration.
The important part of those interfaces is that they do not talk about
demotion, because that is an implementation detail. I think we want to
follow that model at least.

From a higher level POV I believe we really need an interface to
age & reclaim and to balance memory among nodes. Are there more
higher-level use cases?
-- 
Michal Hocko
SUSE Labs