Michal Hocko <mhocko@xxxxxxxx> writes:

> On Tue 13-12-22 14:30:57, Huang, Ying wrote:
>> Mina Almasry <almasrymina@xxxxxxxxxx> writes:
> [...]
>> After these discussion, I think the solution maybe use different
>> interfaces for "proactive demote" and "proactive reclaim". That is,
>> reconsider "memory.demote". In this way, we will always uncharge the
>> cgroup for "memory.reclaim". This avoid the possible confusion there.
>> And, because demotion is considered aging, we don't need to disable
>> demotion for "memory.reclaim", just don't count it.
>
> As already pointed out in my previous email, we should really think more
> about future requirements. Do we add memory.promote interface when there
> is a request to implement numa balancing into the userspace? Maybe yes
> but maybe the node balancing should be more generic than bound to memory
> tiering and apply to a more fine grained nodemask control.
>
> Fundamentally we already have APIs to age (MADV_COLD, MADV_FREE),
> reclaim (MADV_PAGEOUT, MADV_DONTNEED) and MADV_WILLNEED to prioritize
> (swap in, or read ahead) which are per mm/file. Their primary usability
> issue is that they are process centric and that requires a very deep
> understanding of the process mm layout so it is not really usable for a
> larger scale orchestration.
> The important part of those interfaces is that they do not talk about
> demotion because that is an implementation detail. I think we want to
> follow that model at least. From a higher level POV I believe we really
> need an interface to age&reclaim and balance memory among nodes. Are
> there more higher level usecases?

Yes. If high-level interfaces can satisfy the requirements, we should
use the existing ones or define new ones. But I guess Mina and Xu have
some requirements at the level of memory tiers (demotion/promotion)?

Best Regards,
Huang, Ying
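
For reference, the per-process hints named in the quoted text can be
exercised roughly as below. This is only a minimal sketch, not code from
this thread: the helper names (map_touched, hint, age_reclaim_prefetch)
are hypothetical, and the fallback values for MADV_COLD and MADV_PAGEOUT
assume the Linux UAPI numbering introduced in 5.4. Older kernels are
expected to reject those two hints with EINVAL, which the sketch reports
rather than hides.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Fallbacks for older libc headers (assumed Linux UAPI values, 5.4+). */
#ifndef MADV_COLD
#define MADV_COLD 20
#endif
#ifndef MADV_PAGEOUT
#define MADV_PAGEOUT 21
#endif

/* Hypothetical helper: map `len` bytes of anonymous memory and touch
 * every byte so the pages are actually populated. Returns NULL on failure. */
void *map_touched(size_t len)
{
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return NULL;
	memset(p, 0x5a, len);
	return p;
}

/* Hypothetical helper: apply one madvise() hint; returns 0 or -errno. */
int hint(void *addr, size_t len, int advice)
{
	return madvise(addr, len, advice) == 0 ? 0 : -errno;
}

/* Age, reclaim, then prefetch one range. The caller has to know the
 * address range inside its own mm, and none of the hints names a memory
 * tier or a demotion target. Per-step results are stored in rc[0..2]. */
void age_reclaim_prefetch(void *addr, size_t len, int rc[3])
{
	rc[0] = hint(addr, len, MADV_COLD);     /* age: mark as cold      */
	rc[1] = hint(addr, len, MADV_PAGEOUT);  /* reclaim: page out now  */
	rc[2] = hint(addr, len, MADV_WILLNEED); /* prioritize: bring back */
}
```

A caller would map a region with map_touched(), pass it to
age_reclaim_prefetch(), and inspect the three return codes. Whether any
page really moves depends on the kernel version and on swap being
available, so the sketch makes no claim about the resulting placement.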