On Mon 12-12-22 16:54:27, Mina Almasry wrote:
> On Mon, Dec 12, 2022 at 12:55 AM Michal Hocko <mhocko@xxxxxxxx> wrote:
[...]
> > Let me summarize my main concerns here as well. The proposed
> > implementation doesn't apply the provided nodemask to the whole reclaim
> > process. This means that demotion can happen outside of the mask so
> > the user request cannot really control demotion targets and that limits
> > the interface should there be any need for a finer grained control in
> > the future (see an example in [2]).
> > Another problem is that this can limit future reclaim extensions because
> > of existing assumptions of the interface [3] - specify only top-tier
> > node to force the aging without actually reclaiming any charges and
> > (ab)use the interface only for aging on multi-tier system. A change to
> > the reclaim to not demote in some cases could break this usecase.
> >
>
> I think this is correct. My use case is to request from the kernel to
> do demotion without reclaim in the cgroup, and the reason for that is
> stated in the commit message:
>
> "Reclaim and demotion incur different latency costs to the jobs in the
> cgroup. Demoted memory would still be addressable by the userspace at
> a higher latency, but reclaimed memory would need to incur a
> pagefault."
>
> For jobs of some latency tiers, we would like to trigger proactive
> demotion (which incurs relatively low latency on the job), but not
> trigger proactive reclaim (which incurs a pagefault). I initially had
> proposed a separate interface for this, but Johannes directed me to
> this interface instead in [1]. In the same email Johannes also tells
> me that Meta's reclaim stack relies on memory.reclaim triggering
> demotion, so it seems that I'm not the first to take a dependency on
> this. Additionally in [2] Johannes also says it would be great if in
> the long term reclaim policy and demotion policy do not diverge.

I do recognize your need to control the demotion but I argue that it is
a bad idea to rely on an implicit behavior of the memory reclaim and an
interface which is _documented_ to primarily _reclaim_ memory.

Really, consider that the current demotion implementation will change
in the future and, based on a newly added heuristic, memory reclaim or
compression would be preferred over migration to a different tier. This
might completely break your current assumptions and break your usecase
which relies on an implicit demotion behavior. Do you see that as a
potential problem at all? What shall we do in that case? Special case
memory.reclaim behavior?

Now to your specific usecase. If there is a need to do a memory
distribution balancing then fine, but this should be a well defined
interface. E.g. is there a need to control not only demotion but
promotion as well? I haven't heard anybody requesting that so far, but
I can easily imagine that, like outsourcing the memory reclaim to the
userspace, someone might want to do the same thing with the NUMA
balancing because $REASONS. Should that ever happen, I am pretty sure
hooking into memory.reclaim is not really a great idea.

See where I am coming from?

> [1] https://lore.kernel.org/linux-mm/Y35fw2JSAeAddONg@xxxxxxxxxxx/
> [2] https://lore.kernel.org/linux-mm/Y36fIGFCFKiocAd6@xxxxxxxxxxx/

-- 
Michal Hocko
SUSE Labs