On Tue 21-05-19 11:41:07, Minchan Kim wrote: > On Mon, May 20, 2019 at 11:18:29AM +0200, Michal Hocko wrote: > > [Cc linux-api] > > > > On Mon 20-05-19 12:52:52, Minchan Kim wrote: > > > There is some usecase that centralized userspace daemon want to give > > > a memory hint like MADV_[COOL|COLD] to other process. Android's > > > ActivityManagerService is one of them. > > > > > > It's similar in spirit to madvise(MADV_WONTNEED), but the information > > > required to make the reclaim decision is not known to the app. Instead, > > > it is known to the centralized userspace daemon(ActivityManagerService), > > > and that daemon must be able to initiate reclaim on its own without > > > any app involvement. > > > > Could you expand some more about how this all works? How does the > > centralized daemon track respective ranges? How does it synchronize > > against parallel modification of the address space etc. > > Currently, we don't track each address ranges because we have two > policies at this moment: > > deactive file pages and reclaim anonymous pages of the app. > > Since the daemon has a ability to let background apps resume(IOW, process > will be run by the daemon) and both hints are non-disruptive stabilty point > of view, we are okay for the race. Fair enough but the API should consider future usecases where this might be a problem. So we should really think about those potential scenarios now. If we are ok with that, fine, but then we should be explicit and document it that way. Essentially say that any sort of synchronization is supposed to be done by monitor. This will make the API less usable but maybe that is enough. > > > To solve the issue, this patch introduces new syscall process_madvise(2) > > > which works based on pidfd so it could give a hint to the exeternal > > > process. > > > > > > int process_madvise(int pidfd, void *addr, size_t length, int advise); > > > > OK, this makes some sense from the API point of view. When we have > > discussed that at LSFMM I was contemplating about something like that > > except the fd would be a VMA fd rather than the process. We could extend > > and reuse /proc/<pid>/map_files interface which doesn't support the > > anonymous memory right now. > > > > I am not saying this would be a better interface but I wanted to mention > > it here for a further discussion. One slight advantage would be that > > you know the exact object that you are operating on because you have a > > fd for the VMA and we would have a more straightforward way to reject > > operation if the underlying object has changed (e.g. unmapped and reused > > for a different mapping). > > I agree your point. If I didn't miss something, such kinds of vma level > modify notification doesn't work even file mapped vma at this moment. > For anonymous vma, I think we could use userfaultfd, pontentially. > It would be great if someone want to do with disruptive hints like > MADV_DONTNEED. > > I'd like to see it further enhancement after landing address range based > operation via limiting hints process_madvise supports to non-disruptive > only(e.g., MADV_[COOL|COLD]) so that we could catch up the usercase/workload > when someone want to extend the API. So do you think we want both interfaces (process_madvise and madvisefd)? -- Michal Hocko SUSE Labs