Hi Suren, On Mon, Feb 10, 2020 at 09:50:20AM -0800, Suren Baghdasaryan wrote: > On Mon, Jan 27, 2020 at 4:17 PM Minchan Kim <minchan@xxxxxxxxxx> wrote: > > > > There is usecase that System Management Software(SMS) want to give > > a memory hint like MADV_[COLD|PAGEEOUT] to other processes and > > in the case of Android, it is the ActivityManagerService. > > > > It's similar in spirit to madvise(MADV_WONTNEED), but the information > > required to make the reclaim decision is not known to the app. Instead, > > it is known to the centralized userspace daemon(ActivityManagerService), > > and that daemon must be able to initiate reclaim on its own without > > any app involvement. > > > > To solve the issue, this patch introduces a new syscall process_madvise(2). > > It uses pidfd of an external process to give the hint. > > > > int process_madvise(int pidfd, void *addr, size_t length, int advise, > > unsigned long flag); > > > > Since it could affect other process's address range, only privileged > > process(CAP_SYS_PTRACE) or something else(e.g., being the same UID) > > gives it the right to ptrace the process could use it successfully. > > The flag argument is reserved for future use if we need to extend the > > API. > > > > I think supporting all hints madvise has/will supported/support to > > process_madvise is rather risky. Because we are not sure all hints make > > sense from external process and implementation for the hint may rely on > > the caller being in the current context so it could be error-prone. > > Thus, I just limited hints as MADV_[COLD|PAGEOUT] in this patch. > > > > If someone want to add other hints, we could hear hear the usecase and > > review it for each hint. It's safer for maintenance rather than > > introducing a buggy syscall but hard to fix it later. > > I would definitely be interested in adding MADV_DONTNEED support for > process_madvise() to allow quick memory reclaim after a kill. The > scenario is that userspace daemon can kill a process and try to help > reclaim its memory. Having process_madvise(MADV_DONTNEED) support > helps in the following cases: > 1. Process issuing process_madvise has a higher CPU bandwidth > allowance than the victim process, therefore can reclaim victim's > memory quicker. > 2. In case the victim occupies large amounts of memory the process > issuing process_madvise can spawn multiple (possibly high priority) > threads each reclaiming portions of the victim's memory. > Such an extension will add a destructive kind of madvise into the set > supported by process_madvise and I want to make sure we can accomodate > for that in the future. Do you see any issues with supporting > MADV_DONTNEED in the future? Or kernel could do by themselves to spawn mulitple tasks if the system has available badwidth and target process has a lot memory to be reclaimed Anyway, it doesn't have any issue because we already have some synchrnoization methods(e.g., signal or cgroup freezer) to freeze target processes before giving a hint. It's not different with usual write syscall on shared file among processes. < snip > > > diff --git a/mm/madvise.c b/mm/madvise.c > > index 0c901de531e4..00ffa7e92f79 100644 > > --- a/mm/madvise.c > > +++ b/mm/madvise.c > > @@ -17,6 +17,7 @@ > > #include <linux/falloc.h> > > #include <linux/fadvise.h> > > #include <linux/sched.h> > > +#include <linux/sched/mm.h> > > #include <linux/ksm.h> > > #include <linux/fs.h> > > #include <linux/file.h> > > @@ -315,6 +316,8 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd, > > > > if (fatal_signal_pending(task)) > > return -EINTR; > > + else if (current != task && fatal_signal_pending(current)) > > + return -EINTR; > > I think this can be simplified as: > > + if (fatal_signal_pending(current)) > + return -EINTR; > > current != task condition is not needed because if current == task > then you would return earlier after checking > fatal_signal_pending(task). True, I will remove it. Thanks!