On Mon 27-09-21 12:12:46, Nadav Amit wrote:
>
>
> > On Sep 27, 2021, at 5:16 AM, Michal Hocko <mhocko@xxxxxxxx> wrote:
> >
> > On Mon 27-09-21 05:00:11, Nadav Amit wrote:
> > [...]
> >> The manager is notified on memory regions that it should monitor
> >> (through PTRACE/LD_PRELOAD/explicit-API). It then monitors these regions
> >> using the remote-userfaultfd that you saw on the second thread. When it wants
> >> to reclaim (anonymous) memory, it:
> >>
> >> 1. Uses UFFD-WP to protect that memory (and for this matter I got a vectored
> >>    UFFD-WP to do so efficiently, a patch which I did not send yet).
> >> 2. Calls process_vm_readv() to read that memory of that process.
> >> 3. Write it back to “swap”.
> >> 4. Calls process_madvise(MADV_DONTNEED) to zap it.
> >
> > Why cannot you use MADV_PAGEOUT/MADV_COLD for this usecase?
>
> Providing hints to the kernel takes you so far to a certain extent.
> The kernel does not want to (for a good reason) to be completely
> configurable when it comes to reclaim and prefetch policies. Doing
> so from userspace allows you to be fully configurable.

I am sorry but I do not follow. Your scenario is describing a user
space driven reclaim. Something that MADV_{COLD,PAGEOUT} have been
designed for. What are you missing in the existing functionality?

> > MADV_DONTNEED on a remote process has been proposed in the past several
> > times and it has always been rejected because it is a free ticket to all
> > sorts of hard to debug problems as it is just a free ticket for a remote
> > memory corruption. An additional capability requirement might reduce the
> > risk to some degree but I still do not think this is a good idea.
>
> I would argue that there is nothing bad that remote MADV_DONTNEED can do
> that process_vm_writev() cannot do as well (putting aside ptrace).

I am not arguing this would be the first syscall to allow tricky and
hard to debug corruptions if used without care.
> process_vm_writev() is checking:
>
> 	mm = mm_access(task, PTRACE_MODE_ATTACH_REALCREDS)
>
> Wouldn't adding such a condition suffice?

This would be a minimum requirement. Another one is a sensible usecase
that is not covered by an existing functionality.
-- 
Michal Hocko
SUSE Labs
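
For readers following the MADV_PAGEOUT versus MADV_DONTNEED distinction in
this exchange, here is a minimal sketch of the difference. It is an
illustration only, not code from this thread. It uses plain madvise() on the
caller's own private anonymous mapping, through Python's mmap module, rather
than process_madvise() on a remote process. The helper name advise_and_read
and the fallback value 21 for MADV_PAGEOUT (its value in the generic Linux
uapi headers, used when the Python build does not export the constant) are
assumptions of the sketch.

```python
import mmap

# Assumption: 21 is MADV_PAGEOUT in the generic Linux uapi headers (5.4+).
# Python's mmap module may not export the constant, hence the fallback.
MADV_PAGEOUT = getattr(mmap, "MADV_PAGEOUT", 21)

FILL = 0x5A  # arbitrary marker byte written before advising


def advise_and_read(advice, pages=4):
    """Fill a private anonymous mapping, apply `advice`, return its bytes.

    Hypothetical helper for illustration; Linux only.
    """
    length = pages * mmap.PAGESIZE
    buf = mmap.mmap(-1, length, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    try:
        buf[:] = bytes([FILL]) * length
        try:
            buf.madvise(advice)
        except OSError:
            # Kernel does not know this advice value: the mapping is untouched.
            pass
        return buf[:]
    finally:
        buf.close()


if __name__ == "__main__":
    size = 4 * mmap.PAGESIZE
    kept = advise_and_read(MADV_PAGEOUT)
    zapped = advise_and_read(mmap.MADV_DONTNEED)
    # MADV_PAGEOUT is a reclaim hint: the data is still there on next access.
    print("PAGEOUT keeps contents:", kept == bytes([FILL]) * size)
    # MADV_DONTNEED drops the pages: the next access sees zero-filled memory.
    print("DONTNEED zero-fills:", zapped == bytes(size))
```

Applied to another process, the second case makes data disappear from under a
target that still expects it, which is the remote corruption concern raised
above. The hint variants leave the contents recoverable.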