On Thu, Oct 15, 2020 at 11:43 AM Minchan Kim <minchan@xxxxxxxxxx> wrote:
>
> On Thu, Oct 15, 2020 at 11:20:30AM +0200, Michal Hocko wrote:
>
> > > > I do have a vague recollection that we have discussed a kill(2) based
> > > > approach as well in the past. Essentially SIG_KILL_SYNC which would
> > > > not only send the signal but would also start a teardown of resources
> > > > owned by the task - at least those we can remove safely. The interface
> > > > would be much simpler and less tricky to use. You just make your
> > > > userspace oom killer, or potentially other users, call SIG_KILL_SYNC,
> > > > which will be more expensive, but you would at least know that as many
> > > > resources have been freed as the kernel can afford at the moment.
> > >
> > > Correct, my early RFC here
> > > https://patchwork.kernel.org/project/linux-mm/patch/20190411014353.113252-3-surenb@xxxxxxxxxx
> > > was using a new flag for pidfd_send_signal() to request mm reaping by
> > > the oom-reaper kthread. IIUC you propose a new SIG_KILL_SYNC signal
> > > instead of a new pidfd_send_signal() flag, and otherwise a very
> > > similar solution. Is my understanding correct?
> >
> > Well, I think you shouldn't focus too much on the oom-reaper aspect
> > of it. Sure, it can be used for that, but I believe that a new signal
> > should provide sync behavior. People more familiar with process
> > management would be better placed to define what is possible for a new
> > sync signal. Ideally not only pro-active process destruction but also
> > synchronously waiting until the target process is released, so that you
> > know that once the kill syscall returns the process is gone.
>
> If we approach this with a signal, I am not sure we need to create a new
> signal rather than using pidfd plus fsync(2)-like semantics.
>
> Furthermore, process_madvise does the work in the caller's context, but
> a signal might do the work in some other context depending on the
> implementation (the oom reaper, or the CPU that resumed the task).
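[For context on the interface under discussion: the plain, non-reaping pidfd kill path already exists and can be exercised from userspace. Below is a minimal sketch, assuming Linux 5.3+ and Python 3.9+ (os.pidfd_open and signal.pidfd_send_signal). The mm-reaping flag from the RFC is deliberately not shown, since no such flag exists upstream.]

```python
# Sketch of the existing pidfd-based kill path (illustrative only).
# The RFC above would add a pidfd_send_signal() flag asking the kernel
# to also reap the target's address space; that flag is not upstream,
# so this only demonstrates the plain signal delivery the thread
# builds on.
import os
import signal

def pidfd_kill(pid):
    """Send SIGKILL through a pidfd, avoiding pid-reuse races."""
    pidfd = os.pidfd_open(pid)          # stable handle to the process
    try:
        signal.pidfd_send_signal(pidfd, signal.SIGKILL)
    finally:
        os.close(pidfd)

pid = os.fork()
if pid == 0:                            # child: sleep until killed
    signal.pause()
    os._exit(0)

pidfd_kill(pid)
_, status = os.waitpid(pid, 0)
print("killed by SIGKILL:",
      os.WIFSIGNALED(status) and os.WTERMSIG(status) == signal.SIGKILL)
```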
> I am not sure if it fulfils Suren's requirement.
>
> One more thing to think over: even though we spend some overhead to
> read /proc/pid/maps, we could do the zapping in parallel in userspace
> with a multi-threaded approach. I am not sure what the win would be,
> since Suren also cares about zapping performance.

Sorry Minchan, I did not see your reply while replying to Michal...

Even if we do the reading/reaping in parallel, we still have to issue
tens of read() syscalls to consume the entire /proc/pid/maps file. Plus,
I'm not sure how much mmap_sem contention such a parallel operation
(reaping taking the write lock, maps reading taking the read lock)
would generate. If we go this route, I think a syscall to read a vector
of VMAs would be far more performant, and the userspace usage would be
much simpler.
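[To put a rough number on the read() overhead described above, here is an illustrative sketch, not from the thread: it counts how many read() syscalls it takes to consume a maps file with a page-sized buffer. Each VMA is one text line, so a process with thousands of mappings needs many reads, each of which re-enters the kernel.]

```python
# Illustrative only: count the read() syscalls needed to consume
# /proc/<pid>/maps with a page-sized buffer. The final zero-length
# read (EOF) is counted too, since it is still a syscall.
import os

def count_maps_reads(pid="self", bufsize=4096):
    fd = os.open(f"/proc/{pid}/maps", os.O_RDONLY)
    reads = total = 0
    try:
        while True:
            chunk = os.read(fd, bufsize)  # one syscall per iteration
            reads += 1
            if not chunk:
                break
            total += len(chunk)
    finally:
        os.close(fd)
    return reads, total

reads, total = count_maps_reads()
print(f"{reads} read() calls for {total} bytes of maps output")
```

A vectored VMA-read syscall, as suggested above, would replace this whole loop (and the text parsing it implies) with a single kernel entry.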