On Thu, Apr 11, 2019 at 10:36 AM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
>
> On Thu, Apr 11, 2019 at 10:33:32AM -0700, Daniel Colascione wrote:
> > On Thu, Apr 11, 2019 at 10:09 AM Suren Baghdasaryan <surenb@xxxxxxxxxx> wrote:
> > > On Thu, Apr 11, 2019 at 8:33 AM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
> > > >
> > > > On Wed, Apr 10, 2019 at 06:43:53PM -0700, Suren Baghdasaryan wrote:
> > > > > Add new SS_EXPEDITE flag to be used when sending SIGKILL via
> > > > > pidfd_send_signal() syscall to allow expedited memory reclaim of the
> > > > > victim process. The usage of this flag is currently limited to SIGKILL
> > > > > signal and only to privileged users.
> > > >
> > > > What is the downside of doing expedited memory reclaim? ie why not do it
> > > > every time a process is going to die?
> > >
> > > I think with an implementation that does not use/abuse oom-reaper
> > > thread this could be done for any kill. As I mentioned oom-reaper is a
> > > limited resource which has access to memory reserves and should not be
> > > abused in the way I do in this reference implementation.
> > > While there might be downsides that I don't know of, I'm not sure it's
> > > required to hurry every kill's memory reclaim. I think there are cases
> > > when resource deallocation is critical, for example when we kill to
> > > relieve resource shortage and there are kills when reclaim speed is
> > > not essential. It would be great if we can identify urgent cases
> > > without userspace hints, so I'm open to suggestions that do not
> > > involve additional flags.
> >
> > I was imagining a PI-ish approach where we'd reap in case an RT
> > process was waiting on the death of some other process. I'd still
> > prefer the API I proposed in the other message because it gets the
> > kernel out of the business of deciding what the right signal is. I'm a
> > huge believer in "mechanism, not policy".
>
> It's not a question of the kernel deciding what the right signal is.
> The kernel knows whether a signal is fatal to a particular process or not.
> The question is whether the killing process should do the work of reaping
> the dying process's resources sometimes, always or never. Currently,
> that is never (the process reaps its own resources); Suren is suggesting
> sometimes, and I'm asking "Why not always?"

FWIW, Suren's initial proposal is that the oom_reaper kthread do the
reaping, not the process sending the kill. Are you suggesting that
sending SIGKILL should spend a while in signal delivery reaping pages
before returning? I thought about just doing it this way, but I didn't
like the idea: it'd slow down mass-killing programs like killall(1).
Programs expect sending SIGKILL to be a fast operation that returns
immediately.
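To make the killall(1) point concrete, here is a minimal userspace sketch of a mass-kill loop, assuming Python 3.9+ on Linux, where `os.pidfd_open()` and `signal.pidfd_send_signal()` wrap the syscalls. `flags` is left at 0 because SS_EXPEDITE is only the flag proposed in this thread, not something we can pass. The helpers `send_sigkill` and `mass_kill` are hypothetical names for illustration; the code falls back to plain kill(2) when pidfd support is missing (older Python, a kernel without pidfd_open(), which only arrived in 5.3, or a seccomp filter). Each send only queues the signal and returns; the victim still tears down its own memory afterwards, which is the cost the thread is debating where to move.

```python
import os
import signal
import time


def send_sigkill(pid: int) -> None:
    """Hypothetical helper: SIGKILL pid via a pidfd, else via kill(2).

    flags stays 0: SS_EXPEDITE was only proposed, not a usable flag.
    """
    try:
        pidfd = os.pidfd_open(pid)
    except (AttributeError, OSError):
        # No pidfd support here; plain kill(2) has the same semantics.
        os.kill(pid, signal.SIGKILL)
        return
    try:
        signal.pidfd_send_signal(pidfd, signal.SIGKILL)
    except (AttributeError, OSError):
        os.kill(pid, signal.SIGKILL)
    finally:
        os.close(pidfd)


def mass_kill(n: int = 8):
    """Hypothetical helper: fork n children, SIGKILL them all, reap them.

    Returns (children that died from SIGKILL, seconds spent sending).
    """
    pids = []
    for _ in range(n):
        pid = os.fork()
        if pid == 0:
            # Child: sleep until a signal arrives; SIGKILL cannot be caught.
            while True:
                signal.pause()
        pids.append(pid)

    start = time.perf_counter()
    for pid in pids:
        # Returns once the signal is queued, not after memory is freed.
        send_sigkill(pid)
    send_time = time.perf_counter() - start

    killed = 0
    for pid in pids:
        _, status = os.waitpid(pid, 0)
        if os.WIFSIGNALED(status) and os.WTERMSIG(status) == signal.SIGKILL:
            killed += 1
    return killed, send_time


if __name__ == "__main__":
    killed, secs = mass_kill(8)
    print(f"{killed} children killed; send loop took {secs * 1e6:.0f} us")
```

If signal delivery itself reaped the victim's pages before returning, the timed send loop above is where a killall-style caller would pay for it.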