On Thu, Apr 11, 2019 at 10:11:51PM +0200, Michal Hocko wrote:
> On Thu 11-04-19 15:14:30, Joel Fernandes wrote:
> > On Thu, Apr 11, 2019 at 08:12:43PM +0200, Michal Hocko wrote:
> > > On Thu 11-04-19 12:18:33, Joel Fernandes wrote:
> > > > On Thu, Apr 11, 2019 at 6:51 AM Michal Hocko <mhocko@xxxxxxxxxx> wrote:
> > > > >
> > > > > On Wed 10-04-19 18:43:51, Suren Baghdasaryan wrote:
> > > > > [...]
> > > > > > The proposed solution uses the existing oom-reaper thread to increase
> > > > > > the memory reclaim rate of a killed process and to make this rate more
> > > > > > deterministic. By no means is the proposed solution considered the
> > > > > > best; it was chosen because it was simple to implement and allowed for
> > > > > > test data collection. The downside of this solution is that it requires
> > > > > > an additional “expedite” hint for something which has to be fast in all
> > > > > > cases. It would be great to find a way that does not require additional
> > > > > > hints.
> > > > >
> > > > > I have to say I do not like this much. It is abusing an implementation
> > > > > detail of the OOM implementation and makes it an official API. Also
> > > > > there are some non-trivial assumptions to be fulfilled to use the
> > > > > current oom_reaper. First of all, all the process groups that share the
> > > > > address space have to be killed. How do you want to guarantee/implement
> > > > > that with a simple kill to a thread/process group?
> > > >
> > > > Will task_will_free_mem() not bail out in such cases because of
> > > > process_shares_mm() returning true?
> > >
> > > I am not really sure I understand your question. task_will_free_mem is
> > > just a shortcut to not kill anything if the current process or a victim
> > > is already dying and likely to free memory without killing or spamming
> > > the log. My concern is that this patch allows invoking the reaper
> >
> > Got it.
> >
> > > without guaranteeing the same. So it can only be an optimistic attempt,
> > > and then I am wondering how reasonable of an interface this really is.
> > > Userspace sends the signal and has no way to find out whether the async
> > > reaping has been scheduled or not.
> >
> > Could you clarify more what you're asking to guarantee? I cannot picture it.
> > If you mean guaranteeing that "a task is dying anyway and will free its
> > memory on its own", we are calling task_will_free_mem() to check that before
> > invoking the oom reaper.
>
> No, I am talking about the API aspect. Say you call kill with the flag
> to request the async address space teardown. Now you cannot really
> guarantee that this is safe to do, because the target task might
> clone(CLONE_VM) at any time. So this will be known only once the signal
> is sent, but the calling process has no way to find out. So the caller
> has no way to know what the actual result of the requested operation is.
> That is a poor API in my book.
>
> > Could you clarify what the drawback is if the OOM reaper is invoked in
> > parallel to an exiting task which will free its memory soon? It looks
> > like the OOM reaper is taking all the necessary locks (mmap_sem in
> > particular) and is unmapping pages. It seemed to me to be safe, but I am
> > missing what the main drawbacks of this are - other than the interference
> > with core dumps. One could presumably be scalability, since the OOM
> > reaper could be bottlenecked by freeing memory on behalf of potentially
> > several dying tasks.
>
> oom_reaper or any other kernel thread doing the same is a mere
> implementation detail, I think. The oom killer doesn't really need the
> oom_reaper to act swiftly, because it is there to act as a last resort
> if the oom victim cannot terminate on its own. If you want to offer a
> user-space API then you can assume users will like to use it and expect
> a certain behavior - but what is that behavior? E.g. what if there are
> thousands of tasks killed this way? Do we care that some of them will
> not get the async treatment? If yes, why do we need an API to control
> that at all?
>
> Am I more clear now?

Yes, your concerns are more clear now. We will think more about this and
your other responses, thanks a lot.

- Joel
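
[Editor's note: the following is a minimal, hypothetical user-space sketch
of the clone(CLONE_VM) scenario Michal raises above. It is not part of the
patch under discussion; it only illustrates why a kill directed at one
process does not guarantee its address space is unused - another process
sharing the mm can exist, or be created, at any time before the signal
lands.]

/*
 * Hypothetical illustration only (not from the patch under discussion):
 * a child created with clone(CLONE_VM) but without CLONE_THREAD is a
 * separate process that shares its parent's address space. SIGKILLing
 * just the parent leaves this child still using the shared mm, which is
 * why the mm may only be reaped once every sharer is dead.
 */
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

static char child_stack[1024 * 1024];

static int worker(void *arg)
{
	/* Any memory this process touches lives in the parent's mm. */
	for (;;)
		pause();
	return 0;
}

int main(void)
{
	/* Stack grows down on common arches, so pass the top of the buffer. */
	pid_t pid = clone(worker, child_stack + sizeof(child_stack),
			  CLONE_VM | SIGCHLD, NULL);
	if (pid < 0) {
		perror("clone");
		return 1;
	}
	printf("parent %d and CLONE_VM child %d share one address space\n",
	       getpid(), (int)pid);
	/*
	 * Killing only the parent (or only the child) does not make the
	 * shared mm reapable; both must be killed, and a new sharer could
	 * have been cloned at any point before the signal was delivered.
	 */
	kill(pid, SIGKILL);
	waitpid(pid, NULL, 0);
	return 0;
}

[In such a case the kernel-side task_will_free_mem()/process_shares_mm()
checks mentioned earlier in the thread would presumably notice the sharer
and skip async reaping - but, as Michal notes, the killer has no way to
learn that this fallback happened.]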