On Thu 06-09-18 10:00:00, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > On Wed 05-09-18 22:53:33, Tetsuo Handa wrote:
> > > On 2018/09/05 22:40, Michal Hocko wrote:
> > > > Changelog said
> > > >
> > > > "Although this is possible in principle let's wait for it to actually
> > > > happen in real life before we make the locking more complex again."
> > > >
> > > > So what is the real life workload that hits it? The log you have pasted
> > > > below doesn't tell much.
> > >
> > > Nothing special. I just ran a multi-threaded memory eater on a CONFIG_PREEMPT=y kernel.
> >
> > I strongly suspect that your test doesn't really represent or simulate
> > any real and useful workload. Sure, it triggers a rare race and we kill
> > another oom victim. Does this warrant making the code more complex?
> > Well, I am not convinced, as I've said countless times.
>
> Yes. Below is an example from a machine running Apache Web server/Tomcat AP server/PostgreSQL DB server.
> A memory eater needlessly killed Tomcat due to this race.

What prevents you from modifying your mem eater in a way that prevents
Tomcat resp. others from being the primary oom victim choice? In other
words, yeah, it is not optimal to lose the race, but if it is rare enough
then this is something to live with, because it can hardly be considered
a new DoS vector AFAICS. Remember that this is always going to be racy
land and we are not going to plumb all possible races, because this is
simply not viable. But I am pretty sure we have been through all this
many times already. Oh well...

> I assert that we should fix af5679fbc669f31f.

If you can come up with a reasonable patch which doesn't complicate the
code and is a clear win for both this particular workload as well as
others, then why not.

--
Michal Hocko
SUSE Labs
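For readers following along, a minimal sketch of the kind of multi-threaded
memory eater referred to above. This is not the exact reproducer from this
thread; the thread count and chunk size are arbitrary illustrative choices.
Each thread keeps mapping and touching anonymous memory until the OOM killer
intervenes.

	/* mem_eater.c: minimal multi-threaded memory eater sketch. */
	#include <pthread.h>
	#include <stdlib.h>
	#include <string.h>
	#include <sys/mman.h>
	#include <unistd.h>

	#define NR_THREADS	8
	#define CHUNK		(64UL << 20)	/* 64 MiB per allocation */

	static void *eat(void *arg)
	{
		for (;;) {
			char *p = mmap(NULL, CHUNK, PROT_READ | PROT_WRITE,
				       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
			if (p == MAP_FAILED) {
				/* Back off and retry; the OOM killer may free memory. */
				sleep(1);
				continue;
			}
			/* Touch every page so the kernel actually allocates it. */
			memset(p, 0xff, CHUNK);
		}
		return NULL;
	}

	int main(void)
	{
		pthread_t tid[NR_THREADS];
		int i;

		for (i = 0; i < NR_THREADS; i++)
			pthread_create(&tid[i], NULL, eat, NULL);
		for (i = 0; i < NR_THREADS; i++)
			pthread_join(tid[i], NULL);
		return 0;
	}

Build with "gcc -O2 -pthread mem_eater.c -o mem_eater" and run on a test
machine; all threads allocating concurrently is what makes hitting the
race at all likely.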