On Thu, Mar 26, 2015 at 02:31:11PM +0100, Michal Hocko wrote:
> On Wed 25-03-15 02:17:10, Johannes Weiner wrote:
> > The zonelist locking and the oom_sem are two overlapping locks that
> > are used to serialize global OOM killing against different things.
> >
> > The historical zonelist locking serializes OOM kills from allocations
> > with overlapping zonelists against each other to prevent killing more
> > tasks than necessary in the same memory domain. Only when neither
> > tasklists nor zonelists from two concurrent OOM kills overlap (tasks
> > in separate memcgs bound to separate nodes) are OOM kills allowed to
> > execute in parallel.
> >
> > The younger oom_sem is a read-write lock to serialize OOM killing
> > against the PM code trying to disable the OOM killer altogether.
> >
> > However, the OOM killer is a fairly cold error path, there is really
> > no reason to optimize for highly performant and concurrent OOM kills.
> > And the oom_sem is just flat-out redundant.
> >
> > Replace both locking schemes with a single global mutex serializing
> > OOM kills regardless of context.
>
> OK, this is much simpler.
>
> You have missed drivers/tty/sysrq.c which should take the lock as well.
> ZONE_OOM_LOCKED can be removed as well. __out_of_memory in the kerneldoc
> should be renamed.

Argh, an older version had the lock inside out_of_memory() and I never
updated the caller when I changed the rules. Thanks. I'll fix both.

> > @@ -795,27 +728,21 @@ bool out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,
> >   */
> >  void pagefault_out_of_memory(void)
> >  {
> > -    struct zonelist *zonelist;
> > -
> > -    down_read(&oom_sem);
> >      if (mem_cgroup_oom_synchronize(true))
> > -        goto unlock;
> > +        return;
>
> OK, so we are back to what David has asked previously. We do not need
> the lock for memcg and oom_killer_disabled because we know that no tasks
> (except for potential oom victim) are lurking around at the time
> oom_killer_disable() is called. So I guess we want to stick a comment
> into mem_cgroup_oom_synchronize before we check for oom_killer_disabled.

I would prefer everybody that sets TIF_MEMDIE and kills a task to hold
the lock, including memcg. Simplicity is one thing, but also a global
OOM kill might not even be necessary when it's racing with the memcg.

> After those are fixed, feel free to add
> Acked-by: Michal Hocko <mhocko@xxxxxxx>

Thanks.
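
To make the "everybody that kills a task holds the lock" rule concrete,
here is a rough userspace sketch using a pthread mutex. It is not the
patch and not kernel code: oom_lock, oom_kill_one(), victim_pending,
victim_exited_demo(), oom_killer_disable_demo() and
run_concurrent_callers() are all names made up for this illustration,
and victim_pending only loosely stands in for a task that already has
TIF_MEMDIE set. The point it tries to show is that when every killer
takes the same lock, a racing second caller can see that a victim was
already chosen and skip its own kill.

```c
#include <pthread.h>
#include <stdbool.h>

#define MAX_CALLERS 16

/* Hypothetical stand-in for one global lock serializing all OOM kills. */
static pthread_mutex_t oom_lock = PTHREAD_MUTEX_INITIALIZER;

/* Illustrative state only; none of this mirrors real kernel structures. */
static bool oom_killer_disabled;	/* set by the "PM" path below */
static bool victim_pending;		/* a victim was chosen and has not exited */
static int victims_marked;		/* total number of victims chosen so far */

/* Every killer (global or memcg-like) goes through the same lock. */
static bool oom_kill_one(void)
{
	bool killed = false;

	pthread_mutex_lock(&oom_lock);
	if (!oom_killer_disabled && !victim_pending) {
		victim_pending = true;
		victims_marked++;
		killed = true;
	}
	pthread_mutex_unlock(&oom_lock);
	return killed;
}

/* The victim has exited; another kill is allowed again. */
static void victim_exited_demo(void)
{
	pthread_mutex_lock(&oom_lock);
	victim_pending = false;
	pthread_mutex_unlock(&oom_lock);
}

/* Disabling takes the same lock, so it cannot race with a kill in progress. */
static void oom_killer_disable_demo(void)
{
	pthread_mutex_lock(&oom_lock);
	oom_killer_disabled = true;
	pthread_mutex_unlock(&oom_lock);
}

static void *oom_caller(void *unused)
{
	(void)unused;
	oom_kill_one();
	return NULL;
}

/* Start up to n racing callers and return the total victim count afterwards. */
static int run_concurrent_callers(int n)
{
	pthread_t threads[MAX_CALLERS];
	int started = 0;
	int marked;

	if (n > MAX_CALLERS)
		n = MAX_CALLERS;
	while (started < n &&
	       pthread_create(&threads[started], NULL, oom_caller, NULL) == 0)
		started++;
	for (int i = 0; i < started; i++)
		pthread_join(threads[i], NULL);

	pthread_mutex_lock(&oom_lock);
	marked = victims_marked;
	pthread_mutex_unlock(&oom_lock);
	return marked;
}
```

With several callers racing from a fresh state, only the first one to
get the lock picks a victim; the rest back off. Once the disable path
has run, nobody kills anything anymore.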