Re: [RFC 1/6] mm, page_alloc: fix more premature OOM due to race with cpuset update

Michal Hocko <mhocko@xxxxxxxxxx> · Wed, 17 May 2017 11:20:43 +0200

On Sun 30-04-17 16:33:10, Cristopher Lameter wrote:
> On Wed, 26 Apr 2017, Vlastimil Babka wrote:
> 
> > > Such an application typically already has such logic and executes a
> > > binding after discovering its numa node configuration on startup. It would
> > > have to be modified to redo that action when it gets some sort of a signal
> > > from the script telling it that the node config would be changed.
> > >
> > > Having this logic in the application instead of the kernel avoids all the
> > > kernel messes that we keep on trying to deal with and IMHO is much
> > > cleaner.
> >
> > That would be much simpler for us indeed. But we still IMHO can't
> > abruptly start denying page fault allocations for existing applications
> > that don't have the necessary awareness.
> 
> We certainly can do that. The failure of the page faults are due to the
> admin trying to move an application that is not aware of this and is using
> mempols. That could be an error. Trying to move an application that
> contains both absolute and relative node numbers is definitely something
> that is potentiall so screwed up that the kernel should not muck around
> with such an app.
> 
> Also user space can determine if the application is using memory policies
> and can then take appropriate measures (message to the sysadmin to eval
> tge situation f.e.) or mess aroud with the processes memory policies on
> its own.
> 
> So this is certainly a way out of this mess.

So how are you going to distinguish VM_FAULT_OOM from an empty mempolicy
case in a raceless way?

-- 
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe cgroups" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html