Re: [patch] mm, page_alloc: make __GFP_NOFAIL really not fail

Dave Chinner <david@xxxxxxxxxxxxx> · Thu, 12 Dec 2013 12:07:54 +1100

On Tue, Dec 10, 2013 at 03:39:09PM -0800, Andrew Morton wrote:
> On Tue, 10 Dec 2013 15:20:17 -0800 (PST) David Rientjes <rientjes@xxxxxxxxxx> wrote:
> 
> > On Mon, 9 Dec 2013, Andrew Morton wrote:
> > 
> > > > __GFP_NOFAIL specifies that the page allocator cannot fail to return
> > > > memory.  Allocators that call it may not even check for NULL upon
> > > > returning.
> > > > 
> > > > It turns out GFP_NOWAIT | __GFP_NOFAIL or GFP_ATOMIC | __GFP_NOFAIL can
> > > > actually return NULL.  More interestingly, processes that are doing
> > > > direct reclaim and have PF_MEMALLOC set may also return NULL for any
> > > > __GFP_NOFAIL allocation.
> > > 
> > > __GFP_NOFAIL is a nasty thing and making it pretend to work even better
> > > is heading in the wrong direction, surely?  It would be saner to just
> > > disallow these even-sillier combinations.  Can we fix up the current
> > > callers then stick a WARN_ON() in there?
> > > 
> > 
> > Heh, it's difficult to remove __GFP_NOFAIL when new users get added: 
> > 84235de394d9 ("fs: buffer: move allocation failure loop into the 
> > allocator") added a new user
> 
> That wasn't reeeeealy a new user - it was "convert an existing
> open-coded retry-for-ever loop".  Which is what __GFP_NOFAIL is for.
> 
> I don't think I've ever seen anyone actually fix one of these things
> (by teaching the caller to handle ENOMEM), so it obviously isn't
> working...

Right, because most of the loops are deep within filesystem
transaction code, where the only thing to do with a memory allocation
failure is to abort the transaction, shut down the filesystem and
deny user access (i.e. DoS the system). At that point the filesystem
is inconsistent in memory, and the only way to recover it is to toss
everything in memory away and recover the last valid on-disk state
from the journal, i.e. umount, mount.

IOWs, the "fix" is far worse than the current behaviour, so there is
absolutely no motivation for the people who own these __GFP_NOFAIL
allocations to fix them. Indeed, when you consider that making the
filesystems handle ENOMEM robustly is a *massive* undertaking that
adds significant overhead and complexity to each filesystem, the
cost/benefit analysis comes down so far on the side of "just use
__GFP_NOFAIL" that doing anything else is sheer lunacy.
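A minimal userspace sketch of the two options being weighed, assuming hypothetical names throughout (SKETCH_NOFAIL, sketch_alloc(), txn_log_item() and txn_log_item_fallible() are illustrative, not kernel APIs). The first caller relies on a retry-forever loop living inside the allocator, in the spirit of 84235de394d9; the second handles the failure itself, and deep inside a transaction that means aborting and shutting the filesystem down.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical flag loosely modelled on __GFP_NOFAIL; not the kernel's value. */
#define SKETCH_NOFAIL 0x1u

/* Simulated memory pressure: the next `failures_left` attempts fail. */
static int failures_left = 3;
static int attempts;

static void *try_alloc(size_t size)
{
	attempts++;
	if (failures_left > 0) {
		failures_left--;
		return NULL;
	}
	return malloc(size);
}

/*
 * With SKETCH_NOFAIL the retry-forever loop lives here, in the allocator,
 * instead of being open-coded in every caller. Without it, the failure is
 * reported to the caller as NULL.
 */
static void *sketch_alloc(size_t size, unsigned int flags)
{
	for (;;) {
		void *p = try_alloc(size);

		if (p || !(flags & SKETCH_NOFAIL))
			return p;
		/* A real allocator would reclaim or wait here before retrying. */
	}
}

/* Hypothetical "filesystem has been shut down" state. */
static int fs_shut_down;

/* Caller that relies on the no-fail guarantee and never checks for NULL. */
static char *txn_log_item(size_t size)
{
	char *item = sketch_alloc(size, SKETCH_NOFAIL);

	assert(item != NULL); /* holds because sketch_alloc() loops until success */
	return item;
}

/*
 * The "fix": handle ENOMEM in the caller. Deep in a transaction the only
 * safe response is to abort it and shut the filesystem down.
 */
static char *txn_log_item_fallible(size_t size)
{
	char *item = sketch_alloc(size, 0);

	if (!item)
		fs_shut_down = 1; /* in-memory state is now inconsistent */
	return item;
}
```

The fallible variant only sets a flag; real transaction code would also have to unwind whatever it had already modified, which is where the overhead and complexity described above come from.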

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .