Re: [PATCH RFC] mm: vmalloc: do not allow kzalloc to fail

Nicholas Mc Guire <der.herr@xxxxxxx> · Mon, 24 Dec 2018 10:38:04 +0100

On Mon, Dec 24, 2018 at 09:10:56AM +0100, Michal Hocko wrote:
> On Sat 22-12-18 09:04:21, Nicholas Mc Guire wrote:
> > On Fri, Dec 21, 2018 at 01:58:39PM -0800, David Rientjes wrote:
> > > On Thu, 20 Dec 2018, Nicholas Mc Guire wrote:
> > > 
> > > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > > > index 871e41c..1c118d7 100644
> > > > --- a/mm/vmalloc.c
> > > > +++ b/mm/vmalloc.c
> > > > @@ -1258,7 +1258,7 @@ void __init vmalloc_init(void)
> > > >  
> > > >  	/* Import existing vmlist entries. */
> > > >  	for (tmp = vmlist; tmp; tmp = tmp->next) {
> > > > -		va = kzalloc(sizeof(struct vmap_area), GFP_NOWAIT);
> > > > +		va = kzalloc(sizeof(*va), GFP_NOWAIT | __GFP_NOFAIL);
> > > >  		va->flags = VM_VM_AREA;
> > > >  		va->va_start = (unsigned long)tmp->addr;
> > > >  		va->va_end = va->va_start + tmp->size;
> > > 
> > > Hi Nicholas,
> > > 
> > > You're right that this looks wrong because there's no guarantee that va is 
> > > actually non-NULL.  __GFP_NOFAIL won't help in init, unfortunately, since 
> > > we're not giving the page allocator a chance to reclaim so this would 
> > > likely just end up looping forever instead of crashing with a NULL pointer 
> > > dereference, which would actually be the better result.
> > >
> > tried tracing the __GFP_NOFAIL path and had concluded that it would
> > end in out_of_memory() -> panic("System is deadlocked on memory\n");
> > which also should point cleanly to the cause - but I'm actually not
> > that sure if that trace was correct in all cases.
> 
> No, we do not trigger the memory reclaim path nor the oom killer when
> using GFP_NOWAIT. In fact the current implementation even ignores
> __GFP_NOFAIL AFAICS (so I was wrong about the endless loop but I suspect
> that we used to loop for __GFP_NOFAIL at some point in the past). The
> patch simply doesn't have any effect. But the primary objection is that
> the behavior might change in future and you certainly do not want to get
> stuck in the boot process without knowing what is going on. Crashing
> will tell you that quite obviously. Although I have a hard time imagining
> how that could happen in a reasonably configured system.

I think most of the defensive structures are covering rare to almost
impossible cases - but those are precisely the hard ones to understand if
they do happen.

> 
> > > You could do
> > > 
> > > 	BUG_ON(!va);
> > > 
> > > to make it obvious why we crashed, however.  It makes it obvious that the 
> > > crash is intentional rather than some error in the kernel code.
> > 
> > makes sense - that at least makes it immediately clear from the code
> > that there is no way out from here.
> 
> How does it differ from blowing up right there when dereferencing flags?
> It would be clear from the oops.

The question is how soon it blows up. If the failure is immediate, then there
is probably no real difference. If there is some delay, say because the region
affected by the NULL pointer is not immediately in use, it may be very hard to
differentiate between an allocation failure and memory corruption. Having a
directly associated trace should be significantly simpler to understand, and
you might actually not want a system to try booting if there are problems at
this level.
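
To illustrate the point, here is a rough userspace sketch of checking the
allocation right where it happens. This is not kernel code: struct early_area,
struct area, alloc_area(), alloc_budget and import_areas() are made-up
stand-ins, and the early return only marks the spot where the kernel variant
would have the BUG_ON(!va) suggested above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the early list entry and the area descriptor. */
struct early_area {
	void *addr;
	unsigned long size;
	struct early_area *next;
};

struct area {
	unsigned long start;
	unsigned long end;
};

/* Hypothetical allocator: succeeds alloc_budget times, then returns NULL. */
static int alloc_budget;

static void *alloc_area(void)
{
	if (alloc_budget <= 0)
		return NULL;
	alloc_budget--;
	return calloc(1, sizeof(struct area));
}

/*
 * Import up to max entries from list into out[].
 * Returns the number of imported entries, or -1 as soon as an allocation
 * fails. The failure is reported at the allocation site, before va is
 * ever dereferenced. Nothing is freed on failure, on the assumption that
 * the caller cannot continue anyway.
 */
static int import_areas(const struct early_area *list, struct area **out,
			int max)
{
	int n = 0;

	assert(out != NULL);

	for (const struct early_area *tmp = list; tmp && n < max;
	     tmp = tmp->next) {
		struct area *va = alloc_area();

		if (!va)
			return -1;	/* kernel analogue: BUG_ON(!va) */

		va->start = (unsigned long)tmp->addr;
		va->end = va->start + tmp->size;
		out[n++] = va;
	}
	return n;
}
```

With the check in place the failure is tied to the allocation itself rather
than to whichever later access happens to touch the bad pointer first.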

thx!
hofrat