Re: [RFC] vmalloc: add warning in __vmalloc

David Rientjes <rientjes@xxxxxxxxxx> · Tue, 1 May 2012 13:22:57 -0700 (PDT)

On Tue, 1 May 2012, Nick Piggin wrote:

> > I disagree with this approach since it's going to violently spam an
> > innocent kernel user's log with no ratelimiting and for a situation that
> > actually may not be problematic.
> 
> With WARN_ON_ONCE, it should be good.
> 

To catch a single instance of this per-boot, sure.  I've never seen us add 
WARN_ON_ONCE()'s where we have concrete examples of kernel code that will 
trigger it, though.  I'm not sure why we'd spam the kernel log and get 
users to think something is wrong and report a bug when it's possible to 
audit the code and report it to the subsystem maintainer.  Perhaps it's 
just easier to add WARN_ON_ONCE()'s and then walk away from it?
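
For reference, a minimal userspace sketch of the once-only behaviour being 
discussed.  This is not the kernel's WARN_ON_ONCE() macro; warn_once(), 
check_vmalloc_flags() and warnings_printed are hypothetical names used only 
for illustration.  The point is that the first offender gets logged and 
every later one is silently ignored.

```c
#include <stdbool.h>
#include <stdio.h>

/* Counts how many warnings were actually emitted (illustration only). */
static int warnings_printed;

/*
 * Hypothetical stand-in for once-only warning semantics: report the
 * condition the first time it is true for a given call site, stay quiet
 * afterwards, and always return the condition to the caller.
 */
static bool warn_once(bool cond, bool *already_warned, const char *what)
{
	if (cond && !*already_warned) {
		*already_warned = true;
		warnings_printed++;
		fprintf(stderr, "WARNING: %s\n", what);
	}
	return cond;
}

/* Hypothetical call site: complains when the caller may not sleep. */
static bool check_vmalloc_flags(bool may_sleep)
{
	static bool warned;	/* one flag per call site */

	return warn_once(!may_sleep, &warned,
			 "vmalloc called from a context that cannot sleep");
}
```

Calling check_vmalloc_flags(false) from a hundred different callers prints 
one line, so the log only ever names whichever caller happened to run 
first.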

> > Passing any of these bits (the difference between GFP_KERNEL and
> > GFP_ATOMIC) only means anything when we're going to do reclaim.  And I'm
> > suspecting we would have seen problems with this already since
> > pte_alloc_kernel() does __GFP_REPEAT on most architectures meaning that it
> > will loop infinitely in the page allocator until at least one page is
> > freed (since it's an order-0 allocation) which would hardly ever happen if
> > __GFP_FS or __GFP_IO actually meant something in this context.
> >
> > In other words, we would already have seen these deadlocks and it would
> > have been diagnosed as a vmalloc(GFP_ATOMIC) problem.  Where are those bug
> > reports?
> 
> That's not sound logic to disprove a bug.
> 
> I think simply most callers are permissive and don't mask out flags.
> But for example a filesystem holding an fs lock and then doing
> vmalloc(GFP_NOFS) can certainly deadlock.
> 

I'm not disproving a bug, I'm asking for an example of where this problem 
has caused pain before as a result of calling vmalloc(GFP_NOFS).  I agree 
we should certainly fix those callers, but it seems like adding the 
WARN_ON_ONCE()'s is certainly going to cause pain in the form of tons of 
bug reports where there's no actual problem that couldn't have been found 
by auditing the code.
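
As a rough sketch of the condition such an audit would look for: a mask 
that drops any of the bits separating GFP_KERNEL from GFP_ATOMIC.  The 
SKETCH_* names and bit values below are assumptions for illustration, not 
the kernel's definitions (those live in include/linux/gfp.h and change 
between versions), and drops_reclaim_bits() is a hypothetical helper, not 
code from the proposed patch.

```c
#include <stdbool.h>

/* Illustrative flag bits only; not the kernel's actual values. */
#define SKETCH_GFP_WAIT	0x10u	/* caller may sleep */
#define SKETCH_GFP_HIGH	0x20u	/* may dip into emergency reserves */
#define SKETCH_GFP_IO	0x40u	/* reclaim may start physical I/O */
#define SKETCH_GFP_FS	0x80u	/* reclaim may call into the filesystem */

#define SKETCH_GFP_KERNEL (SKETCH_GFP_WAIT | SKETCH_GFP_IO | SKETCH_GFP_FS)
#define SKETCH_GFP_NOFS	  (SKETCH_GFP_WAIT | SKETCH_GFP_IO)
#define SKETCH_GFP_NOIO	  (SKETCH_GFP_WAIT)
#define SKETCH_GFP_ATOMIC (SKETCH_GFP_HIGH)

/*
 * True when the mask lacks at least one of the WAIT/IO/FS bits, i.e. the
 * caller asked for something more restrictive than GFP_KERNEL.
 */
static bool drops_reclaim_bits(unsigned int gfp_mask)
{
	return (gfp_mask & SKETCH_GFP_KERNEL) != SKETCH_GFP_KERNEL;
}
```

With these definitions the NOFS, NOIO and ATOMIC masks are all flagged and 
the KERNEL mask is not, which is the set of callers a grep over the tree 
would need to cover.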