Re: Propagating GFP_NOFS inside __vmalloc()

"Ricardo M. Correia" <ricardo.correia@xxxxxxxxxx> · Tue, 16 Nov 2010 00:30:57 +0100

On Mon, 2010-11-15 at 14:50 -0800, David Rientjes wrote:
> Instead of extending the __*() functions with 
> more underscores like other places in the kernel (see mm/slab.c, for 
> instance), I'd suggest just appending _gfp() to their name so 
> __pmd_alloc() uses a new __pmd_alloc_gfp().

Sounds good to me.
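
To make the naming concrete, here is a minimal userspace sketch of the pattern, not kernel code. Every demo_ name and flag value below is a hypothetical stand-in invented for illustration; the real __pmd_alloc() has a different signature. The idea shown is only that the existing name keeps its signature and forwards a default mask to a new _gfp() variant that accepts the caller's mask.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel's gfp_t and flags (values are made up). */
typedef unsigned int demo_gfp_t;
#define DEMO_GFP_KERNEL 0x1u
#define DEMO_GFP_NOFS   0x2u

/* Records the mask that reached the allocator, so the forwarding is observable. */
static demo_gfp_t demo_last_gfp;

/* New variant: same job as before, but the caller chooses the mask. */
static void *demo_pmd_alloc_gfp(size_t size, demo_gfp_t gfp_mask)
{
    assert(size > 0);
    demo_last_gfp = gfp_mask;
    return calloc(1, size); /* userspace stand-in for a page-table allocation */
}

/* Existing entry point: signature unchanged, forwards the old default mask. */
static void *demo_pmd_alloc(size_t size)
{
    return demo_pmd_alloc_gfp(size, DEMO_GFP_KERNEL);
}
```

Existing callers of demo_pmd_alloc() are untouched, while a caller in a filesystem I/O path could call demo_pmd_alloc_gfp(size, DEMO_GFP_NOFS) directly.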

> > For our case, I'd think it's better to either handle failure or somehow
> > retry until the allocation succeeds (if we know for sure that it will,
> > eventually).
> > 
> 
> If your use-case is going to block until this memory is available, there's 
> a serious problem that you'll need to address because nothing is going to 
> guarantee that memory will be freed unless something else is trying to 
> allocate memory and pages get written back or something gets killed as a 
> result.

In our use case, this code is only used on servers dedicated to serving
a Lustre filesystem and nothing else, so we don't have to worry about
things like runaway memory hogs or user applications.

Currently we do block until this memory is available. I'd rather not go
into much detail, but the amount of memory that can be allocated by this
method at any point in time is huge but bounded.

Also, we have a slab reclaim callback that signals a dedicated thread,
which asynchronously frees memory (we would free it synchronously if we
could, but unfortunately that's not possible).

This thread is able to potentially free GBs of memory if necessary, and
therefore allow the vmalloc allocations in the I/O path to succeed
eventually. We know this because we limit the amount of memory that can
be allocated and nothing else can use a significant amount of memory on
our systems.
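
As a rough illustration of why a bounded pool lets waiting allocations succeed eventually, here is a single-threaded userspace model. It is only a sketch under assumptions: struct demo_budget and both functions are hypothetical names, the "reclaim requested" flag stands in for signalling the dedicated thread, and the real code runs the reclaim step asynchronously rather than inline.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical accounting for a pool with a hard upper bound. */
struct demo_budget {
    size_t limit;            /* maximum bytes that may be outstanding */
    size_t used;             /* bytes currently outstanding (always <= limit) */
    bool reclaim_requested;  /* stand-in for "wake the reclaim thread" */
};

/* Try to charge 'bytes' to the pool; on failure, ask for reclaim. */
static bool demo_budget_reserve(struct demo_budget *b, size_t bytes)
{
    assert(b != NULL && b->used <= b->limit);
    if (bytes > b->limit - b->used) {
        b->reclaim_requested = true;
        return false; /* caller is expected to wait and retry */
    }
    b->used += bytes;
    return true;
}

/* One step of the reclaim worker: release up to 'bytes', return amount freed. */
static size_t demo_budget_reclaim(struct demo_budget *b, size_t bytes)
{
    size_t freed;

    assert(b != NULL);
    freed = bytes < b->used ? bytes : b->used;
    b->used -= freed;
    b->reclaim_requested = false;
    return freed;
}
```

Because used never exceeds limit, and the worker can always release memory that the pool itself holds, a failed reservation that fits within limit succeeds once the worker has run.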

I know this is not how you'd typically do this, but we also have other
constraints (which again, I'd rather not go into) which make this our
preferred solution.

>   Strictly relying on that behavior is concerning, but it's not 
> something that can be fixed in the VM.
>
> > Not sure what you mean by this.. I don't see a typical vmalloc()
> > using __GFP_REPEAT anywhere (apart from functions such as
> > pmd_alloc_one(), which in the code above you suggested to keep passing
> > __GFP_REPEAT).. am I missing something?
> > 
> 
> __GFP_REPEAT will retry the allocation indefinitely until the needed 
> amount of memory is reclaimed without considering the order of the 
> allocation; all orders of interest in your case are order-0, so it will 
> loop indefinitely until a single page is reclaimed which won't happen with 
> GFP_NOFS.  Thus, passing the flag is the equivalent of asking the 
> allocator to loop forever until memory is available rather than failing 
> and returning to your error handling.

When you say loop forever, you don't mean in a busy loop, right?
Assuming the allocator sleeps in this loop (which AFAICS it does), then
it's OK for us because memory will be freed asynchronously.

If it didn't sleep then it'd be more concerning because all CPUs could
enter this loop and we'd deadlock..
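
The distinction can be sketched in userspace C as follows. This is not the page allocator's actual retry logic: demo_alloc_retry_sleeping(), demo_alloc_fn and demo_flaky_alloc() are hypothetical names, and nanosleep() merely stands in for the allocator yielding the CPU between attempts so that something else can free memory.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical allocator hook: returns NULL when memory is not available. */
typedef void *(*demo_alloc_fn)(size_t size, void *ctx);

/*
 * Retry until the allocation succeeds, sleeping between attempts instead of
 * spinning. '*attempts' reports how many tries were needed.
 */
static void *demo_alloc_retry_sleeping(demo_alloc_fn try_alloc, void *ctx,
                                       size_t size, long sleep_ns,
                                       unsigned int *attempts)
{
    struct timespec delay = { .tv_sec = 0, .tv_nsec = sleep_ns };
    void *p;

    assert(sleep_ns >= 0 && sleep_ns < 1000000000L);
    *attempts = 0;
    for (;;) {
        (*attempts)++;
        p = try_alloc(size, ctx);
        if (p != NULL)
            return p;
        /* Give up the CPU so an asynchronous freer can make progress. */
        nanosleep(&delay, NULL);
    }
}

/* Mock allocator: fails while '*ctx' (an int countdown) is positive. */
static void *demo_flaky_alloc(size_t size, void *ctx)
{
    int *failures_left = ctx;

    if (*failures_left > 0) {
        (*failures_left)--;
        return NULL;
    }
    return malloc(size);
}
```

With a countdown of 3, demo_alloc_retry_sleeping(demo_flaky_alloc, &countdown, 128, 1000000L, &attempts) returns on the fourth attempt. Removing the nanosleep() call would turn this into the busy loop described above.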

Anyway, I will try the approach that you suggested and send out a new
patch. 

Thanks!

- Ricardo
