On Tue 31-03-20 12:01:17, Joel Fernandes wrote: > On Tue, Mar 31, 2020 at 05:34:50PM +0200, Michal Hocko wrote: > > On Tue 31-03-20 10:58:06, Joel Fernandes wrote: > > [...] > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > > > > index 4be763355c9fb..965deefffdd58 100644 > > > > --- a/kernel/rcu/tree.c > > > > +++ b/kernel/rcu/tree.c > > > > @@ -3149,7 +3149,7 @@ static inline struct rcu_head *attach_rcu_head_to_object(void *obj) > > > > > > > > if (!ptr) > > > > ptr = kmalloc(sizeof(unsigned long *) + > > > > - sizeof(struct rcu_head), GFP_ATOMIC | __GFP_NOWARN); > > > > + sizeof(struct rcu_head), GFP_MEMALLOC); > > > > > > Just to add, the main requirements here are: > > > 1. Allocation should be bounded in time. > > > 2. Allocation should try hard (possibly tapping into reserves) > > > 3. Sleeping is Ok but should not affect the time bound. > > > > > > __GFP_ATOMIC | __GFP_HIGH is the way to get an additional access to > > memory reserves regarless of the sleeping status. > > > > Using __GFP_MEMALLOC is quite dangerous because it can deplete _all_ the > > memory. What does prevent the above code path to do that? Neil has provided a nice explanation down the email thread. But let me clarify few things here. > Can you suggest what prevents other users of GFP_MEMALLOC from doing that > also? There is no explicit mechanism which is indeed unfortunate. The only user real user of the flag is Swap over NFS AFAIK. I have never dared to look into details on how the complete reserves depletion is prevented. Mel would be much better fit here. > That's the whole point of having a reserve, in normal usage no one will > use it, but some times you need to use it. Keep in mind this is not a common > case in this code here, this is triggered only if earlier allocation attempts > failed. Only *then* we try with GFP_MEMALLOC with promises to free additional > memory soon. You are right that this is the usecase for the flag. But this should be done with an extreme care because the core MM relies on those reserves so any other users should better make sure they do not consume a lot from reserves as well. > > If a partial access to reserves is sufficient then why the existing > > modifiers (mentioned above are not sufficient? > > The point with using GFP_MEMALLOC is it is useful for situations where you > are about to free memory and needed some memory temporarily, to free that. It > depletes it a bit temporarily to free even more. Is that not the point of > PF_MEMALLOC? > * %__GFP_MEMALLOC allows access to all memory. This should only be used when > * the caller guarantees the allocation will allow more memory to be freed > * very shortly e.g. process exiting or swapping. Users either should > * be the MM or co-ordinating closely with the VM (e.g. swap over NFS). > > I was just recommending usage of this flag here because it fits the > requirement of allocating some memory to free some memory. I am also Ok with > GFP_ATOMIC with the GFP_NOWARN removed, if you are Ok with that. Maybe we need to refine this documentation to be more explicit about an extreme care to be taken when using the flag. diff --git a/include/linux/gfp.h b/include/linux/gfp.h index e5b817cb86e7..e436a7e28392 100644 --- a/include/linux/gfp.h +++ b/include/linux/gfp.h @@ -110,6 +110,9 @@ struct vm_area_struct; * the caller guarantees the allocation will allow more memory to be freed * very shortly e.g. process exiting or swapping. Users either should * be the MM or co-ordinating closely with the VM (e.g. swap over NFS). + * Users of this flag have to be extremely careful to not deplete the reserve + * completely and implement a throttling mechanism which controls the consumption + * based on the amount of freed memory. * * %__GFP_NOMEMALLOC is used to explicitly forbid access to emergency reserves. * This takes precedence over the %__GFP_MEMALLOC flag if both are set. -- Michal Hocko SUSE Labs