On April 1, 2020 3:23:59 AM EDT, Michal Hocko <mhocko@xxxxxxxxxx> wrote: >On Tue 31-03-20 12:01:17, Joel Fernandes wrote: >> On Tue, Mar 31, 2020 at 05:34:50PM +0200, Michal Hocko wrote: >> > On Tue 31-03-20 10:58:06, Joel Fernandes wrote: >> > [...] >> > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c >> > > > index 4be763355c9fb..965deefffdd58 100644 >> > > > --- a/kernel/rcu/tree.c >> > > > +++ b/kernel/rcu/tree.c >> > > > @@ -3149,7 +3149,7 @@ static inline struct rcu_head >*attach_rcu_head_to_object(void *obj) >> > > > >> > > > if (!ptr) >> > > > ptr = kmalloc(sizeof(unsigned long *) + >> > > > - sizeof(struct rcu_head), GFP_ATOMIC | __GFP_NOWARN); >> > > > + sizeof(struct rcu_head), GFP_MEMALLOC); >> > > >> > > Just to add, the main requirements here are: >> > > 1. Allocation should be bounded in time. >> > > 2. Allocation should try hard (possibly tapping into reserves) >> > > 3. Sleeping is Ok but should not affect the time bound. >> > >> > >> > __GFP_ATOMIC | __GFP_HIGH is the way to get an additional access to >> > memory reserves regarless of the sleeping status. >> > >> > Using __GFP_MEMALLOC is quite dangerous because it can deplete >_all_ the >> > memory. What does prevent the above code path to do that? > >Neil has provided a nice explanation down the email thread. But let me >clarify few things here. > >> Can you suggest what prevents other users of GFP_MEMALLOC from doing >that >> also? > >There is no explicit mechanism which is indeed unfortunate. The only >user real user of the flag is Swap over NFS AFAIK. I have never dared >to >look into details on how the complete reserves depletion is prevented. >Mel would be much better fit here. > >> That's the whole point of having a reserve, in normal usage no one >will >> use it, but some times you need to use it. Keep in mind this is not a >common >> case in this code here, this is triggered only if earlier allocation >attempts >> failed. Only *then* we try with GFP_MEMALLOC with promises to free >additional >> memory soon. > >You are right that this is the usecase for the flag. But this should be >done with an extreme care because the core MM relies on those reserves >so any other users should better make sure they do not consume a lot >from reserves as well. > Understood and agreed. >> > If a partial access to reserves is sufficient then why the existing >> > modifiers (mentioned above are not sufficient? >> >> The point with using GFP_MEMALLOC is it is useful for situations >where you >> are about to free memory and needed some memory temporarily, to free >that. It >> depletes it a bit temporarily to free even more. Is that not the >point of >> PF_MEMALLOC? >> * %__GFP_MEMALLOC allows access to all memory. This should only be >used when >> * the caller guarantees the allocation will allow more memory to be >freed >> * very shortly e.g. process exiting or swapping. Users either should >> * be the MM or co-ordinating closely with the VM (e.g. swap over >NFS). >> >> I was just recommending usage of this flag here because it fits the >> requirement of allocating some memory to free some memory. I am also >Ok with >> GFP_ATOMIC with the GFP_NOWARN removed, if you are Ok with that. > >Maybe we need to refine this documentation to be more explicit about an >extreme care to be taken when using the flag. > >diff --git a/include/linux/gfp.h b/include/linux/gfp.h >index e5b817cb86e7..e436a7e28392 100644 >--- a/include/linux/gfp.h >+++ b/include/linux/gfp.h >@@ -110,6 +110,9 @@ struct vm_area_struct; >* the caller guarantees the allocation will allow more memory to be >freed > * very shortly e.g. process exiting or swapping. Users either should > * be the MM or co-ordinating closely with the VM (e.g. swap over NFS). >+ * Users of this flag have to be extremely careful to not deplete the >reserve >+ * completely and implement a throttling mechanism which controls the >consumption >+ * based on the amount of freed memory. > * >* %__GFP_NOMEMALLOC is used to explicitly forbid access to emergency >reserves. > * This takes precedence over the %__GFP_MEMALLOC flag if both are set. I am in support of this documentation patch. I would say "consumption of the reserve". Thanks, - Joel -- Sent from my Android device with K-9 Mail. Please excuse my brevity.