On Fri 29-06-18 05:52:18, Paul E. McKenney wrote:
> On Fri, Jun 29, 2018 at 11:04:19AM +0200, Michal Hocko wrote:
> > On Thu 28-06-18 14:31:05, Paul E. McKenney wrote:
> > > On Thu, Jun 28, 2018 at 01:39:42PM +0200, Michal Hocko wrote:
[...]
> > > > Well, I am not really sure what is the objective of the oom notifier to
> > > > point you to the right direction. IIUC you just want to kick callbacks
> > > > to be handled sooner under a heavy memory pressure, right? How is that
> > > > achieved? Kick a worker?
> > >
> > > That is achieved by enqueuing a non-lazy callback on each CPU's callback
> > > list, but only for those CPUs having non-empty lists. This causes
> > > CPUs with lists containing only lazy callbacks to be more aggressive,
> > > in particular, it prevents such CPUs from hanging out idle for seconds
> > > at a time while they have callbacks on their lists.
> > >
> > > The enqueuing happens via an IPI to the CPU in question.
> >
> > I am afraid this is too low level for my to understand what is going on
> > here. What are lazy callbacks and why do they need any specific action
> > when we are getting close to OOM? I mean, I do understand that we might
> > have many callers of call_rcu and free memory lazily. But there is quite
> > a long way before we start the reclaim until we reach the OOM killer path.
> > So why don't those callbacks get called during that time period? How are
> > their triggered when we are not hitting the OOM path? They surely cannot
> > sit there for ever, right? Can we trigger them sooner? Maybe the
> > shrinker is not the best fit but we have a retry feedback loop in the page
> > allocator, maybe we can kick this processing from there.
>
> The effect of RCU's current OOM code is to speed up callback invocation
> by at most a few seconds (assuming no stalled CPUs, in which case
> it is not possible to speed up callback invocation).
>
> Given that, I should just remove RCU's OOM code entirely?

Yeah, it seems so.
I do not see how this would really help much. If we really need
some way to kick callbacks then we should do so much earlier in the
reclaim process - e.g. when we start struggling to reclaim any memory.

I am curious. Has the notifier been motivated by a real world use case
or was it just a "nice thing to do"?
-- 
Michal Hocko
SUSE Labs