On Thu, Aug 02, 2018 at 07:08:41PM +0000, David Chen wrote:
> Hi all,
>
> We'd like to have the following commit backported to the 4.9 branch to fix
> an issue we are seeing.
>
> 35a2897c2a306cca344ca5c0b43416707018f434
> sched/wait: Remove the lockless swait_active() check in swake_up*()
>
> In the 4.9 branch, we hit an issue in RCU where the NOCB follower list was
> not getting reclaimed, causing an OOM.
>
> In discussion with Paul, we were able to figure out that the problem was a
> missed wakeup resulting from the lack of a proper memory barrier between
> setting the wakeup condition and calling swake_up().
>
> nocb_leader_wait()
> {
>         *tail = rdp->nocb_gp_head;
>         smp_mb__after_atomic(); /* Store *tail before wakeup. */
>         if (rdp != my_rdp && tail == &rdp->nocb_follower_head) {
>                 swake_up(&rdp->nocb_wq);
>
> Note that smp_mb__after_atomic() is only a compiler barrier on x86.
> Originally I was going to change the barrier to smp_mb(), but then I found
> that master has the above-mentioned patch, which solves the same class of
> problem by removing the lockless check inside swake_up().
>
> So I'm wondering if we can backport this patch to the 4.9 branch to solve
> this issue, and maybe solve other potential missed-wakeup issues as well.

Now applied, thanks.

greg k-h