On Thu, Jun 08, 2017 at 01:55:00PM -0700, Paul E. McKenney wrote:
> On Thu, Jun 08, 2017 at 01:11:48PM -0700, Krister Johansen wrote:
> > May I impose upon you to CC this patch to stable, and tag it as fixing
> > abedf8e241? I ran into this on a production 4.9 branch. When I
> > debugged it, I discovered that it went all the way back to 4.6. The
> > tl;dr is that at least for some environments, the missed wakeup
> > manifests itself as a series of hung-task warnings to console and if I'm
> > unlucky it can also generate a hang that can block interactive logins
> > via ssh.
>
> Interesting! This is the first that I have heard that this was anything
> other than a theoretical bug. To the comment in your second URL, it is
> wise to recall that a seismologist was in fact arrested for failing to
> predict an earthquake. Later acquitted/pardoned/whatever, but arrested
> nonetheless. ;-)

Point taken. I do realize that we all make mistakes, and certainly I do
too. Perhaps I should have said that my survey of current callers of
swake_up() was enough to convince me that I didn't have an immediate
problem elsewhere, but that I'm not familiar enough with the code base
to make that statement with a lot of authority. The concern being that
if the patch came from RT-linux where the barrier was present in
swake_up(), are there other places where swake_up() callers still assume
this is being handled on their behalf?

As part of this, I also pondered whether I should add a comment around
swake_up(), similar to what's already there for waitqueue_active. I
wasn't sure how subtle this is for other consumers, though.

> Silliness aside, does my patch actually fix your problem in practice as
> well as in theory? If so, may I have your Tested-by?

Yes, it absolutely does. Consider it given:

Tested-by: Krister Johansen <kjlx@xxxxxxxxxxxxxxxxxx>

> Impressive investigative effort, by the way!

Thanks!

-K