On Thu, 6 Apr 2017, Hugh Dickins wrote:
> On Thu, 6 Apr 2017, Mel Gorman wrote:
> > On Wed, Apr 05, 2017 at 01:59:49PM -0700, Hugh Dickins wrote:
> > > Hi Mel,
> > >
> > > I suspect that it's not safe for kthreadd to drain_all_pages();
> > > but I haven't studied flush_work() etc, so don't really know what
> > > I'm talking about: hoping that you will jump to a realization.
> > >
> >
> > You're right, it's not safe. If kthreadd is creating the workqueue
> > thread to do the drain and it'll recurse into itself.
> >
> > > 4.11-rc has been giving me hangs after hours of swapping load. At
> > > first they looked like memory leaks ("fork: Cannot allocate memory");
> > > but for no good reason I happened to do "cat /proc/sys/vm/stat_refresh"
> > > before looking at /proc/meminfo one time, and the stat_refresh stuck
> > > in D state, waiting for completion of flush_work like many kworkers.
> > > kthreadd waiting for completion of flush_work in drain_all_pages().
> > >
> >
> > It's asking itself to do work in all likelihood.
> >
> > > Patch below has been running well for 36 hours now:
> > > a bit too early to be sure, but I think it's time to turn to you.
> > >
> >
> > I think the patch is valid but like Michal, would appreciate if you
> > could run the patch he linked to see if it also side-steps the same
> > problem.
> >
> > Good spot!
>
> Thank you both for explanations, and direction to the two "drainging"
> patches. I've put those on to 4.11-rc5 (and double-checked that I've
> taken mine off), and set it going. Fine so far but much too soon to
> tell - mine did 56 hours with clean /var/log/messages before I switched,
> so I demand no less of Michal's :). I'll report back tomorrow and the
> day after (unless badness appears sooner once I'm home).

24 hours so far, and with a clean /var/log/messages.
Not conclusive yet, and of course I'll leave it running another couple
of days, but I'm increasingly sure that it works as you intended: I
agree that

mm-move-pcp-and-lru-pcp-drainging-into-single-wq.patch
mm-move-pcp-and-lru-pcp-drainging-into-single-wq-fix.patch

should go to Linus as soon as convenient. Though I think the commit
message needs something a bit stronger than "Quite annoying though".
Maybe add a line:

Fixes serious hang under load, observed repeatedly on 4.11-rc.

Thanks!
Hugh