On Wed, Nov 30, 2016 at 03:18:46PM -0800, Guenter Roeck wrote:
> On Wed, Nov 30, 2016 at 01:01:52PM -0800, Paul E. McKenney wrote:
> > On Wed, Nov 30, 2016 at 11:21:59AM -0800, Guenter Roeck wrote:
> > > On Wed, Nov 30, 2016 at 04:03:33AM -0800, Paul E. McKenney wrote:
> > > > On Wed, Nov 30, 2016 at 02:52:11AM -0800, Guenter Roeck wrote:
> > > > > On 11/29/2016 11:02 PM, Paul E. McKenney wrote:
> > > > > >On Tue, Nov 29, 2016 at 08:32:51PM -0800, Guenter Roeck wrote:
> > > > > >>On 11/29/2016 05:28 PM, Paul E. McKenney wrote:
> > > > > >>>On Tue, Nov 29, 2016 at 01:23:08PM -0800, Guenter Roeck wrote:
> > > > > >>>>Hi Paul,
> > > > > >>>>
> > > > > >>>>most of my qemu tests for sparc32 targets started to fail in next-20161129.
> > > > > >>>>The problem is only seen in SMP builds; non-SMP builds are fine.
> > > > > >>>>Bisect points to commit 2d66cccd73436 ("mm: Prevent __alloc_pages_nodemask()
> > > > > >>>>RCU CPU stall warnings"); reverting that commit fixes the problem.
> > > >
> > > > And I have dropped this patch. Michal Hocko showed me the error of
> > > > my ways with this patch.
> > > >
> > >
> > > :-)
> > >
> > > On another note, I still get RCU tracebacks in the s390 tests.
> > >
> > > BUG: sleeping function called from invalid context at mm/page_alloc.c:3775
> > >
> > > That is caused by 'rcu: Maintain special bits at bottom of ->dynticks counter';
> > > if I recall correctly we had discussed that earlier.
> >
> > Indeed, I had missed a dyntick counter update back on Nov 11, which meant
> > that some of the code was still looking at the low-order bit instead of
> > the next bit up. This is now fixed.
> >
> > So to get to the error message you call out above, I need to have improperly
> > left the system in bh state or left irqs disabled, while the system was
> > running normally without an oops. I am having a hard time seeing how this
> > patch can do that.
> >
> > I would be more suspicious of f2a471ffc8a8 ("rcu: Allow boot-time use
> > of cond_resched_rcu_qs()").
> >
> > So you bisected or did a revert to work out which was the offending commit?
> >
>
> My most recent bisect was with the November 10 image, so that would have missed
> any later fix. Comparing the log messages, the current message is indeed
> different. Sorry, I mixed that up; I just assumed that the problem would be
> the same without really checking. My bad.
>
> Bisect would be tricky, since the s390 image was broken for some time after
> November 10. The first time I have seen the above BUG: was with next-20161128
> (which is the first build after the crash was fixed). That version did not
> include f2a471ffc8a8, so that can not be the cause.
>
> I'll try to set up a bisect tonight, working around the crash problem.
> I'll let you know how it goes.

Whew! You had me going for a bit there. ;-)

Thanx, Paul