Re: next: Commit 'mm: Prevent __alloc_pages_nodemask() RCU CPU stall ...' causing hang on sparc32 qemu

On Wed, Nov 30, 2016 at 03:18:46PM -0800, Guenter Roeck wrote:
> On Wed, Nov 30, 2016 at 01:01:52PM -0800, Paul E. McKenney wrote:
> > On Wed, Nov 30, 2016 at 11:21:59AM -0800, Guenter Roeck wrote:
> > > On Wed, Nov 30, 2016 at 04:03:33AM -0800, Paul E. McKenney wrote:
> > > > On Wed, Nov 30, 2016 at 02:52:11AM -0800, Guenter Roeck wrote:
> > > > > On 11/29/2016 11:02 PM, Paul E. McKenney wrote:
> > > > > >On Tue, Nov 29, 2016 at 08:32:51PM -0800, Guenter Roeck wrote:
> > > > > >>On 11/29/2016 05:28 PM, Paul E. McKenney wrote:
> > > > > >>>On Tue, Nov 29, 2016 at 01:23:08PM -0800, Guenter Roeck wrote:
> > > > > >>>>Hi Paul,
> > > > > >>>>
> > > > > >>>>most of my qemu tests for sparc32 targets started to fail in next-20161129.
> > > > > >>>>The problem is only seen in SMP builds; non-SMP builds are fine.
> > > > > >>>>Bisect points to commit 2d66cccd73436 ("mm: Prevent __alloc_pages_nodemask()
> > > > > >>>>RCU CPU stall warnings"); reverting that commit fixes the problem.
> > > > 
> > > > And I have dropped this patch.  Michal Hocko showed me the error of
> > > > my ways with this patch.
> > > > 
> > > 
> > > :-)
> > > 
> > > On another note, I still get RCU tracebacks in the s390 tests.
> > > 
> > > BUG: sleeping function called from invalid context at mm/page_alloc.c:3775
> > > 
> > > That is caused by 'rcu: Maintain special bits at bottom of ->dynticks counter';
> > > if I recall correctly we had discussed that earlier.
> > 
> > Indeed, I had missed a dyntick counter update back on Nov 11, which meant
> > that some of the code was still looking at the low-order bit instead of
> > the next bit up.  This is now fixed.
> > 
> > So to get to the error message you call out above, I need to have improperly
> > left the system in bh state or left irqs disabled, while the system was
> > running normally without an oops.  I am having a hard time seeing how this
> > patch can do that.
> > 
> > I would be more suspicious of f2a471ffc8a8 ("rcu: Allow boot-time use
> > of cond_resched_rcu_qs()").
> > 
> > So you bisected or did a revert to work out which was the offending commit?
> > 
> 
> My most recent bisect was with the November 10 image, so that would have missed
> any later fix. Comparing the log messages, the current message is indeed
> different. Sorry, I mixed that up; I just assumed that the problem would be
> the same without really checking. My bad.
> 
> Bisect would be tricky, since the s390 image was broken for some time after
> November 10. The first time I have seen the above BUG: was with next-20161128
> (which is the first build after the crash was fixed). That version did not
> include f2a471ffc8a8, so that cannot be the cause.
> 
> I'll try to set up a bisect tonight, working around the crash problem.
> I'll let you know how it goes.

Whew!  You had me going for a bit there.  ;-)

							Thanx, Paul
