Re: Kernel-3.1 Crash

Don Zickus <dzickus@xxxxxxxxxx> · Thu, 27 Oct 2011 15:09:05 -0400

On Thu, Oct 27, 2011 at 02:43:22PM -0400, Jeff Moyer wrote:
> >> This doesn't look like the same problem.  Here we've got BUG: scheduling
> >> while atomic.  If it was the bug fixed by the above commits, then you
> >> would hit a BUG_ON.  I would start looking at the btrfs bits to see if
> >> they're holding any locks in this code path.
> >
> > Ignore that one and move to IMG_0350.JPG.  'scheduling while atomic' is
> > just noise.  Besides Mike and Vivek told me to blame you for not pushing
> > Jens harder on these fixes. :-)))))
> 
> I'm looking at 0355, which shows the very top of the trace, and that
> says BUG: scheduling while atomic.  So the problem reported here *is*
> different from the one fixed by the above two commits.  In fact, I don't
> see evidence of the multipath + flush issue in any of these pictures.

You have to ignore the 'scheduling while atomic' thing.  It is just a
printk("BUG: scheduling while atomic"); it is _not_ a BUG().  :-)
(hint: read kernel/sched.c::__schedule_bug)

I see those messages all the time.  It really should be a WARN and not a
misleading BUG, but whatever.
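
To make the printk-vs-BUG() distinction concrete, here is a rough userspace
sketch.  It is not the kernel code: preempt_count_sim, schedule_bug_sim() and
schedule_sim() are made-up stand-ins.  It only shows the shape of the thing:
the check logs a line that starts with "BUG:" and then returns, so the
caller keeps running.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the kernel's preempt count (assumption). */
static int preempt_count_sim;

/* Counts how many times the message was emitted, for illustration only. */
static int warnings_logged;

/* Report-only path: prints a scary-looking line and returns to the caller. */
static void schedule_bug_sim(void)
{
    fprintf(stderr, "BUG: scheduling while atomic (simulated), count=%d\n",
            preempt_count_sim);
    warnings_logged++;
}

/* Simulated scheduler entry: complains if "atomic", then carries on anyway. */
static bool schedule_sim(void)
{
    if (preempt_count_sim > 0)
        schedule_bug_sim();
    return true; /* execution continued past the message */
}
```

A real BUG() would not return to the caller the way schedule_bug_sim() does,
which is why the message alone does not explain a dead machine.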

His machine died because the NMI watchdog detected a lockup.  The lockup
happened because, in blk_insert_cloned_request(), spin_lock_irqsave()
disabled interrupts and then spun forever waiting on q->queue_lock
(IMG_0350.JPG).
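
A userspace sketch of that failure mode, for illustration only.  It uses a
C11 atomic_flag as a toy spinlock; sim_spinlock, sim_spin_lock_bounded() and
the max_spins limit are assumptions, not kernel APIs.  The spin limit plays
the role of the watchdog: if the lock is never released, the waiter never
gets in.  It says nothing about why the lock was still held in this crash.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Toy spinlock; stands in for q->queue_lock in this sketch (assumption). */
typedef struct {
    atomic_flag locked;
} sim_spinlock;

/*
 * Spin until the lock is acquired or max_spins attempts have failed.
 * Returns true on acquire, false when the limit is hit.  The real
 * spin_lock_irqsave() has no such limit; it spins with interrupts off
 * until the lock is free.
 */
static bool sim_spin_lock_bounded(sim_spinlock *l, unsigned long max_spins)
{
    for (unsigned long i = 0; i < max_spins; i++) {
        if (!atomic_flag_test_and_set_explicit(&l->locked,
                                               memory_order_acquire))
            return true;
    }
    return false; /* "watchdog" fired: the lock was never released */
}

/* Release the toy lock. */
static void sim_spin_unlock(sim_spinlock *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```

With the lock already held and never dropped, a second
sim_spin_lock_bounded() call returns false after max_spins tries.  In the
kernel there is no bound, and with interrupts disabled only the NMI watchdog
can still notice.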

Mike and Vivek both said that is what you fixed for 3.2.  They also said
the only caller of blk_insert_cloned_request() is multipath, hence that
argument.  I'll cc them.  Or maybe I can have them walk over to your cube.
:-)

Cheers,
Don