Fwd: Kernel-3.1 Crash

Antonio Trande <anto.trande@xxxxxxxxx> · Thu, 27 Oct 2011 21:31:53 +0200

---------- Forwarded message ----------
From: Antonio Trande <anto.trande@xxxxxxxxx>
Date: 2011/10/27
Subject: Re: Kernel-3.1 Crash
To: Vivek Goyal <vgoyal@xxxxxxxxxx>

Should I be the "victim"? :)
If you need tests, I'm available.

2011/10/27 Vivek Goyal <vgoyal@xxxxxxxxxx>

> On Thu, Oct 27, 2011 at 03:20:51PM -0400, Jeff Moyer wrote:
> > Don Zickus <dzickus@xxxxxxxxxx> writes:
> >
> > > On Thu, Oct 27, 2011 at 02:43:22PM -0400, Jeff Moyer wrote:
> > >> >> This doesn't look like the same problem.  Here we've got BUG:
> > >> >> scheduling while atomic.  If it was the bug fixed by the above
> > >> >> commits, then you would hit a BUG_ON.  I would start looking at
> > >> >> the btrfs bits to see if they're holding any locks in this code
> > >> >> path.
> > >> >
> > >> > Ignore that one and move to IMG_0350.IMG.  'scheduling while
> > >> > atomic' is just noise.  Besides Mike and Vivek told me to blame
> > >> > you for not pushing Jens harder on these fixes. :-)))))
> > >>
> > >> I'm looking at 0355, which shows the very top of the trace, and that
> > >> says BUG: scheduling while atomic.  So the problem reported here *is*
> > >> different from the one fixed by the above two commits.  In fact, I
> > >> don't see evidence of the multipath + flush issue in any of these
> > >> pictures.
> > >
> > > You have to ignore the 'schedule while atomic' thing it is just a
> > >
> > > printk("BUG: scheduling while atomic"), it is _not_ a BUG().  :-)
> > > (hint read kernel/sched.c::__schedule_bug)
> > >
> > > I see those messages all the time, it really should be a WARN and not a
> > > misleading BUG, but whatever.
> > >
> > > His machine died because the NMI watchdog detected a lockup.  The
> > > lockup was because in blk_insert_cloned_request(), spin_lock_irqsave
> > > disabled interrupts and spun forever waiting on the q->queue_lock
> > > (IMG_0350.JPG).
> > >
> > > Mike and Vivek both said that is what you fixed for 3.2.  They also
> > > said the only caller of blk_insert_cloned_request() is multipath,
> > > hence that argument.  I'll cc them.  Or maybe I can have them walk
> > > over to your cube. :-)
> >
> > Well then they know more than I do.  The bug I fixed would not result in
> > infinite spinning on the queue lock.  It resulted in a BUG_ON in
> > blk_insert_flush, since req->bio was NULL.  So again, I really don't see
> > how this is related.  We could put this all to rest by asking the victim
> > to try out those two patches.
>
> Sorry for the confusion here. We saw the blk_insert_cloned_request() in
> the trace and thought it could be related to your fixes. We did not think
> about the exact symptom of the problem in your case. So you are right. Here
> we are spinning on the spinlock infinitely, whereas your patch fixed the
> BUG_ON(). So maybe it is a different issue.
>
> Thanks
> Vivek
>
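
To keep the two failure modes from the thread apart, here is a minimal
user-space sketch in C. It is not kernel code and does not reproduce the
real blk_insert_cloned_request(): every toy_* name is hypothetical, the
lock is a plain C11 atomic_flag, and the spin is bounded (a crude stand-in
for the NMI watchdog) so the example terminates instead of hanging. It
only shows that "request has no bio" and "lock is never released" are
separate conditions with separate symptoms.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a block request; only the bio pointer matters. */
struct toy_request {
    void *bio;
};

enum toy_outcome {
    TOY_OK,      /* lock taken, request looked sane */
    TOY_BUG_ON,  /* lock taken, but bio was NULL (the BUG_ON-style symptom) */
    TOY_LOCKUP   /* lock never became free (the lockup-style symptom) */
};

/* Try to take the lock at most max_spins times instead of spinning forever. */
static bool toy_spin_lock_bounded(atomic_flag *lock, unsigned long max_spins)
{
    assert(lock != NULL);
    for (unsigned long i = 0; i < max_spins; i++) {
        /* test_and_set returns the previous value: false means we got it. */
        if (!atomic_flag_test_and_set_explicit(lock, memory_order_acquire))
            return true;
    }
    return false;
}

/* Classify what would happen when inserting rq under queue_lock. */
static enum toy_outcome toy_insert_request(atomic_flag *queue_lock,
                                           const struct toy_request *rq,
                                           unsigned long max_spins)
{
    if (!toy_spin_lock_bounded(queue_lock, max_spins))
        return TOY_LOCKUP;

    enum toy_outcome out = (rq->bio != NULL) ? TOY_OK : TOY_BUG_ON;

    atomic_flag_clear_explicit(queue_lock, memory_order_release);
    return out;
}
```

Calling toy_insert_request() on a lock that some other path already holds
and never drops returns TOY_LOCKUP whatever the request contains, which is
why a fix for the NULL-bio case would not be expected to cure a stuck lock.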

-- 
Antonio Trande
"Fedora Ambassador"

mail: sagitter@xxxxxxxxxxxxxxxxx
Homepage: http://www.fedora-os.org
Sip Address: sip:sagitter AT ekiga.net
Jabber (http://jabber.org/): sagitter AT jabber.org
GPG Key: CFE3479C

_______________________________________________
kernel mailing list
kernel@xxxxxxxxxxxxxxxxxxxxxxx
https://admin.fedoraproject.org/mailman/listinfo/kernel