---------- Forwarded message ----------
From: Antonio Trande <anto.trande@xxxxxxxxx>
Date: 2011/10/27
Subject: Re: Kernel-3.1 Crash
To: Vivek Goyal <vgoyal@xxxxxxxxxx>

Should I be the "victim"? :)
If you need tests, I'm available.

2011/10/27 Vivek Goyal <vgoyal@xxxxxxxxxx>

> On Thu, Oct 27, 2011 at 03:20:51PM -0400, Jeff Moyer wrote:
> > Don Zickus <dzickus@xxxxxxxxxx> writes:
> >
> > > On Thu, Oct 27, 2011 at 02:43:22PM -0400, Jeff Moyer wrote:
> > >> >> This doesn't look like the same problem. Here we've got BUG: scheduling
> > >> >> while atomic. If it was the bug fixed by the above commits, then you
> > >> >> would hit a BUG_ON. I would start looking at the btrfs bits to see if
> > >> >> they're holding any locks in this code path.
> > >> >
> > >> > Ignore that one and move to IMG_0350.IMG. 'scheduling while atomic' is
> > >> > just noise. Besides Mike and Vivek told me to blame you for not pushing
> > >> > Jens harder on these fixes. :-)))))
> > >>
> > >> I'm looking at 0355, which shows the very top of the trace, and that
> > >> says BUG: scheduling while atomic. So the problem reported here *is*
> > >> different from the one fixed by the above two commits. In fact, I don't
> > >> see evidence of the multipath + flush issue in any of these pictures.
> > >
> > > You have to ignore the 'schedule while atomic' thing it is just a
> > >
> > > printk("BUG: scheduling while atomic"), it is _not_ a BUG(). :-)
> > > (hint read kernel/sched.c::__schedule_bug)
> > >
> > > I see those messages all the time, it really should be a WARN and not a
> > > misleading BUG, but whatever.
> > >
> > > His machine died because the NMI watchdog detected a lockup. The lockup
> > > was because in blk_insert_cloned_request(), spin_lock_irqsave disabled
> > > interrupts and spun forever waiting on the q->queue_lock (IMG_0350.JPG).
> > >
> > > Mike and Vivek both said that is what you fixed for 3.2. They also said
> > > the only caller of blk_insert_cloned_request() is multipath, hence that
> > > argument. I'll cc them. Or maybe I can have them walk over to your cube.
> > > :-)
> >
> > Well then they know more than I do. The bug I fixed would not result in
> > infinite spinning on the queue lock. It resulted in a BUG_ON in
> > blk_insert_flush, since req->bio was NULL. So again, I really don't see
> > how this is related. We could put this all to rest by asking the victim
> > to try out those two patches.
>
> Sorry for the confusion here. We saw the blk_insert_cloned_request() in
> the trace and thought it could be related to your fixes. Did not think
> about exact symptom of the problem in your case. So you are right. Here
> we are spinning on spinlock infinitely and your patch fixed the BUG_ON().
> So maybe it is a different issue.
>
> Thanks
> Vivek

-- 
Antonio Trande
"Fedora Ambassador"

mail: sagitter@xxxxxxxxxxxxxxxxx
Homepage: http://www.fedora-os.org
Sip Address: sip:sagitter AT ekiga.net
Jabber <http://jabber.org/>: sagitter AT jabber.org
GPG Key: CFE3479C

_______________________________________________
kernel mailing list
kernel@xxxxxxxxxxxxxxxxxxxxxxx
https://admin.fedoraproject.org/mailman/listinfo/kernel