I don't know if it's useful, but during boot with kernel 3.0 this appears:

$ dmesg | grep multipath
[    4.113786] device-mapper: multipath: version 1.3.0 loaded
[    4.164462] device-mapper: multipath round-robin: version 1.0.0 loaded
[   35.443230] multipathd[1184]: /lib/udev/scsi_id exitted with 1
[   35.443682] multipathd[1184]: /lib/udev/scsi_id exitted with 1

Must I consider this problem a kernel 3.1 bug? I don't know where this
multipath configuration comes from; I have always done simple Fedora
installations.

Thanks.

2011/10/27 Antonio Trande <anto.trande@xxxxxxxxx>
>
> > do you have multipath configured on your box?
>
> If I have understood the 'multipath concept', yes.
> fdisk output: http://www.fpaste.org/KXvm/
>
> > How often can you reproduce this problem?
>
> Only with kernel 3.1.
> If fsck is enabled on the / partition (btrfs filesystem), also with
> kernel 3.0.
>
> 2011/10/27 Vivek Goyal <vgoyal@xxxxxxxxxx>
>
>> On Thu, Oct 27, 2011 at 09:31:13PM +0200, Antonio Trande wrote:
>> > Should I be the "victim"? :)
>> > If you need tests, I'm available.
>>
>> Do you have multipath configured on your box? How often can you
>> reproduce this problem? Can you reproduce the problem with a single
>> CPU in the system?
>>
>> Thanks
>> Vivek
>>
>> > 2011/10/27 Vivek Goyal <vgoyal@xxxxxxxxxx>
>> >
>> > > On Thu, Oct 27, 2011 at 03:20:51PM -0400, Jeff Moyer wrote:
>> > > > Don Zickus <dzickus@xxxxxxxxxx> writes:
>> > > >
>> > > > > On Thu, Oct 27, 2011 at 02:43:22PM -0400, Jeff Moyer wrote:
>> > > > >> >> This doesn't look like the same problem. Here we've got
>> > > > >> >> BUG: scheduling while atomic. If it was the bug fixed by
>> > > > >> >> the above commits, then you would hit a BUG_ON. I would
>> > > > >> >> start looking at the btrfs bits to see if they're holding
>> > > > >> >> any locks in this code path.
>> > > > >> >
>> > > > >> > Ignore that one and move to IMG_0350.JPG. 'scheduling while
>> > > > >> > atomic' is just noise.
>> > > > >> > Besides, Mike and Vivek told me to blame you for not
>> > > > >> > pushing Jens harder on these fixes. :-)))))
>> > > > >>
>> > > > >> I'm looking at 0355, which shows the very top of the trace,
>> > > > >> and that says BUG: scheduling while atomic. So the problem
>> > > > >> reported here *is* different from the one fixed by the above
>> > > > >> two commits. In fact, I don't see evidence of the multipath +
>> > > > >> flush issue in any of these pictures.
>> > > > >
>> > > > > You have to ignore the 'scheduling while atomic' thing; it is
>> > > > > just a
>> > > > >
>> > > > >   printk("BUG: scheduling while atomic")
>> > > > >
>> > > > > It is _not_ a BUG(). :-)
>> > > > > (hint: read kernel/sched.c::__schedule_bug)
>> > > > >
>> > > > > I see those messages all the time; it really should be a WARN
>> > > > > and not a misleading BUG, but whatever.
>> > > > >
>> > > > > His machine died because the NMI watchdog detected a lockup.
>> > > > > The lockup happened because, in blk_insert_cloned_request(),
>> > > > > spin_lock_irqsave disabled interrupts and spun forever waiting
>> > > > > on the q->queue_lock (IMG_0350.JPG).
>> > > > >
>> > > > > Mike and Vivek both said that is what you fixed for 3.2. They
>> > > > > also said the only caller of blk_insert_cloned_request() is
>> > > > > multipath, hence that argument. I'll cc them. Or maybe I can
>> > > > > have them walk over to your cube. :-)
>> > > >
>> > > > Well then they know more than I do. The bug I fixed would not
>> > > > result in infinite spinning on the queue lock. It resulted in a
>> > > > BUG_ON in blk_insert_flush, since req->bio was NULL. So again, I
>> > > > really don't see how this is related. We could put this all to
>> > > > rest by asking the victim to try out those two patches.
>> > >
>> > > Sorry for the confusion here. We saw the blk_insert_cloned_request()
>> > > in the trace and thought it could be related to your fixes.
>> > > Did not think about the exact symptom of the problem in your case.
>> > > So you are right. Here we are spinning on the spinlock infinitely,
>> > > and your patch fixed the BUG_ON(). So maybe it is a different
>> > > issue.
>> > >
>> > > Thanks
>> > > Vivek
>> > >
>> >
>> > --
>> > Antonio Trande
>> > "Fedora Ambassador"
>> >
>> > mail: sagitter@xxxxxxxxxxxxxxxxx
>> > Homepage: http://www.fedora-os.org
>> > Sip Address: sip:sagitter AT ekiga.net
>> > Jabber: sagitter AT jabber.org
>> > GPG Key: CFE3479C

_______________________________________________
kernel mailing list
kernel@xxxxxxxxxxxxxxxxxxxxxxx
https://admin.fedoraproject.org/mailman/listinfo/kernel