I don't know if it's useful, but during boot with kernel 3.0 this appears:

$ dmesg | grep multipath
[    4.113786] device-mapper: multipath: version 1.3.0 loaded
[    4.164462] device-mapper: multipath round-robin: version 1.0.0 loaded
[   35.443230] multipathd[1184]: /lib/udev/scsi_id exitted with 1
[   35.443682] multipathd[1184]: /lib/udev/scsi_id exitted with 1

Must I consider this problem a kernel 3.1 bug? I don't know where this
multipath configuration comes from; I have always done simple Fedora
installations.

Thanks.

2011/10/27 Antonio Trande <anto.trande@xxxxxxxxx>
>
> > do you have multipath configured on your box?
>
> If I have understood the 'multipath concept', yes.
> fdisk output: http://www.fpaste.org/KXvm/
>
> > How often can you reproduce this problem?
>
> Only with kernel 3.1.
> If fsck is enabled on the / partition (btrfs filesystem), also with
> kernel 3.0.
>
> 2011/10/27 Vivek Goyal <vgoyal@xxxxxxxxxx>
>
>> On Thu, Oct 27, 2011 at 09:31:13PM +0200, Antonio Trande wrote:
>> > Should I be the "victim"? :)
>> > If you need tests, I'm available.
>>
>> Do you have multipath configured on your box? How often can you
>> reproduce this problem? Can you reproduce the problem with a single
>> CPU in the system?
>>
>> Thanks
>> Vivek
>>
>> > 2011/10/27 Vivek Goyal <vgoyal@xxxxxxxxxx>
>> >
>> > > On Thu, Oct 27, 2011 at 03:20:51PM -0400, Jeff Moyer wrote:
>> > > > Don Zickus <dzickus@xxxxxxxxxx> writes:
>> > > >
>> > > > > On Thu, Oct 27, 2011 at 02:43:22PM -0400, Jeff Moyer wrote:
>> > > > >> >> This doesn't look like the same problem. Here we've got
>> > > > >> >> BUG: scheduling while atomic. If it was the bug fixed by
>> > > > >> >> the above commits, then you would hit a BUG_ON. I would
>> > > > >> >> start looking at the btrfs bits to see if they're holding
>> > > > >> >> any locks in this code path.
>> > > > >> >
>> > > > >> > Ignore that one and move to IMG_0350.JPG. 'scheduling while
>> > > > >> > atomic' is just noise.
>> > > > >> > Besides, Mike and Vivek told me to blame you for not
>> > > > >> > pushing Jens harder on these fixes. :-)))))
>> > > > >>
>> > > > >> I'm looking at 0355, which shows the very top of the trace,
>> > > > >> and that says BUG: scheduling while atomic. So the problem
>> > > > >> reported here *is* different from the one fixed by the above
>> > > > >> two commits. In fact, I don't see evidence of the multipath +
>> > > > >> flush issue in any of these pictures.
>> > > > >
>> > > > > You have to ignore the 'scheduling while atomic' thing; it is
>> > > > > just a
>> > > > >
>> > > > >   printk("BUG: scheduling while atomic")
>> > > > >
>> > > > > It is _not_ a BUG(). :-)
>> > > > > (hint: read kernel/sched.c::__schedule_bug)
>> > > > >
>> > > > > I see those messages all the time; it really should be a WARN
>> > > > > and not a misleading BUG, but whatever.
>> > > > >
>> > > > > His machine died because the NMI watchdog detected a lockup.
>> > > > > The lockup happened because, in blk_insert_cloned_request(),
>> > > > > spin_lock_irqsave disabled interrupts and spun forever waiting
>> > > > > on the q->queue_lock (IMG_0350.JPG).
>> > > > >
>> > > > > Mike and Vivek both said that is what you fixed for 3.2. They
>> > > > > also said the only caller of blk_insert_cloned_request() is
>> > > > > multipath, hence that argument. I'll cc them. Or maybe I can
>> > > > > have them walk over to your cube. :-)
>> > > >
>> > > > Well then they know more than I do. The bug I fixed would not
>> > > > result in infinite spinning on the queue lock. It resulted in a
>> > > > BUG_ON in blk_insert_flush, since req->bio was NULL. So again, I
>> > > > really don't see how this is related. We could put this all to
>> > > > rest by asking the victim to try out those two patches.
>> > >
>> > > Sorry for the confusion here. We saw the blk_insert_cloned_request()
>> > > in the trace and thought it could be related to your fixes.
>> > > Did not think about the exact symptom of the problem in your case.
>> > > So you are right. Here we are spinning on the spinlock infinitely,
>> > > and your patch fixed the BUG_ON(). So maybe it is a different
>> > > issue.
>> > >
>> > > Thanks
>> > > Vivek
>> > >
>> >
>> > --
>> > Antonio Trande
>> > "Fedora Ambassador"
>> >
>> > mail: sagitter@xxxxxxxxxxxxxxxxx
>> > Homepage: http://www.fedora-os.org
>> > Sip Address: sip:sagitter AT ekiga.net
>> > Jabber: sagitter AT jabber.org
>> > GPG Key: CFE3479C

_______________________________________________
kernel mailing list
kernel@xxxxxxxxxxxxxxxxxxxxxxx
https://admin.fedoraproject.org/mailman/listinfo/kernel