Re: PROBLEM alert - Host fas03 is DOWN

On Sat, 11 Sep 2010, Jon Masters wrote:

> On Sat, 2010-09-11 at 02:51 -0400, Jon Masters wrote:
> > On Fri, 2010-09-10 at 19:24 -0600, Stephen John Smoogen wrote:
> >
> > > Sep 11 01:10:23 fas03 kernel: WARNING: at block/blk-core.c:338
> >
> > > Sep 11 01:10:23 fas03 kernel: [<c044fc97>] ? warn_slowpath_common+0x77/0xb0
> > > Sep 11 01:10:23 fas03 kernel: [<c05ca5dc>] ? blk_start_queue+0x6c/0x70
> > > Sep 11 01:10:23 fas03 kernel: [<c044fce3>] ? warn_slowpath_null+0x13/0x20
> > > Sep 11 01:10:23 fas03 kernel: [<c05ca5dc>] ? blk_start_queue+0x6c/0x70
> > > Sep 11 01:10:23 fas03 kernel: [<ed63896b>] ? kick_pending_request_queues+0x1b/0x30 [xen_blkfront]
> > > Sep 11 01:10:23 fas03 kernel: [<ed638b80>] ? blkif_interrupt+0x200/0x220 [xen_blkfront]
> > > Sep 11 01:10:23 fas03 kernel: [<c04ad7c5>] ? handle_IRQ_event+0x45/0x140
> >
> > The code at block/blk-core.c:338 contains an explicit check to ensure that
> > interrupts have been disabled, but this is not true since blkif_interrupt
> > is not registered with IRQF_DISABLED set at the time of the setup in
> > bind_evtchn_to_irqhandler. Thus it might be that interrupts are still on
> > when we get to kick_pending_request_queues. Does this always happen?
> >
> > This perhaps happened because upstream removed IRQF_DISABLED and now
> > runs with interrupts disabled in handle_IRQ_event, so Xen won't see
> > this. But on 2.6.32 this change had not yet happened. It's also 2:50am
> > and I might be reading this wrong, but I at least suggest you open a
> > RHEL6 bug and try a more recent kernel build.
>
> Ah, of course I shouldn't email before bed. There's an obvious giant
> spin_lock_irqsave/restore there, but as noted on xen-devel (when they
> were mulling over moving all of the blkif_interrupt bits into a tasklet
> just a couple of weeks ago): "It looks like __blk_end_request_all...is
> returning with interrupts enabled sometimes". I pinged some folks.
>

Thanks for looking into this, Jon. We had 3 hosts die of this within
about 2 hours last night.  Here's the bug report Smooge opened:

https://bugzilla.redhat.com/show_bug.cgi?id=632802

I'll take a look around for a more recent RHEL6 kernel.
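
For anyone following along, the failure mode quoted above can be pictured
with a toy user-space model. This is only a sketch and not kernel code:
the Cpu class and every function name below are hypothetical stand-ins,
and the "leak" flag simply assumes the xen-devel observation that
__blk_end_request_all sometimes returns with interrupts enabled.

```python
# Toy user-space model of the suspected bug; nothing here is real kernel code.
# All names are hypothetical stand-ins for the functions named in the trace.

class Cpu:
    """Tracks a single 'interrupts enabled' flag and any warnings raised."""

    def __init__(self):
        self.irqs_enabled = True
        self.warnings = []

    def irq_save(self):
        """Model of spin_lock_irqsave: disable IRQs, return the prior state."""
        prev = self.irqs_enabled
        self.irqs_enabled = False
        return prev

    def irq_restore(self, prev):
        """Model of spin_unlock_irqrestore: put the saved state back."""
        self.irqs_enabled = prev


def start_queue(cpu):
    """Model of the check in blk_start_queue: warn unless IRQs are off."""
    if cpu.irqs_enabled:
        cpu.warnings.append("WARNING: start_queue called with IRQs enabled")


def end_request(cpu, leaks_irq_enable):
    """Stand-in for __blk_end_request_all; may wrongly re-enable IRQs."""
    if leaks_irq_enable:
        cpu.irqs_enabled = True


def handle_interrupt(cpu, leaks_irq_enable):
    """Stand-in for blkif_interrupt: lock, complete requests, kick queue."""
    flags = cpu.irq_save()
    end_request(cpu, leaks_irq_enable)
    start_queue(cpu)  # the kick_pending_request_queues step
    cpu.irq_restore(flags)
    return cpu.warnings
```

With leaks_irq_enable=False the handler finishes with no warnings; with
True, the check fires even though the handler itself took the lock with
interrupts disabled, which matches the shape of the trace.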

	-Mike
_______________________________________________
infrastructure mailing list
infrastructure@xxxxxxxxxxxxxxxxxxxxxxx
https://admin.fedoraproject.org/mailman/listinfo/infrastructure

