On Thu, Oct 27, 2011 at 02:43:22PM -0400, Jeff Moyer wrote:
> >> This doesn't look like the same problem.  Here we've got BUG: scheduling
> >> while atomic.  If it was the bug fixed by the above commits, then you
> >> would hit a BUG_ON.  I would start looking at the btrfs bits to see if
> >> they're holding any locks in this code path.
> >
> > Ignore that one and move to IMG_0350.IMG.  'scheduling while atomic' is
> > just noise.  Besides Mike and Vivek told me to blame you for not pushing
> > Jens harder on these fixes. :-)))))
>
> I'm looking at 0355, which shows the very top of the trace, and that
> says BUG: scheduling while atomic.  So the problem reported here *is*
> different from the one fixed by the above two commits.  In fact, I don't
> see evidence of the multipath + flush issue in any of these pictures.

You have to ignore the 'scheduling while atomic' thing; it is just a
printk("BUG: scheduling while atomic"), it is _not_ a BUG(). :-)
(hint: read kernel/sched.c::__schedule_bug)

I see those messages all the time.  It really should be a WARN and not a
misleading BUG, but whatever.

His machine died because the NMI watchdog detected a lockup.  The lockup
happened because, in blk_insert_cloned_request(), spin_lock_irqsave()
disabled interrupts and spun forever waiting on q->queue_lock
(IMG_0350.JPG).

Mike and Vivek both said that is what you fixed for 3.2.  They also said
the only caller of blk_insert_cloned_request() is multipath, hence that
argument.

I'll cc them.  Or maybe I can have them walk over to your cube. :-)

Cheers,
Don