Re: [Bug 12945] New: SCSI Generic (sg): BUG: sleeping function called from invalid context

On Thu, Mar 26 2009, Andrew Morton wrote:
> 
> (switched to email.  Please respond via emailed reply-to-all, not via the
> bugzilla web interface).
> 
> On Thu, 26 Mar 2009 12:27:53 GMT bugzilla-daemon@xxxxxxxxxxxxxxxxxxx wrote:
> 
> > http://bugzilla.kernel.org/show_bug.cgi?id=12945
> > 
> >            Summary: SCSI Generic (sg): BUG: sleeping function called from
> >                     invalid context
> >            Product: SCSI Drivers
> >            Version: 2.5
> >     Kernel Version: 2.6.28.9
> >           Platform: All
> >         OS/Version: Linux
> >               Tree: Mainline
> >             Status: NEW
> >           Severity: normal
> >           Priority: P1
> >          Component: Other
> >         AssignedTo: scsi_drivers-other@xxxxxxxxxxxxxxxxxxxx
> >         ReportedBy: txtoxtox285@xxxxxxxxxxxxxx
> >         Regression: No
> > 
> > 
> > Created an attachment (id=20685)
> >  --> (http://bugzilla.kernel.org/attachment.cgi?id=20685)
> > Stack trace on program kill (2.6.28.9)
> > 
> > I am experimenting with CD audio extraction. I use the SCSI Generic driver for
> > this.
> > 
> > My test program uses read() and write() (instead of ioctl) to send requests to
> > the driver and receive responses. I use SG_FLAG_DIRECT_IO.
> > 
> > When I kill my program (because I don't want to wait until it has ripped the
> > entire CD), I am often rewarded with messages like "BUG: sleeping function
> > called from invalid context at linux-2.6.28.9/include/linux/pagemap.h:347". I
> > have attached a typical stack trace.
> > 
> > Another case when I hit this BUG is when I set a time out and the CD drive
> > doesn't respond fast enough. A stack trace is attached.
> 
> > [34215.786870] BUG: sleeping function called from invalid context at /mnt/var-pub/src/linux-2.6.28.9/include/linux/pagemap.h:347
> > [34215.786880] in_atomic(): 1, irqs_disabled(): 1, pid: 0, name: swapper
> > [34215.786886] Pid: 0, comm: swapper Not tainted 2.6.28.9 #1
> > [34215.786890] Call Trace:
> > [34215.786894]  <IRQ>  [<ffffffff8026c4cc>] set_page_dirty_lock+0x1a/0x45
> > [34215.786911]  [<ffffffff802ae17d>] bio_unmap_user+0x1e/0x4a
> > [34215.786920]  [<ffffffff802e876b>] __blk_rq_unmap_user+0x14/0x20
> > [34215.786928]  [<ffffffff80210852>] pit_next_event+0x2e/0x49
> > [34215.786934]  [<ffffffff802e8795>] blk_rq_unmap_user+0x1e/0x4b
> > [34215.786965]  [<ffffffffa0163475>] sg_finish_rem_req+0x6d/0x88 [sg]
> > [34215.786979]  [<ffffffffa0164ef3>] sg_rq_end_io+0x131/0x205 [sg]
> > [34215.786986]  [<ffffffff802e5c1f>] end_that_request_last+0x58/0x194
> > [34215.786992]  [<ffffffff802e5e00>] blk_end_io+0x48/0x7d
> > [34215.787019]  [<ffffffffa0026bef>] scsi_next_command+0x219/0x283 [scsi_mod]
> > [34215.787039]  [<ffffffffa00279b1>] scsi_io_completion+0x181/0x53b [scsi_mod]
> > [34215.787047]  [<ffffffff802e9737>] blk_done_softirq+0x5f/0x6d
> > [34215.787054]  [<ffffffff80230787>] __do_softirq+0x5e/0xf8
> > [34215.787061]  [<ffffffff8020ca8c>] call_softirq+0x1c/0x28
> > [34215.787067]  [<ffffffff8020d6bc>] do_softirq+0x2c/0x68
> > [34215.787073]  [<ffffffff80230696>] irq_exit+0x36/0x82
> > [34215.787079]  [<ffffffff8020d79e>] do_IRQ+0xa6/0xb8
> > [34215.787085]  [<ffffffff8020c256>] ret_from_intr+0x0/0xa
> > [34215.787088]  <EOI>  [<ffffffff8034f648>] menu_reflect+0x0/0x6d
> > [34215.787112]  [<ffffffffa0147d51>] acpi_idle_enter_simple+0x170/0x1d6 [processor]
> > [34215.787127]  [<ffffffffa0147d47>] acpi_idle_enter_simple+0x166/0x1d6 [processor]
> > [34215.787134]  [<ffffffff8034eb32>] cpuidle_idle_call+0x73/0xb1
> > [34215.787140]  [<ffffffff8020ac2a>] cpu_idle+0x3c/0x73
> 
> Argh.  sg_finish_rem_req() is called from interrupt context.  But
> blk_rq_unmap_user() can run
> __bio_unmap_user()->set_page_dirty_lock()->lock_page(), which can call
> schedule().  If it does call schedule(), the machine will crash.
> 
> afaict, blk_rq_unmap_user() has always been a can-sleep function, and
> this is a regression caused by
> 
> commit 6e5a30cba5e7c03b2cd564e968f1dd667a0f7c42

Yep, it is. The problem is the usage of:

        blk_execute_rq_nowait(sdp->device->request_queue, sdp->disk,
                              srp->rq, 1, sg_rq_end_io);

and then doing the sg_finish_rem_req() -> blk_rq_unmap_user() from the
end_io path, where other users do a sync request and then unmap from the
same context. Hmm. Perhaps we can add some request flag to specify doing
the completion from user context; then other users could be converted to
the _nowait() approach as well and get some unification/cleanup there.
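
A userspace sketch of that idea, for illustration only. Every name below is hypothetical and is not a kernel or block-layer API. atomic_depth stands in for softirq context, model_might_sleep() roughly mimics the debug check that prints "BUG: sleeping function called from invalid context", and model_unmap_user() stands in for the can-sleep unmap. Instead of unmapping from the end_io callback, the deferred variant queues the request and unmaps it later from "process context". In the kernel such a deferral would typically go through a workqueue and need locking, which this single-threaded model omits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model: >0 means "atomic/softirq context". */
static int atomic_depth;
static int sleep_violations;

/* Roughly models the might_sleep() debug check (hypothetical name). */
static void model_might_sleep(const char *what)
{
    if (atomic_depth > 0) {
        sleep_violations++;
        fprintf(stderr, "BUG (model): %s may sleep in atomic context\n", what);
    }
}

struct model_req {
    int id;
    bool unmapped;
    struct model_req *next;
};

/* Stand-in for a can-sleep unmap such as blk_rq_unmap_user(). */
static void model_unmap_user(struct model_req *rq)
{
    model_might_sleep("unmap_user");
    rq->unmapped = true;
}

/* Requests queued from "interrupt" context, drained in process context. */
static struct model_req *deferred_head;
static struct model_req **deferred_tail = &deferred_head;

/* Unmaps directly from the completion callback: the problematic pattern. */
static void model_end_io_direct(struct model_req *rq)
{
    model_unmap_user(rq);
}

/* Only queues the request; the sleeping work happens later. */
static void model_end_io_deferred(struct model_req *rq)
{
    rq->next = NULL;
    *deferred_tail = rq;
    deferred_tail = &rq->next;
}

/* Delivers a completion as if from softirq context. */
static void model_complete(struct model_req *rq,
                           void (*end_io)(struct model_req *))
{
    atomic_depth++;
    end_io(rq);
    atomic_depth--;
}

/* Process context: sleeping is allowed, so do the unmap here. */
static void model_drain_deferred(void)
{
    assert(atomic_depth == 0);
    while (deferred_head) {
        struct model_req *rq = deferred_head;
        deferred_head = rq->next;
        model_unmap_user(rq);
    }
    deferred_tail = &deferred_head;
}
```

Completing a request through model_end_io_direct() trips the check once; completing one through model_end_io_deferred() leaves it mapped until model_drain_deferred() runs outside atomic context, with no further violation.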

I'll cook up a patch.

-- 
Jens Axboe
