On Fri, Mar 27 2009, FUJITA Tomonori wrote:
> On Thu, 26 Mar 2009 19:43:02 +0100
> Jens Axboe <jens.axboe@xxxxxxxxxx> wrote:
>
> > On Thu, Mar 26 2009, Andrew Morton wrote:
> > >
> > > (switched to email. Please respond via emailed reply-to-all, not via the
> > > bugzilla web interface).
> > >
> > > On Thu, 26 Mar 2009 12:27:53 GMT bugzilla-daemon@xxxxxxxxxxxxxxxxxxx wrote:
> > >
> > > > http://bugzilla.kernel.org/show_bug.cgi?id=12945
> > > >
> > > >            Summary: SCSI Generic (sg): BUG: sleeping function called from
> > > >                     invalid context
> > > >            Product: SCSI Drivers
> > > >            Version: 2.5
> > > >     Kernel Version: 2.6.28.9
> > > >           Platform: All
> > > >         OS/Version: Linux
> > > >               Tree: Mainline
> > > >             Status: NEW
> > > >           Severity: normal
> > > >           Priority: P1
> > > >          Component: Other
> > > >         AssignedTo: scsi_drivers-other@xxxxxxxxxxxxxxxxxxxx
> > > >         ReportedBy: txtoxtox285@xxxxxxxxxxxxxx
> > > >         Regression: No
> > > >
> > > >
> > > > Created an attachment (id=20685)
> > > >  --> (http://bugzilla.kernel.org/attachment.cgi?id=20685)
> > > > Stack trace on program kill (2.6.28.9)
> > > >
> > > > I am experimenting with CD audio extraction. I use the SCSI Generic driver for
> > > > this.
> > > >
> > > > My test program uses read() and write() (instead of ioctl) to send requests to
> > > > the driver and receive responses. I use SG_FLAG_DIRECT_IO.
> > > >
> > > > When I kill my program (because I don't want to wait until it has ripped the
> > > > entire CD), I am often rewarded with messages like "BUG: sleeping function
> > > > called from invalid context at linux-2.6.28.9/include/linux/pagemap.h:347". I
> > > > have attached typical stack trace.
> > > >
> > > > Another case when I hit this BUG is when I set a time out and the CD drive
> > > > doesn't respond fast enough. A stack trace is attached.
> > >
> > > > [34215.786870] BUG: sleeping function called from invalid context at /mnt/var-pub/src/linux-2.6.28.9/include/linux/pagemap.h:347
> > > > [34215.786880] in_atomic(): 1, irqs_disabled(): 1, pid: 0, name: swapper
> > > > [34215.786886] Pid: 0, comm: swapper Not tainted 2.6.28.9 #1
> > > > [34215.786890] Call Trace:
> > > > [34215.786894]  <IRQ>  [<ffffffff8026c4cc>] set_page_dirty_lock+0x1a/0x45
> > > > [34215.786911]  [<ffffffff802ae17d>] bio_unmap_user+0x1e/0x4a
> > > > [34215.786920]  [<ffffffff802e876b>] __blk_rq_unmap_user+0x14/0x20
> > > > [34215.786928]  [<ffffffff80210852>] pit_next_event+0x2e/0x49
> > > > [34215.786934]  [<ffffffff802e8795>] blk_rq_unmap_user+0x1e/0x4b
> > > > [34215.786965]  [<ffffffffa0163475>] sg_finish_rem_req+0x6d/0x88 [sg]
> > > > [34215.786979]  [<ffffffffa0164ef3>] sg_rq_end_io+0x131/0x205 [sg]
> > > > [34215.786986]  [<ffffffff802e5c1f>] end_that_request_last+0x58/0x194
> > > > [34215.786992]  [<ffffffff802e5e00>] blk_end_io+0x48/0x7d
> > > > [34215.787019]  [<ffffffffa0026bef>] scsi_next_command+0x219/0x283 [scsi_mod]
> > > > [34215.787039]  [<ffffffffa00279b1>] scsi_io_completion+0x181/0x53b [scsi_mod]
> > > > [34215.787047]  [<ffffffff802e9737>] blk_done_softirq+0x5f/0x6d
> > > > [34215.787054]  [<ffffffff80230787>] __do_softirq+0x5e/0xf8
> > > > [34215.787061]  [<ffffffff8020ca8c>] call_softirq+0x1c/0x28
> > > > [34215.787067]  [<ffffffff8020d6bc>] do_softirq+0x2c/0x68
> > > > [34215.787073]  [<ffffffff80230696>] irq_exit+0x36/0x82
> > > > [34215.787079]  [<ffffffff8020d79e>] do_IRQ+0xa6/0xb8
> > > > [34215.787085]  [<ffffffff8020c256>] ret_from_intr+0x0/0xa
> > > > [34215.787088]  <EOI>  [<ffffffff8034f648>] menu_reflect+0x0/0x6d
> > > > [34215.787112]  [<ffffffffa0147d51>] acpi_idle_enter_simple+0x170/0x1d6 [processor]
> > > > [34215.787127]  [<ffffffffa0147d47>] acpi_idle_enter_simple+0x166/0x1d6 [processor]
> > > > [34215.787134]  [<ffffffff8034eb32>] cpuidle_idle_call+0x73/0xb1
> > > > [34215.787140]  [<ffffffff8020ac2a>] cpu_idle+0x3c/0x73
> > >
> > > Argh. sg_finish_rem_req() is called from interrupt context. But
> > > blk_rq_unmap_user() can run
> > > __bio_unmap_user()->set_page_dirty_lock()->lock_page(), which can call
> > > schedule(). If it does call schedule(), the machine will crash.
> > >
> > > afacit, blk_rq_unmap_user() has always been a can-sleep function, and
> > > this is a regression caused by
> > >
> > > commit 6e5a30cba5e7c03b2cd564e968f1dd667a0f7c42
> >
> > Yep, it is. The problem is the usage of:
> >
> >         blk_execute_rq_nowait(sdp->device->request_queue, sdp->disk,
> >                               srp->rq, 1, sg_rq_end_io);
> >
> > and then doing the sg_finish_rem_req() -> blk_rq_unmap_user() from the
> > end_io path, where other users do a sync request and then unmap from the
> > same context.
>
> Right. And only sg does that. I've already converted st and osst to
> use the block layer but they works synchronously.

Precisely.

>
> > Hmm. Perhaps we can add some request flag to specify doing
> > the completion from user context, then other users could be converted do
> > the _nowait() approach as well and get some unification/cleanup there as
> > well.
>
> Since only sg needs this so I simply fixed sg instead of changing the
> block layer. But it might be nice if block layer can handle this.
>
> Seems there are several patches for the block layer (including
> mapping) from Tejun and Boaz. I'll read them to see what we could do.
> I'm always too busy in March with the company matters.

OK, let me know what you find in the scsi tree. I'll hold off on this
one.

--
Jens Axboe
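
The idea discussed above (finish the request from user context instead of from the end_io path) can be sketched with a small user-space model. This is only an illustration under stated assumptions. It is not kernel code and not the fix that was applied to sg. Every name in it (model_request, model_unmap_user, end_io_direct, end_io_deferred, run_deferred, simulate_irq, count_violations) is hypothetical. A plain flag stands in for "running in interrupt context", and a counter records each time the may-sleep unmap step runs while that flag is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model of the problem; none of this is kernel API. */

static bool in_atomic_ctx;    /* models "we are in interrupt context" */
static int  sleep_violations; /* may-sleep calls made while in_atomic_ctx */
static int  unmapped;         /* requests whose buffers were unmapped */

struct model_request {
    int id;
    struct model_request *next; /* link in the deferred-completion list */
};

static struct model_request *deferred_head;

/* Stands in for blk_rq_unmap_user(): it may sleep, so calling it in
 * atomic context is counted as a violation. */
static void model_unmap_user(struct model_request *rq)
{
    (void)rq;
    if (in_atomic_ctx)
        sleep_violations++;
    unmapped++;
}

/* Problematic pattern: unmap directly from the completion callback. */
static void end_io_direct(struct model_request *rq)
{
    model_unmap_user(rq);
}

/* Deferred pattern: the callback only queues the request. */
static void end_io_deferred(struct model_request *rq)
{
    rq->next = deferred_head;
    deferred_head = rq;
}

/* Models process context draining the queue; sleeping is allowed here. */
static void run_deferred(void)
{
    while (deferred_head) {
        struct model_request *rq = deferred_head;
        deferred_head = rq->next;
        model_unmap_user(rq);
    }
}

/* Models a completion interrupt invoking the end_io callback. */
static void simulate_irq(void (*end_io)(struct model_request *),
                         struct model_request *rq)
{
    in_atomic_ctx = true;
    end_io(rq);
    in_atomic_ctx = false;
}

/* Completes three requests and returns how many may-sleep calls
 * happened in atomic context. */
int count_violations(bool defer)
{
    struct model_request reqs[3] = { {1, NULL}, {2, NULL}, {3, NULL} };

    sleep_violations = 0;
    unmapped = 0;
    deferred_head = NULL;

    for (size_t i = 0; i < 3; i++)
        simulate_irq(defer ? end_io_deferred : end_io_direct, &reqs[i]);
    run_deferred();

    assert(unmapped == 3); /* every request is unmapped exactly once */
    return sleep_violations;
}
```

In this model count_violations(false) returns 3, one per request unmapped from the callback, while count_violations(true) returns 0 because the callback only queues the request and the unmap runs later outside the simulated interrupt. In a real driver the queueing step would also need to be safe in interrupt context, which this single-threaded sketch does not attempt to model.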