Re: [Bug 12945] New: SCSI Generic (sg): BUG: sleeping function called from invalid context

Jens Axboe <jens.axboe@xxxxxxxxxx> · Fri, 27 Mar 2009 07:57:27 +0100

On Fri, Mar 27 2009, FUJITA Tomonori wrote:
> On Thu, 26 Mar 2009 19:43:02 +0100
> Jens Axboe <jens.axboe@xxxxxxxxxx> wrote:
> 
> > On Thu, Mar 26 2009, Andrew Morton wrote:
> > > 
> > > (switched to email.  Please respond via emailed reply-to-all, not via the
> > > bugzilla web interface).
> > > 
> > > On Thu, 26 Mar 2009 12:27:53 GMT bugzilla-daemon@xxxxxxxxxxxxxxxxxxx wrote:
> > > 
> > > > http://bugzilla.kernel.org/show_bug.cgi?id=12945
> > > > 
> > > >            Summary: SCSI Generic (sg): BUG: sleeping function called from
> > > >                     invalid context
> > > >            Product: SCSI Drivers
> > > >            Version: 2.5
> > > >     Kernel Version: 2.6.28.9
> > > >           Platform: All
> > > >         OS/Version: Linux
> > > >               Tree: Mainline
> > > >             Status: NEW
> > > >           Severity: normal
> > > >           Priority: P1
> > > >          Component: Other
> > > >         AssignedTo: scsi_drivers-other@xxxxxxxxxxxxxxxxxxxx
> > > >         ReportedBy: txtoxtox285@xxxxxxxxxxxxxx
> > > >         Regression: No
> > > > 
> > > > 
> > > > Created an attachment (id=20685)
> > > >  --> (http://bugzilla.kernel.org/attachment.cgi?id=20685)
> > > > Stack trace on program kill (2.6.28.9)
> > > > 
> > > > I am experimenting with CD audio extraction. I use the SCSI Generic driver for
> > > > this.
> > > > 
> > > > My test program uses read() and write() (instead of ioctl) to send requests to
> > > > the driver and receive responses. I use SG_FLAG_DIRECT_IO.
> > > > 
> > > > When I kill my program (because I don't want to wait until it has ripped the
> > > > entire CD), I am often rewarded with messages like "BUG: sleeping function
> > > > called from invalid context at linux-2.6.28.9/include/linux/pagemap.h:347". I
> > > > have attached a typical stack trace.
> > > > 
> > > > Another case when I hit this BUG is when I set a time out and the CD drive
> > > > doesn't respond fast enough. A stack trace is attached.
> > > 
> > > > [34215.786870] BUG: sleeping function called from invalid context at /mnt/var-pub/src/linux-2.6.28.9/include/linux/pagemap.h:347
> > > > [34215.786880] in_atomic(): 1, irqs_disabled(): 1, pid: 0, name: swapper
> > > > [34215.786886] Pid: 0, comm: swapper Not tainted 2.6.28.9 #1
> > > > [34215.786890] Call Trace:
> > > > [34215.786894]  <IRQ>  [<ffffffff8026c4cc>] set_page_dirty_lock+0x1a/0x45
> > > > [34215.786911]  [<ffffffff802ae17d>] bio_unmap_user+0x1e/0x4a
> > > > [34215.786920]  [<ffffffff802e876b>] __blk_rq_unmap_user+0x14/0x20
> > > > [34215.786928]  [<ffffffff80210852>] pit_next_event+0x2e/0x49
> > > > [34215.786934]  [<ffffffff802e8795>] blk_rq_unmap_user+0x1e/0x4b
> > > > [34215.786965]  [<ffffffffa0163475>] sg_finish_rem_req+0x6d/0x88 [sg]
> > > > [34215.786979]  [<ffffffffa0164ef3>] sg_rq_end_io+0x131/0x205 [sg]
> > > > [34215.786986]  [<ffffffff802e5c1f>] end_that_request_last+0x58/0x194
> > > > [34215.786992]  [<ffffffff802e5e00>] blk_end_io+0x48/0x7d
> > > > [34215.787019]  [<ffffffffa0026bef>] scsi_next_command+0x219/0x283 [scsi_mod]
> > > > [34215.787039]  [<ffffffffa00279b1>] scsi_io_completion+0x181/0x53b [scsi_mod]
> > > > [34215.787047]  [<ffffffff802e9737>] blk_done_softirq+0x5f/0x6d
> > > > [34215.787054]  [<ffffffff80230787>] __do_softirq+0x5e/0xf8
> > > > [34215.787061]  [<ffffffff8020ca8c>] call_softirq+0x1c/0x28
> > > > [34215.787067]  [<ffffffff8020d6bc>] do_softirq+0x2c/0x68
> > > > [34215.787073]  [<ffffffff80230696>] irq_exit+0x36/0x82
> > > > [34215.787079]  [<ffffffff8020d79e>] do_IRQ+0xa6/0xb8
> > > > [34215.787085]  [<ffffffff8020c256>] ret_from_intr+0x0/0xa
> > > > [34215.787088]  <EOI>  [<ffffffff8034f648>] menu_reflect+0x0/0x6d
> > > > [34215.787112]  [<ffffffffa0147d51>] acpi_idle_enter_simple+0x170/0x1d6 [processor]
> > > > [34215.787127]  [<ffffffffa0147d47>] acpi_idle_enter_simple+0x166/0x1d6 [processor]
> > > > [34215.787134]  [<ffffffff8034eb32>] cpuidle_idle_call+0x73/0xb1
> > > > [34215.787140]  [<ffffffff8020ac2a>] cpu_idle+0x3c/0x73
> > > 
> > > Argh.  sg_finish_rem_req() is called from interrupt context.  But
> > > blk_rq_unmap_user() can run
> > > __bio_unmap_user()->set_page_dirty_lock()->lock_page(), which can call
> > > schedule().  If it does call schedule(), the machine will crash.
> > > 
> > > afaict, blk_rq_unmap_user() has always been a can-sleep function, and
> > > this is a regression caused by
> > > 
> > > commit 6e5a30cba5e7c03b2cd564e968f1dd667a0f7c42
> > 
> > Yep, it is. The problem is the usage of:
> > 
> >         blk_execute_rq_nowait(sdp->device->request_queue, sdp->disk,
> >                               srp->rq, 1, sg_rq_end_io);
> > 
> > and then doing the sg_finish_rem_req() -> blk_rq_unmap_user() from the
> > end_io path, where other users do a sync request and then unmap from the
> > same context.
> 
> Right. And only sg does that. I've already converted st and osst to
> use the block layer, but they work synchronously.

Precisely.

> 
> > Hmm. Perhaps we can add some request flag to specify doing
> > the completion from user context, then other users could be converted to
> > the _nowait() approach and get some unification/cleanup there as
> > well.
> 
> Since only sg needs this, I simply fixed sg instead of changing the
> block layer. But it might be nice if the block layer could handle this.
> 
> Seems there are several patches for the block layer (including
> mapping) from Tejun and Boaz. I'll read them to see what we could do.
> I'm always too busy in March with company matters.

OK, let me know what you find in the scsi tree. I'll hold off on this
one.
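
To make the "completion from user context" idea concrete, here is a toy
user-space model of the pattern. It is only a sketch under stated
assumptions: every model_* name is hypothetical, none of this is kernel API,
and it is not the fix that went into sg. The end_io callback, which runs in
atomic context, only queues the request; the sleeping unmap step runs later
from a context that is allowed to sleep.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; none of these are kernel APIs. */
static bool in_atomic_ctx;    /* models in_atomic() being true in softirq */
static int unmapped_count;    /* number of requests that were unmapped */
static int bad_sleep_count;   /* sleeping calls made from atomic context */

struct model_req {
    int id;
    struct model_req *next;
};

/* Requests whose unmap has been deferred to process context. */
static struct model_req *deferred_head;

/* Models blk_rq_unmap_user(): may sleep, so it must not run while atomic. */
static void model_unmap_user(struct model_req *rq)
{
    (void)rq;
    if (in_atomic_ctx)
        bad_sleep_count++;    /* this is the "BUG: sleeping function" case */
    unmapped_count++;
}

/* Models the end_io callback: runs in atomic context, so it only queues. */
static void model_end_io(struct model_req *rq)
{
    /* A real implementation would need locking around this list. */
    rq->next = deferred_head;
    deferred_head = rq;
}

/* Models a worker in process context draining the deferred list. */
static void model_process_deferred(void)
{
    while (deferred_head) {
        struct model_req *rq = deferred_head;

        deferred_head = rq->next;
        model_unmap_user(rq);
    }
}

/* Drives one completion through the model: atomic part, then deferred part. */
static void model_complete(struct model_req *rq)
{
    in_atomic_ctx = true;     /* entering the softirq completion path */
    model_end_io(rq);
    in_atomic_ctx = false;    /* back in a context that may sleep */
    model_process_deferred();
}
```

Calling model_unmap_user() directly from model_end_io() instead would bump
bad_sleep_count, which corresponds to the trace in the report.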

-- 
Jens Axboe
