On Sat, Aug 29, 2020 at 06:33:26PM +0200, Martin Kepplinger wrote:
> On 29.08.20 17:26, Alan Stern wrote:
> > Hmmm. I'm wondering about something you wrote back in June
> > (https://marc.info/?l=linux-scsi&m=159345778431615&w=2):
> >
> >     blk_queue_enter() always - especially when sd is runtime
> >     suspended and I try to mount as above - sets success to be true
> >     for me, so never continues down to blk_pm_request_resume(). All
> >     I see is "PM: Removing info for No Bus:sda1".
> >
> > blk_queue_enter() would always set success to be true because pm
> > (derived from the BLK_MQ_REQ_PREEMPT flag) is true. But why was the
> > BLK_MQ_REQ_PREEMPT flag set? In other words, where was
> > blk_queue_enter() called from?
> >
> > Can you get a stack trace (i.e., call dump_stack()) at exactly this
> > point, that is, when pm is true and q->rpm_status is RPM_SUSPENDED? Or
> > do you already know the answer?
> >
> >
>
> I reverted any scsi/block out-of-tree fixes for this.
>
> When I try to mount, pm is TRUE (BLK_MQ_REQ_PREEMPT set) and that's the
> first stack trace I get in this condition, inside of blk_queue_enter():
>
> There is more, but I don't know if that's interesting.
>
> [ 38.642202] CPU: 2 PID: 1522 Comm: mount Not tainted 5.8.0-1-librem5 #487
> [ 38.642207] Hardware name: Purism Librem 5r3 (DT)
> [ 38.642213] Call trace:
> [ 38.642233]  dump_backtrace+0x0/0x210
> [ 38.642242]  show_stack+0x20/0x30
> [ 38.642252]  dump_stack+0xc8/0x128
> [ 38.642262]  blk_queue_enter+0x1b8/0x2d8
> [ 38.642271]  blk_mq_alloc_request+0x54/0xb0
> [ 38.642277]  blk_get_request+0x34/0x78
> [ 38.642286]  __scsi_execute+0x60/0x1c8
> [ 38.642291]  scsi_test_unit_ready+0x88/0x118
> [ 38.642298]  sd_check_events+0x110/0x158
> [ 38.642306]  disk_check_events+0x68/0x188
> [ 38.642312]  disk_clear_events+0x84/0x198
> [ 38.642320]  check_disk_change+0x38/0x90
> [ 38.642325]  sd_open+0x60/0x148
> [ 38.642330]  __blkdev_get+0xcc/0x4c8
> [ 38.642335]  __blkdev_get+0x278/0x4c8
> [ 38.642339]  blkdev_get+0x128/0x1a8
> [ 38.642345]  blkdev_open+0x98/0xb0
> [ 38.642354]  do_dentry_open+0x130/0x3c8
> [ 38.642359]  vfs_open+0x34/0x40
> [ 38.642366]  path_openat+0xa30/0xe40
> [ 38.642372]  do_filp_open+0x84/0x100
> [ 38.642377]  do_sys_openat2+0x1f4/0x2b0
> [ 38.642382]  do_sys_open+0x60/0xa8
> (...)
>
> and of course it doesn't work and /dev/sda1 disappears, see the initial
> discussion that led to your fix.

Great! That's exactly what I was looking for, thank you.

Bart, this is a perfect example of the potential race I've been talking
about in the other email thread.

Suppose thread 0 is carrying out a runtime suspend of a SCSI disk and at
the same time, thread 1 is opening the disk's block device (as we see in
the stack trace here). Then we could have the following:

    Thread 0                            Thread 1
    --------                            --------
    Start runtime suspend
    blk_pre_runtime_suspend calls
      blk_set_pm_only and sets
      q->rpm_status to RPM_SUSPENDING

                                        Call sd_open -> ... ->
                                          scsi_test_unit_ready ->
                                          __scsi_execute -> ...
                                          -> blk_queue_enter
                                        Sees BLK_MQ_REQ_PREEMPT set and
                                          RPM_SUSPENDING queue status,
                                          so does not postpone the request

    blk_post_runtime_suspend sets
      q->rpm_status to RPM_SUSPENDED
    The drive goes into runtime suspend

                                        Issues the TEST UNIT READY request
                                        Request fails because the drive
                                          is suspended

One way to avoid this race is mutual exclusion: we could make sd_open
prevent the drive from being runtime suspended until it returns.
However, I don't like this approach; it would mean tracking down every
possible pathway to __scsi_execute and making sure that runtime suspend
is blocked.

A more fine-grained approach would be to have __scsi_execute itself call
scsi_autopm_get_device/scsi_autopm_put_device whenever the rq_flags
argument doesn't contain RQF_PM. This way we wouldn't have to worry
about missing any possible pathways. But it relies on an implicit
assumption that __scsi_execute is the only place where the PREEMPT flag
gets set.

A third possibility is the approach I outlined before, adding a
BLK_MQ_REQ_PM flag. But to avoid the deadlock you pointed out, I would
make blk_queue_enter smarter about whether to postpone a request. The
logic would go like this:

    If !blk_queue_pm_only:
        Allow

    If !BLK_MQ_REQ_PREEMPT:
        Postpone

    If q->rpm_status is RPM_ACTIVE:
        Allow

    If !BLK_MQ_REQ_PM:
        Postpone

    If q->rpm_status is RPM_SUSPENDED:
        Postpone

    Else:
        Allow

The assumption here is that the PREEMPT flag is set whenever the PM
flag is.

I believe either the second or third possibility would work. The
second looks like the simplest.

What do you think?

Alan Stern