On Sun, Aug 30, 2020 at 07:53:57PM -0700, Bart Van Assche wrote: > Recently Martin Kepplinger reported a problem with the SCSI runtime PM > code. Alan Stern root-caused that deadlock as follows: > > Thread 0 Thread 1 > -------- -------- > Start runtime suspend > blk_pre_runtime_suspend calls > blk_set_pm_only and sets > q->rpm_status to RPM_SUSPENDING > > Call sd_open -> ... -> scsi_test_unit_ready > -> __scsi_execute -> ... > -> blk_queue_enter > Sees BLK_MQ_REQ_PREEMPT set and > RPM_SUSPENDING queue status, so does > not postpone the request > > blk_post_runtime_suspend sets > q->rpm_status to RPM_SUSPENDED > The drive goes into runtime suspend > > Issues the TEST UNIT READY request > Request fails because the drive is suspended > > Fix that deadlock by only accepting power management requests while > suspended. Remove flag RQF_PREEMPT. Let me clarify this description. Firstly, the second-to-last sentence is ambiguous. The word "only" is all too easy to misuse through carelessness. In this case you meant to say "by accepting only power management requests while suspended", but what you actually wrote was equivalent to "by accepting power management requests only while suspended". And as it happens, both meanings are incorrect because we don't want to accept _any_ requests while the device is suspended -- not even power-management requests. The description should have said "by postponing all non-power-management requests while the device is suspending, suspended, or resuming." Secondly, the scenario described above is not a deadlock; it is a race leading to a command failure. Namely, the thread setting q->rpm_status to RPM_SUSPENDED races with the thread testing q->rpm_status. Thirdly, this race is _not_ the problem that Martin encountered. His problem was a simple bug (failure to postpone a request), not a race or a deadlock. Fourthly, the race illustrated above is, for now, theoretical. It cannot occur with the existing code base (mostly because the existing code is buggy). The advantage of your series over the patch I submitted on Aug. 23 ("block: Fix bug in runtime-resume handling") is that it fixes Martin's problem without introducing this new race. Alan Stern