Re: [PATCH RFC 6/6] block, scsi, ide: Only submit power management requests in state RPM_SUSPENDED

Alan Stern <stern@xxxxxxxxxxxxxxxxxxx> · Mon, 31 Aug 2020 14:25:26 -0400

On Sun, Aug 30, 2020 at 07:53:57PM -0700, Bart Van Assche wrote:
> Recently Martin Kepplinger reported a problem with the SCSI runtime PM
> code. Alan Stern root-caused that deadlock as follows:
> 
> 	Thread 0		Thread 1
> 	--------		--------
> 	Start runtime suspend
> 	blk_pre_runtime_suspend calls
> 	  blk_set_pm_only and sets
> 	  q->rpm_status to RPM_SUSPENDING
> 
> 				Call sd_open -> ... -> scsi_test_unit_ready
> 				  -> __scsi_execute -> ...
> 				  -> blk_queue_enter
> 				Sees BLK_MQ_REQ_PREEMPT set and
> 				  RPM_SUSPENDING queue status, so does
> 				  not postpone the request
> 
> 	blk_post_runtime_suspend sets
> 	  q->rpm_status to RPM_SUSPENDED
> 	The drive goes into runtime suspend
> 
> 				Issues the TEST UNIT READY request
> 				Request fails because the drive is suspended
> 
> Fix that deadlock by only accepting power management requests while
> suspended. Remove flag RQF_PREEMPT.

Let me clarify this description.

Firstly, the second-to-last sentence is ambiguous.  The word "only" is
all too easy to misuse through carelessness.  In this case you meant
to say "by accepting only power management requests while suspended",
but what you actually wrote was equivalent to "by accepting power
management requests only while suspended".  And as it happens, both
meanings are incorrect because we don't want to accept _any_ requests
while the device is suspended -- not even power-management requests.
The description should have said "by postponing all non-power-management 
requests while the device is suspending, suspended, or resuming."

Secondly, the scenario described above is not a deadlock; it is a race 
leading to a command failure.  Namely, the thread setting q->rpm_status 
to RPM_SUSPENDED races with the thread testing q->rpm_status.

Thirdly, this race is _not_ the problem that Martin encountered.  His 
problem was a simple bug (failure to postpone a request), not a race or 
a deadlock.

Fourthly, the race illustrated above is, for now, theoretical.  It 
cannot occur with the existing code base (mostly because the existing 
code is buggy).  The advantage of your series over the patch I submitted 
on Aug. 23 ("block: Fix bug in runtime-resume handling") is that it 
fixes Martin's problem without introducing this new race.

Alan Stern