Re: SCSI layer RPM deadlock debug suggestion

Alan Stern <stern@xxxxxxxxxxxxxxxxxxx> · Fri, 2 Jul 2021 16:31:42 -0400

On Fri, Jul 02, 2021 at 06:03:20PM +0100, John Garry wrote:
> Hi guys,
> 
> We're experiencing a deadlock between trying to remove a SATA device and
> doing a rescan in scsi_rescan_device().
> 
> I'm just looking for a suggestion on how to solve.
> 
> The background is that the host (hisi sas v3 hw) uses SAS SCSI transport and
> supports RPM. In the testcase, the host and disks are put to suspend. Then
> we run fio on the disk to make them active and then immediately hard reset
> the disk link, which causes the disk to be disconnected (please don't ask
> why ...).
> 
> We find that there is a conflict between the rescan and the device removal
> code, resulting in a deadlock:

> The rescan holds the sdev_gendev.device lock in scsi_rescan_device(), while
> the removal code in __scsi_remove_device() wants to grab it.
> 
> However the rescan will not release (the lock) until the blk_queue_enter()
> succeeds, above. That can happen 2x ways:
> 
> - the queue is dying, which would not happen until after the device_del() in
> __scsi_remove_device(), so not going to happen
> 
> - q->pm_only falls to 0. This would be when scsi_runtime_resume() ->
> sdev_runtime_resume() -> blk_post_runtime_resume(err = 0) ->
> blk_set_runtime_active() is called. However, I find that the err argument
> for me is -5, which comes from sdev_runtime_resume() -> pm->runtime_resume
> (=sd_resume()), which fails. That sd_resume() -> sd_start_stop_device()
> fails as the disk is not attached. So we go into error state:
> 
> $:more /sys/devices/pci0000:b4/0000:b4:04.0/host3/port-3:0/end_device-3:0/target3:0:0/3:0:0:0/power/runtime_status
> error
> 
> Removing commit e27829dc92e5 ("scsi: serialize ->rescan against ->remove")
> solves this issue for me, but that is there for a reason.
> 
> Any suggestion on how to fix this deadlock?

This is indeed a tricky question.  It seems like we should allow a 
runtime resume to succeed if the only reason it failed was that the 
device has been removed.

More generally, perhaps we should always consider that a runtime 
resume succeeds.  Any remaining problems will be dealt with by the 
device's driver and subsystem once the device is marked as 
runtime-active again.

Suppose you try changing blk_post_runtime_resume() so that it always 
calls blk_set_runtime_active() regardless of the value of err.  Does 
that fix the problem?

And more importantly, will it cause any other problems...?

Alan Stern