Re: [PATCH 2/2] scsi: Fix hang when device state is set via sysfs

James Bottomley <jejb@xxxxxxxxxxxxx> · Fri, 19 Nov 2021 13:19:37 -0500

On Fri, 2021-11-19 at 11:13 -0600, Mike Christie wrote:
> On 11/19/21 7:35 AM, James Bottomley wrote:
> > On Fri, 2021-11-05 at 17:10 -0500, Mike Christie wrote:
> > > This fixes a regression added with:
> > > 
> > > commit f0f82e2476f6 ("scsi: core: Fix capacity set to zero after
> > > offlinining device")
> > > 
> > > The problem is that after iSCSI recovery, iscsid will call into
> > > the kernel to set the dev's state to running, and with that patch
> > > we now call scsi_rescan_device with the state_mutex held. If the
> > > scsi error handler thread is just starting to test the device in
> > > scsi_send_eh_cmnd then it's going to try to grab the state_mutex.
> > > 
> > > We are then stuck, because when scsi_rescan_device tries to send
> > > its IO scsi_queue_rq calls -> scsi_host_queue_ready ->
> > > scsi_host_in_recovery which will return true (the host state is
> > > still in recovery) and IO will just be requeued.
> > > scsi_send_eh_cmnd will then never be able to grab the state_mutex
> > > to finish error handling.
> > > 
> > > To prevent the deadlock this moves the rescan related code to
> > > after we drop the state_mutex.
> > > 
> > > This also adds a check for if we are already in the running
> > > state. This prevents extra scans and helps the iscsid case where
> > > if the transport class has already onlined the device during it's
> > > recovery process then we don't need userspace to do it again plus
> > > possibly block that daemon.
> > > 
> > > Fixes: f0f82e2476f6 ("scsi: core: Fix capacity set to zero after
> > > offlinining device")
> > > Cc: Bart Van Assche <bvanassche@xxxxxxx>
> > > Cc: lijinlin <lijinlin3@xxxxxxxxxx>
> > > Cc: Wu Bo <wubo40@xxxxxxxxxx>
> > > Signed-off-by: Mike Christie <michael.christie@xxxxxxxxxx>
> > > ---
> > >  drivers/scsi/scsi_sysfs.c | 30 +++++++++++++++++++-----------
> > >  1 file changed, 19 insertions(+), 11 deletions(-)
> > > 
> > > diff --git a/drivers/scsi/scsi_sysfs.c
> > > b/drivers/scsi/scsi_sysfs.c
> > > index a35841b34bfd..53e23a7bc0d3 100644
> > > --- a/drivers/scsi/scsi_sysfs.c
> > > +++ b/drivers/scsi/scsi_sysfs.c
> > > @@ -797,6 +797,7 @@ store_state_field(struct device *dev, struct
> > > device_attribute *attr,
> > >  	int i, ret;
> > >  	struct scsi_device *sdev = to_scsi_device(dev);
> > >  	enum scsi_device_state state = 0;
> > > +	bool rescan_dev = false;
> > >  
> > >  	for (i = 0; i < ARRAY_SIZE(sdev_states); i++) {
> > >  		const int len = strlen(sdev_states[i].name);
> > > @@ -815,20 +816,27 @@ store_state_field(struct device *dev,
> > > struct
> > > device_attribute *attr,
> > >  	}
> > >  
> > >  	mutex_lock(&sdev->state_mutex);
> > > -	ret = scsi_device_set_state(sdev, state);
> > > -	/*
> > > -	 * If the device state changes to SDEV_RUNNING, we need to
> > > -	 * run the queue to avoid I/O hang, and rescan the device
> > > -	 * to revalidate it. Running the queue first is necessary
> > > -	 * because another thread may be waiting inside
> > > -	 * blk_mq_freeze_queue_wait() and because that call may be
> > > -	 * waiting for pending I/O to finish.
> > > -	 */
> > > -	if (ret == 0 && state == SDEV_RUNNING) {
> > > +	if (sdev->sdev_state == SDEV_RUNNING && state == SDEV_RUNNING)
> > > {
> > > +		ret = count;
> > 
> > This looks wrong because of this
> > 
> > [...]
> > >  	return ret == 0 ? count : -EINVAL;
> > 
> > Don't we now return EINVAL on idempotent set state running because
> > the count is always non-zero?
> > 
> > I think the first statement should be 'ret = 0;' instead to cause
> > idempotent state setting to succeed as a nop.
> > 
> 
> You're right.
> 
> I'll resend this patchset with James's comment handled.

Send an incremental patch because this is already in the queue for
Linus and he'd go ballistic if we had to rebase to remove it.

James