Re: [PATCH] scsi: avoid a permanent stop of the scsi device's request queue

"Ewan D. Milne" <emilne@xxxxxxxxxx> · Wed, 07 Dec 2016 12:40:11 -0500

On Wed, 2016-12-07 at 08:55 -0800, Bart Van Assche wrote:
> On 12/07/2016 08:48 AM, Bart Van Assche wrote:
> > It's a known bug. Some time ago I posted a patch that serializes all
> > scsi_device_set_state() calls but I have not yet found it in the list
> > archives. However, that patch has not yet been merged.
> 
> See also https://www.spinics.net/lists/linux-scsi/msg66966.html.
> 
> Bart.
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Yes, however that patch does not fix Wei Fang's issue.  In fact I just
received a crash dump that appears to be the same thing.  It looks like
the rport went away right after the initial INQUIRY, so we set the state
to SDEV_BLOCK and stop the queue, and then the scan code continues and
sets the state back to SDEV_RUNNING.  Then, when the devloss timer
expires, we call scsi_target_unblock() with SDEV_TRANSPORT_OFFLINE, but
because the device is in SDEV_RUNNING rather than SDEV_BLOCK, the queue
is not restarted, so a subsequent command (i.e. the ALUA page 83 inquiry
command) is stuck on the stopped queue.  (The dump shows 3 devices on the
target with queues
running in SDEV_TRANSPORT_OFFLINE, and 1 device currently being scanned
with the queue stopped in SDEV_RUNNING.)

It seems to me the problem is that scsi_device_set_state() allows the
caller to transition SDEV_BLOCK -> SDEV_RUNNING without actually
restarting the queue, and that should be an illegal transition.

-Ewan
