RE: [PATCH] usb: uas: fix usb subsystem hang after power off hub port

On Thu, 4 Apr 2019 Kento.A.Kobayashi@xxxxxxxx wrote:

> Hi,
> 
> >> Root Cause
> >> - A block layer timeout occurs after the UAS USB device being accessed is powered off (the reproduction step). During timeout error handling, the SCSI host state becomes SHOST_CANCEL_RECOVERY, which causes IO to hang so that the lock cannot be released. As a result, the USB subsystem hangs.
> >> The function call path is as follows:
> >> blk_mq_timeout_work 
> >>   …->scsi_times_out  (… means some functions are not listed before this function.)
> >>     …-> scsi_eh_scmd_add(scsi_host_set_state to SHOST_RECOVERY) 
> >>       … -> scsi_error_handler
> >>         …-> uas_eh_device_reset_handler
> >>             -> usb_lock_device_for_reset  <- take lock
> >>               -> usb_reset_device
> >>                 …-> rebind = uas_post_reset (return 1 since ENODEV) 
> >>                 …-> usb_unbind_and_rebind_marked_interfaces (rebind=1)
> >>                    …-> uas_disconnect  (scsi_host_set_state to SHOST_CANCEL_RECOVERY)
> >>                         … -> scsi_queue_rq
> >>                              -> scsi_host_queue_ready (returns 0, which causes the IO hang)
> >
> >How does scsi_queue_rq get called here?  As far as I can see, this shouldn't happen.
> 
> We confirmed the function call path on Linux 4.9, where this problem occurred, since that is the version we are working on. In Linux 4.9, the last function is scsi_request_fn instead of scsi_queue_rq. In staging.git, we think scsi_queue_rq is called by the following path:
> uas_disconnect
> |- scsi_remove_host
>  |- scsi_forget_host
>   |- __scsi_remove_device
>    |- device_del
>     |- bus_remove_device
>      |- device_release_driver
>       |- device_release_driver_internal
>        |- __device_release_driver
>         |- drv->remove(dev) (sd_remove)  
>          |- sd_shutdown
>           |- sd_sync_cache
>            |- scsi_execute
... (unnecessary internal details elided)
>                     |- blk_mq_dispatch_rq_list
>                      |- q->mq_ops->queue_rq (scsi_queue_rq)

So it looks as though the SCSI subsystem doesn't like to have a reset 
handler call scsi_remove_host.  Commands dispatched by the removal 
routines are forced to wait for the reset recovery to finish, which 
won't happen until those commands have been completed.
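
To make the circular wait concrete, here is a toy model in plain C.  It 
is only a sketch: every toy_* name is hypothetical, nothing here is 
kernel code, and the readiness check is a simplification of what the 
trace above reports for scsi_host_queue_ready.  Where the real kernel 
would block forever, the model returns false.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy host states; the names echo the kernel's but this is not kernel code. */
enum toy_shost_state {
	TOY_SHOST_RUNNING,
	TOY_SHOST_RECOVERY,
	TOY_SHOST_CANCEL_RECOVERY,
};

struct toy_host {
	enum toy_shost_state state;
	bool flush_completed;	/* the removal path's cache flush finished */
	bool eh_finished;	/* the error handler returned */
};

/* Assumption: both recovery states count as "in recovery". */
static bool toy_host_in_recovery(const struct toy_host *h)
{
	return h->state == TOY_SHOST_RECOVERY ||
	       h->state == TOY_SHOST_CANCEL_RECOVERY;
}

/* Models the "return 0" step in the trace: no dispatch during recovery. */
static bool toy_host_queue_ready(const struct toy_host *h)
{
	return !toy_host_in_recovery(h);
}

/* Models the flush issued on removal; it completes only if dispatched. */
static bool toy_sync_cache(struct toy_host *h)
{
	if (!toy_host_queue_ready(h))
		return false;	/* the real code would wait here instead */
	h->flush_completed = true;
	return true;
}

/* Models a reset handler that ends up removing the host during recovery. */
static bool toy_reset_handler_removes_host(struct toy_host *h)
{
	assert(h->state == TOY_SHOST_RUNNING);

	h->state = TOY_SHOST_RECOVERY;		/* timeout starts recovery */
	h->state = TOY_SHOST_CANCEL_RECOVERY;	/* removal begins inside it */

	if (!toy_sync_cache(h))
		return false;	/* flush waits on recovery, recovery on flush */

	h->eh_finished = true;
	h->state = TOY_SHOST_RUNNING;
	return true;
}
```

Calling toy_reset_handler_removes_host() on a host that starts in 
TOY_SHOST_RUNNING returns false and leaves the host in 
TOY_SHOST_CANCEL_RECOVERY with neither flag set, which corresponds to 
the hang described above.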

Is this a bug in the SCSI core?  If not, we need to know what is the
right way to do things when a reset handler detects that the SCSI host
has been hot-unplugged.

James, Martin, any suggestions?

Alan Stern



