[RFC] blk-mq/scsi: deadlock found on usb driver

Hi, all

  We recently hit an I/O hang on a SCSI USB device: any I/O issued to
the device never returns. The USB driver has only **one** driver tag
and **two** sched tags. After debugging, we found a deadlock caused by
the following race:

cpu0(scsi_eh)       cpu1                          cpu2
                    get sched tag(internal_tag=0)
                    get driver tag(tag=0)
                                                  get sched tag(internal_tag=1)
                                                  wait for driver tag
scsi_error_handler try issue io
wait for sched tag
                    try to dispatch the request
                    wait for setting shost state as SHOST_RUNNING
//scsi_host_set_state(shost, SHOST_RUNNING)

The scsi_eh thread stack is as follows:
PID: 945745  TASK: ffff950a8f2f0000  CPU: 42  COMMAND: "scsi_eh_15"
  [ffffbbee8d5b3ce0] __schedule at ffffffffa506ebac
  [ffffbbee8d5b3d00] sbitmap_get at ffffffffa4c4684f
  [ffffbbee8d5b3d48] schedule at ffffffffa506f208
  [ffffbbee8d5b3d50] io_schedule at ffffffffa506f5d2
  [ffffbbee8d5b3d60] blk_mq_get_tag at ffffffffa4bf5277
  [ffffbbee8d5b3d88] autoremove_wake_function at ffffffffa48ffe40
  [ffffbbee8d5b3db8] autoremove_wake_function at ffffffffa48ffe40
  [ffffbbee8d5b3e08] blk_mq_get_request at ffffffffa4bef14c
  [ffffbbee8d5b3e20] eh_lock_door_done at ffffffffa4da5580
  [ffffbbee8d5b3e38] blk_mq_alloc_request at ffffffffa4bef494
  [ffffbbee8d5b3e80] blk_get_request at ffffffffa4be5042
  [ffffbbee8d5b3e98] scsi_error_handler at ffffffffa4da8670
  [ffffbbee8d5b3ea0] __schedule at ffffffffa506ebb4
  [ffffbbee8d5b3f08] scsi_error_handler at ffffffffa4da8430
  [ffffbbee8d5b3f10] kthread at ffffffffa48d6d7d
  [ffffbbee8d5b3f20] kthread at ffffffffa48d6c70
  [ffffbbee8d5b3f50] ret_from_fork at ffffffffa520023f

Since no sched tag or driver tag is available, all of the threads
wait forever. We found the bug on a 4.18 kernel, but the latest kernel
code has the same problem.

I don't have a good idea for how to fix this bug, so any suggestions are welcome.

Thanks,
Yufen


