block: del_gendisk() vs blk_queue_enter() race condition

Hello,

I'm looking at a report from the fleet (I don't have a reproducer)
and wondering what you and the block folks might think / suggest.

The problem is basically as follows:

CPU0

do_syscall
 sys_close
  __fput
   blkdev_release
    blkdev_put              grabs ->open_mutex
     sr_block_release
      scsi_set_medium_removal
       ioctl_internal_command
        scsi_execute_cmd
         scsi_alloc_request
          blk_mq_alloc_request
           blk_queue_enter
            schedule

at the same time:

CPU1

usb_disconnect
 usb_disable_device
  device_del
   usb_unbind_interface
    usb_stor_disconnect
     scsi_remove_host
      scsi_forget_host
       __scsi_remove_device
        device_del
         bus_remove_device
          device_release_driver_internal
           sr_remove
            del_gendisk
             mutex_lock     attempts to grab ->open_mutex
              schedule

blk_queue_enter() sleeps forever with ->open_mutex held (it was taken in
blkdev_put()); there is no way for it to be woken up and to detect
blk_queue_dying().  del_gendisk() sleeps forever as well, because it
attempts to grab ->open_mutex before it calls __blk_mark_disk_dead(),
which would mark the queue QUEUE_FLAG_DYING and wake up the waiters on
->mq_freeze_wq (blk_queue_enter() in this case).

I wonder how to fix it.  My current "patch" is to set QUEUE_FLAG_DYING
and "kick" ->mq_freeze_wq early on in del_gendisk(), before it attempts
to grab ->open_mutex for the first time.
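
To make the ordering concrete, here is a rough userspace model of the two
paths. It is not kernel code and not the actual patch: it is a pthreads
sketch in which open_mutex, freeze_wq and dying merely stand in for
->open_mutex, ->mq_freeze_wq and QUEUE_FLAG_DYING. The names struct model,
mark_disk_dead(), release_path() and run_model() are made up for
illustration, and the 200 ms timeout exists only so that the model
terminates where the real release path would sleep forever.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-ins for the kernel objects named in the report. */
struct model {
	pthread_mutex_t open_mutex;   /* models ->open_mutex */
	pthread_mutex_t wq_lock;      /* protects the fields below */
	pthread_cond_t freeze_wq;     /* models ->mq_freeze_wq */
	pthread_cond_t holder_ready;  /* test plumbing only */
	bool dying;                   /* models QUEUE_FLAG_DYING */
	bool holding;                 /* release path owns open_mutex */
	bool stuck;                   /* release path was never woken */
};

/* Models marking the queue dying and waking the freeze waitqueue. */
static void mark_disk_dead(struct model *m)
{
	pthread_mutex_lock(&m->wq_lock);
	m->dying = true;
	pthread_cond_broadcast(&m->freeze_wq);
	pthread_mutex_unlock(&m->wq_lock);
}

/* Models CPU0: take open_mutex, then wait until the queue is dying. */
static void *release_path(void *arg)
{
	struct model *m = arg;
	struct timespec deadline;

	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_nsec += 200L * 1000 * 1000;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&m->open_mutex);
	pthread_mutex_lock(&m->wq_lock);
	m->holding = true;
	pthread_cond_signal(&m->holder_ready);
	while (!m->dying) {
		/* The real waiter has no timeout; this one gives up so the
		 * model can report the hang instead of reproducing it. */
		if (pthread_cond_timedwait(&m->freeze_wq, &m->wq_lock,
					   &deadline) == ETIMEDOUT) {
			m->stuck = !m->dying;
			break;
		}
	}
	pthread_mutex_unlock(&m->wq_lock);
	pthread_mutex_unlock(&m->open_mutex);
	return NULL;
}

/* Models CPU1. Returns 1 if the release path was woken, 0 if it hung. */
int run_model(bool early_kick)
{
	struct model m = { .dying = false };
	pthread_t t;
	int rc;

	pthread_mutex_init(&m.open_mutex, NULL);
	pthread_mutex_init(&m.wq_lock, NULL);
	pthread_cond_init(&m.freeze_wq, NULL);
	pthread_cond_init(&m.holder_ready, NULL);

	rc = pthread_create(&t, NULL, release_path, &m);
	assert(rc == 0);
	(void)rc;

	/* Wait until the release path holds open_mutex, as in the report. */
	pthread_mutex_lock(&m.wq_lock);
	while (!m.holding)
		pthread_cond_wait(&m.holder_ready, &m.wq_lock);
	pthread_mutex_unlock(&m.wq_lock);

	if (early_kick)
		mark_disk_dead(&m);       /* proposed: kick before the mutex */
	pthread_mutex_lock(&m.open_mutex);
	mark_disk_dead(&m);               /* current order: only after it */
	pthread_mutex_unlock(&m.open_mutex);

	pthread_join(t, NULL);
	return !m.stuck;
}
```

In this model run_model(false) reports the hang (the waiter is only
released by the artificial timeout), while run_model(true) lets the waiter
observe the dying flag and drop open_mutex before the removal path asks
for it. Whether setting the flag that early is safe in the real
del_gendisk() is exactly the open question above.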

Any suggestions?



