Re: block: del_gendisk() vs blk_queue_enter() race condition

YangYang <yang.yang@xxxxxxxx> · Tue, 8 Oct 2024 12:02:06 +0800

On 2024/10/3 16:56, Sergey Senozhatsky wrote:
> Hello,
>
> I'm looking at a report from the fleet (don't have a reproducer)
> and wondering what you and block folks might think / suggest.
>
> The problem is basically as follows
>
> CPU0
>
> do_syscall
>   sys_close
>    __fput
>     blkdev_release
>      blkdev_put              grabs ->open_mutex
>       sr_block_release
>        scsi_set_medium_removal
>         ioctl_internal_command
>          scsi_execute_cmd
>           scsi_alloc_request
>            blk_mq_alloc_request
>             blk_queue_enter
>              schedule
>
> at the same time:
>
> CPU1
>
> usb_disconnect
>   usb_disable_device
>    device_del
>     usb_unbind_interface
>      usb_stor_disconnect
>       scsi_remove_host
>        scsi_forget_host
>         __scsi_remove_device
>          device_del
>           bus_remove_device
>            device_release_driver_internal
>             sr_remove
>              del_gendisk
>               mutex_lock     attempts to grab ->open_mutex
>                schedule

I'm a little confused here. How does the queue get frozen in this
scenario? Normally the freeze is started by
__blk_mark_disk_dead()->blk_queue_start_drain()->blk_freeze_queue_start(),
and that cannot happen without first grabbing ->open_mutex. In the trace
above CPU1 is still blocked on that mutex in del_gendisk(), so it should
not have reached the freeze yet:

    mutex_lock(&disk->open_mutex);
    __blk_mark_disk_dead(disk);
    xa_for_each_start(&disk->part_tbl, idx, part, 1)
        drop_partition(part);
    mutex_unlock(&disk->open_mutex);
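
To make the ordering argument concrete, here is a rough userspace model of
it. This is only a sketch and not kernel code: struct model_disk, the
queue_frozen flag and all model_* / frozen_* function names are hypothetical
stand-ins, a pthread mutex plays the role of ->open_mutex, and trylock is
used so the model reports "would block" instead of actually sleeping.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a gendisk; not the kernel structure. */
struct model_disk {
    pthread_mutex_t open_mutex;  /* models disk->open_mutex */
    bool queue_frozen;           /* models "the freeze has been started" */
};

static void model_disk_init(struct model_disk *d)
{
    pthread_mutex_init(&d->open_mutex, NULL);
    d->queue_frozen = false;
}

/*
 * Models the del_gendisk() excerpt: the freeze is started only with the
 * mutex held. Returns false where the real code would sleep in mutex_lock().
 */
static bool model_del_gendisk_step(struct model_disk *d)
{
    if (pthread_mutex_trylock(&d->open_mutex) != 0)
        return false;            /* mutex busy: removal path would block */
    d->queue_frozen = true;      /* stands in for __blk_mark_disk_dead() */
    pthread_mutex_unlock(&d->open_mutex);
    return true;
}

/* Reported scenario: the release path holds the mutex while removal runs. */
static bool frozen_while_release_holds_mutex(void)
{
    struct model_disk d;
    model_disk_init(&d);

    pthread_mutex_lock(&d.open_mutex);        /* "CPU0": blkdev_put() */
    bool ran = model_del_gendisk_step(&d);    /* "CPU1": del_gendisk() */
    bool frozen = d.queue_frozen;
    pthread_mutex_unlock(&d.open_mutex);
    pthread_mutex_destroy(&d.open_mutex);

    assert(!ran);                /* removal could not get past the mutex */
    return frozen;
}

/* Control case: nobody holds the mutex, so removal starts the freeze. */
static bool frozen_when_mutex_free(void)
{
    struct model_disk d;
    model_disk_init(&d);

    bool ran = model_del_gendisk_step(&d);
    bool frozen = d.queue_frozen;
    pthread_mutex_destroy(&d.open_mutex);

    assert(ran);
    return frozen;
}
```

In this model frozen_while_release_holds_mutex() leaves the flag unset,
while frozen_when_mutex_free() sets it. If the real queue is frozen while
CPU1 is still waiting for ->open_mutex, the freeze presumably comes from
some path that this model (and the excerpt above) does not cover.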