Re: block: del_gendisk() vs blk_queue_enter() race condition

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Oct 03, 2024 at 06:36:20AM -0700, Christoph Hellwig wrote:
> On Thu, Oct 03, 2024 at 05:56:10PM +0900, Sergey Senozhatsky wrote:
> > blk_queue_enter() sleeps forever, under ->open_mutex, there is no
> > way for it to be woken up and to detect blk_queue_dying().  del_gendisk()
> > sleeps forever because it attempts to grab ->open_mutex before it calls
> > __blk_mark_disk_dead(), which would mark the queue QUEUE_FLAG_DYING and
> > wake up ->mq_freeze_wq (which is blk_queue_enter() in this case).
> > 
> > I wonder how to fix it.  My current "patch" is to set QUEUE_FLAG_DYING
> > and "kick" ->mq_freeze_wq early on in del_gendisk(), before it attempts
> > to grab ->open_mutex for the first time.
> 
> We split blk_queue_enter further to distinguish between file system
> requests and passthrough ones.
> 
> The file system request should be using bio_queue_enter, which only
> checks GD_DEAD, instead of QUEUE_FLAG_DYING.  Passthrough requests like
> the cdrom door lock are using blk_queue_enter that checks QUEUE_FLAG_DYING
> which only gets set in blk_mq_destroy_queue.
> 
> So AFAICS your trace should not happen with the current kernel, but
> probably could happen with older stable version unless I'm missing
> something.  What kernel version did you see this on?

.. actually, we still clear QUEUE_FLAG_DYING early.  Something like
the pathc below (plus proper comments) should sort it out:

diff --git a/block/genhd.c b/block/genhd.c
index 1c05dd4c6980b5..9a1e18fbb136cf 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -589,9 +589,6 @@ static void __blk_mark_disk_dead(struct gendisk *disk)
 	if (test_and_set_bit(GD_DEAD, &disk->state))
 		return;
 
-	if (test_bit(GD_OWNS_QUEUE, &disk->state))
-		blk_queue_flag_set(QUEUE_FLAG_DYING, disk->queue);
-
 	/*
 	 * Stop buffered writers from dirtying pages that can't be written out.
 	 */
@@ -673,6 +670,9 @@ void del_gendisk(struct gendisk *disk)
 		drop_partition(part);
 	mutex_unlock(&disk->open_mutex);
 
+	if (test_bit(GD_OWNS_QUEUE, &disk->state))
+		blk_queue_flag_set(QUEUE_FLAG_DYING, disk->queue);
+
 	if (!(disk->flags & GENHD_FL_HIDDEN)) {
 		sysfs_remove_link(&disk_to_dev(disk)->kobj, "bdi");
 




[Index of Archives]     [Linux RAID]     [Linux SCSI]     [Linux ATA RAID]     [IDE]     [Linux Wireless]     [Linux Kernel]     [ATH6KL]     [Linux Bluetooth]     [Linux Netdev]     [Kernel Newbies]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Device Mapper]

  Powered by Linux