On (24/10/04 05:20), Christoph Hellwig wrote:
> On Fri, Oct 04, 2024 at 04:48:18PM +0900, Sergey Senozhatsky wrote:
> > diff --git a/block/blk-core.c b/block/blk-core.c
> > index bc5e8c5eaac9..ccd36cb5ada7 100644
> > --- a/block/blk-core.c
> > +++ b/block/blk-core.c
> > @@ -292,6 +292,16 @@ void blk_queue_start_drain(struct request_queue *q)
> >  	wake_up_all(&q->mq_freeze_wq);
> >  }
> >  
> > +void blk_queue_disk_dead(struct request_queue *q)
> > +{
> > +	struct gendisk *disk = q->disk;
> > +
> > +	if (WARN_ON_ONCE(!test_bit(GD_DEAD, &disk->state)))
> > +		return;
> > +	/* Make blk_queue_enter() reexamine the GD_DEAD flag. */
> > +	wake_up_all(&q->mq_freeze_wq);
> > +}
> 
> Why is this a separate helper vs just doing the wake_up_all in the
> only caller that sets (with the suggested fixup anyway) GD_DEAD?

It looked to me like whatever happens to ->mq_freeze_wq stays in
Las^W blk-core or blk-mq, so I added a new helper to follow suit,
IOW to avoid spreading ->mq_freeze_wq wakeups across multiple files.

> > +			   blk_queue_dying(q) ||
> > +			   test_bit(GD_DEAD, &disk->state));
> 
> This needs to check for a NULL disk.

Ack.

> And now that I'm looking at the code a bit more this makes me worried
> that checking for q->disk here sounds like a good way to hit a race with
> clearing it.  So I fear we need the other hack variant that sets
> QUEUE_FLAG_DYING unconditionally in __blk_mark_disk_dead and then clears
> it again (for GD_OWNS_QUEUE only) toward the end of del_gendisk.

Hmm, setting QUEUE_FLAG_DYING unconditionally in __blk_mark_disk_dead()
implies moving it up to the very top of del_gendisk(), before the first
time we grab ->open_mutex, because that's exactly where our problem is.
Doesn't that risk re-introducing the deadlock scenario you pointed at
earlier, because of the "don't acquire ->open_mutex after freezing the
queue" rule?
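
To make sure I'm reading your suggestions right, a few completely
untested sketches. Dropping the helper would, I take it, mean
open-coding the wakeup in __blk_mark_disk_dead() (I'm guessing at
the exact placement):

	/* make blk_queue_enter() reexamine the GD_DEAD flag */
	wake_up_all(&disk->queue->mq_freeze_wq);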
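
For the NULL disk check, I assume you mean re-reading q->disk on every
wait_event() re-evaluation in blk_queue_enter() instead of caching it
in a local, i.e. something like (again, untested):

	wait_event(q->mq_freeze_wq,
		   (!q->mq_freeze_depth &&
		    blk_pm_resume_queue(pm, q)) ||
		   blk_queue_dying(q) ||
		   (q->disk && test_bit(GD_DEAD, &q->disk->state)));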
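
And the "other hack variant", as I understand it (untested sketch; the
elided lines are whatever __blk_mark_disk_dead() already does):

	static void __blk_mark_disk_dead(struct gendisk *disk)
	{
		if (test_and_set_bit(GD_DEAD, &disk->state))
			return;

		/* set unconditionally, no GD_OWNS_QUEUE check anymore */
		blk_queue_flag_set(QUEUE_FLAG_DYING, disk->queue);
		...
	}

with a matching blk_queue_flag_clear(QUEUE_FLAG_DYING, q) toward the
end of del_gendisk(), as you described.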