On Fri, Sep 17, 2021 at 08:53:05AM +0200, Christoph Hellwig wrote:
> On Fri, Sep 17, 2021 at 11:39:48AM +0800, Ming Lei wrote:
> > If we do that, q->disk is really unnecessary, so it looks like the fix in
> > 'd152c682f03c block: add an explicit ->disk backpointer to the request_queue'
> > isn't good. The original issue added in 'edb0872f44ec block: move the
> > bdi from the request_queue to the gendisk' can be fixed simply by moving
> > the two lines of code in blk_unregister_queue() to blk_cleanup_queue():
> >
> > 	kobject_uevent(&q->kobj, KOBJ_REMOVE);
> > 	kobject_del(&q->kobj);
>
> Well, it is still useful to not do the device model dance, especially as
> uses of queue->disk will grow in 5.16.

It isn't a device model dance, since

> Anyway, can you give this patch a spin?  I've done some basic testing
> including nvme surprise removal locally.  Note that just like
> blk_cleanup_queue this doesn't actually wait for the freeze to finish.
> Eventually we should do that, but I'm sure this will show tons of issues
> with drivers not properly cleaning up.
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 5454db2fa263b..18287c443ae81 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -49,7 +49,6 @@
>  #include "blk-mq.h"
>  #include "blk-mq-sched.h"
>  #include "blk-pm.h"
> -#include "blk-rq-qos.h"
>
>  struct dentry *blk_debugfs_root;
>
> @@ -385,13 +384,8 @@ void blk_cleanup_queue(struct request_queue *q)
>  	 */
>  	blk_freeze_queue(q);
>
> -	rq_qos_exit(q);
> -
>  	blk_queue_flag_set(QUEUE_FLAG_DEAD, q);
>
> -	/* for synchronous bio-based driver finish in-flight integrity i/o */
> -	blk_flush_integrity();
> -
>  	blk_sync_queue(q);
>  	if (queue_is_mq(q))
>  		blk_mq_exit_queue(q);
> @@ -470,19 +464,26 @@ int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
>
>  static inline int bio_queue_enter(struct bio *bio)
>  {
> -	struct request_queue *q = bio->bi_bdev->bd_disk->queue;
> +	struct gendisk *disk = bio->bi_bdev->bd_disk;
>  	bool nowait = bio->bi_opf & REQ_NOWAIT;
>  	int ret;
>
> -	ret = blk_queue_enter(q, nowait ? BLK_MQ_REQ_NOWAIT : 0);
> +	ret = blk_queue_enter(disk->queue, nowait ? BLK_MQ_REQ_NOWAIT : 0);
>  	if (unlikely(ret)) {
> -		if (nowait && !blk_queue_dying(q))
> +		if (nowait && !blk_queue_dying(disk->queue))
>  			bio_wouldblock_error(bio);
>  		else
>  			bio_io_error(bio);
> +		return ret;
>  	}
>
> -	return ret;
> +	if (unlikely(!disk_live(disk))) {
> +		blk_queue_exit(disk->queue);
> +		bio_io_error(bio);
> +		return -ENODEV;
> +	}

Is it safe to fail I/O this way? The bdev is still open, and usually I/O
can complete successfully even while the disk is being deleted. Not to
mention it adds one extra check to the real fast path.
> +
> +	return 0;
>  }
>
>  void blk_queue_exit(struct request_queue *q)
> diff --git a/block/genhd.c b/block/genhd.c
> index 7b6e5e1cf9564..8abe12dca76f2 100644
> --- a/block/genhd.c
> +++ b/block/genhd.c
> @@ -26,6 +26,7 @@
>  #include <linux/badblocks.h>
>
>  #include "blk.h"
> +#include "blk-rq-qos.h"
>
>  static struct kobject *block_depr;
>
> @@ -559,6 +560,8 @@ EXPORT_SYMBOL(device_add_disk);
>   */
>  void del_gendisk(struct gendisk *disk)
>  {
> +	struct request_queue *q = disk->queue;
> +
>  	might_sleep();
>
>  	if (WARN_ON_ONCE(!disk_live(disk) && !(disk->flags & GENHD_FL_HIDDEN)))
> @@ -572,6 +575,21 @@ void del_gendisk(struct gendisk *disk)
>  	blk_drop_partitions(disk);
>  	mutex_unlock(&disk->open_mutex);
>
> +	/*
> +	 * Prevent new I/O from crossing bio_queue_enter().
> +	 */
> +	blk_freeze_queue_start(q);
> +	if (queue_is_mq(q))
> +		blk_mq_wake_waiters(q);
> +	/* Make blk_queue_enter() reexamine the DYING flag. */
> +	wake_up_all(&q->mq_freeze_wq);
> +
> +	rq_qos_exit(q);

rq_qos_exit() requires the queue to be frozen, otherwise a kernel oops
can be triggered: there may be previously submitted bios not completed
yet, and rq_qos_done() can still be referring to q->rq_qos. But if a
wait for the queue to become frozen is added here, that is one extra
freeze, which will slow down disk/queue removal.

Thanks,
Ming