Re: [PATCH] scsi: core: cleanup request queue before releasing gendisk

Ming Lei <ming.lei@xxxxxxxxxx> · Fri, 17 Sep 2021 16:32:15 +0800

On Fri, Sep 17, 2021 at 09:56:50AM +0200, Christoph Hellwig wrote:
> On Fri, Sep 17, 2021 at 03:41:05PM +0800, Ming Lei wrote:
> > >  
> > > -	return ret;
> > > +	if (unlikely(!disk_live(disk))) {
> > > +		blk_queue_exit(disk->queue);
> > > +		bio_io_error(bio);
> > > +		return -ENODEV;
> > > +	}
> > 
> > Is it safe to fail IO in this way? There is still opened bdev, and
> > usually IO can be done successfully even when disk is being deleted.
> 
> "normal" I/O should not really happen by the time it is deleted.  That
> being said we should do this only after the fsync is done. While no
> one should rely on that I'm pretty sure some file systems do.
> So we'll actually need a deleted flag.
> 
> > Not mention it adds one extra check in real fast path.
> 
> I'm not really woried about the check itself.  That being
> sais this inode cache line is not hot right now, so moving it to
> disk->state will help as we need to check the read-only flag in
> the the I/O submission path anyway.

When the deleted flag is set in del_gendisk(), it may not be observed
in time in bio_submit() because of memory/compiler re-order, so the
check is still sort of relax constraint. I guess the same is true for
read-only check.

> 
> > > +	/*
> > > +	 * Prevent new I/O from crossing bio_queue_enter().
> > > +	 */
> > > +	blk_freeze_queue_start(q);
> > > +	if (queue_is_mq(q))
> > > +		blk_mq_wake_waiters(q);
> > > +	/* Make blk_queue_enter() reexamine the DYING flag. */
> > > +	wake_up_all(&q->mq_freeze_wq);
> > > +
> > > +	rq_qos_exit(q);
> > 
> > rq_qos_exit() requires the queue to be frozen, otherwise kernel oops
> > can be triggered. There may be old submitted bios not completed yet,
> > and rq_qos_done() can be referring to q->rq_qos.
> 
> Yes.  I actually misread the old code - it atually does two
> blk_freeze_queue_start, but it also includes the wait.

Only blk_freeze_queue() includes the wait, and blk_freeze_queue_start()
doesn't.

> 
> > But if waiting for queue frozen is added, one extra freeze is added,
> > which will slow done remove disk/queue.
> 
> Is it?  For the typical case the second free in blk_cleanp_queue will
> be bsically free because there is no pending I/O.  The only case
> where that matters if if there is pending passthrough I/O again,
> which can only happen with SCSI, and even there is highly unusual.

RCU grace period is involved in blk_freeze_queue(). One way you can
avoid it is to keep the percpu ref at atomic mode when running
blk_mq_unfreeze_queue() in del_gendisk().

Thanks,
Ming