On Fri, Sep 17, 2021 at 09:56:50AM +0200, Christoph Hellwig wrote: > On Fri, Sep 17, 2021 at 03:41:05PM +0800, Ming Lei wrote: > > > > > > - return ret; > > > + if (unlikely(!disk_live(disk))) { > > > + blk_queue_exit(disk->queue); > > > + bio_io_error(bio); > > > + return -ENODEV; > > > + } > > > > Is it safe to fail IO in this way? There is still opened bdev, and > > usually IO can be done successfully even when disk is being deleted. > > "normal" I/O should not really happen by the time it is deleted. That > being said we should do this only after the fsync is done. While no > one should rely on that I'm pretty sure some file systems do. > So we'll actually need a deleted flag. > > > Not mention it adds one extra check in real fast path. > > I'm not really woried about the check itself. That being > sais this inode cache line is not hot right now, so moving it to > disk->state will help as we need to check the read-only flag in > the the I/O submission path anyway. When the deleted flag is set in del_gendisk(), it may not be observed in time in bio_submit() because of memory/compiler re-order, so the check is still sort of relax constraint. I guess the same is true for read-only check. > > > > + /* > > > + * Prevent new I/O from crossing bio_queue_enter(). > > > + */ > > > + blk_freeze_queue_start(q); > > > + if (queue_is_mq(q)) > > > + blk_mq_wake_waiters(q); > > > + /* Make blk_queue_enter() reexamine the DYING flag. */ > > > + wake_up_all(&q->mq_freeze_wq); > > > + > > > + rq_qos_exit(q); > > > > rq_qos_exit() requires the queue to be frozen, otherwise kernel oops > > can be triggered. There may be old submitted bios not completed yet, > > and rq_qos_done() can be referring to q->rq_qos. > > Yes. I actually misread the old code - it atually does two > blk_freeze_queue_start, but it also includes the wait. Only blk_freeze_queue() includes the wait, and blk_freeze_queue_start() doesn't. > > > But if waiting for queue frozen is added, one extra freeze is added, > > which will slow done remove disk/queue. > > Is it? For the typical case the second free in blk_cleanp_queue will > be bsically free because there is no pending I/O. The only case > where that matters if if there is pending passthrough I/O again, > which can only happen with SCSI, and even there is highly unusual. RCU grace period is involved in blk_freeze_queue(). One way you can avoid it is to keep the percpu ref at atomic mode when running blk_mq_unfreeze_queue() in del_gendisk(). Thanks, Ming