Re: [PATCH 1/4] blk-mq: add API of blk_mq_unfreeze_queue_force

Ming Lei <ming.lei@xxxxxxxxxx> · Fri, 16 Jun 2023 15:33:41 +0800

On Fri, Jun 16, 2023 at 09:27:21AM +0200, Christoph Hellwig wrote:
> On Fri, Jun 16, 2023 at 03:20:38PM +0800, Ming Lei wrote:
> > > > > Shouldn't those writebacks be unblocked by the existing check in
> > > > > bio_queue_enter, test_bit(GD_DEAD, &disk->state))? Or are we missing a
> > > > > disk state update or wakeup on this condition?
> > > > 
> > > > GD_DEAD is only set if the device is really dead, then all pending IO
> > > > will be failed.
> > > 
> > > del_gendisk also sets GD_DEAD early on.
> > 
> > No.
> > 
> > The hang happens in fsync_bdev() of del_gendisk(), and there are IOs pending on
> > bio_queue_enter().
> 
> What is the workload here?  If del_gendisk is called to remove a disk
> that is in perfectly fine state and can do I/O, fsync_bdev should write
> back data, which is what it exists for.  If the disk is dead, we should
> have called blk_mark_disk_dead before.

It is basically that removing the ctrl breaks in-progress error recovery,
which leaves the queues quiesced and frozen.

https://lore.kernel.org/linux-nvme/CAHj4cs-4gQHnp5aiekvJmb6o8qAcb6nLV61uOGFiisCzM49_dg@xxxxxxxxxxxxxx/T/#u

https://lore.kernel.org/linux-nvme/cover.1685350577.git.chunguang.xu@xxxxxxxxxx/
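To make the hang easier to picture, here is a rough userspace model of the
wake-up condition a submitter sleeps on in bio_queue_enter(). It is only a
sketch, not the kernel code: struct model_queue and the model_* helpers are
made-up names, and the runtime-PM part of the real condition is left out.
It only assumes that a waiter proceeds once the freeze depth drops to zero
or GD_DEAD is set.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the queue/disk state; not the kernel structs. */
struct model_queue {
	int  freeze_depth;	/* models q->mq_freeze_depth */
	bool disk_dead;		/* models test_bit(GD_DEAD, &disk->state) */
};

/* True if a submitter sleeping in the enter path would be woken. */
static bool model_waiter_wakes(const struct model_queue *q)
{
	return q->freeze_depth == 0 || q->disk_dead;
}

/* True if the woken submitter's bio would be failed rather than issued. */
static bool model_bio_fails(const struct model_queue *q)
{
	return q->disk_dead;
}

/* Walks through the three states discussed in this thread. */
static void model_selfcheck(void)
{
	/* Recovery was interrupted: still frozen, disk not marked dead. */
	struct model_queue q = { .freeze_depth = 1, .disk_dead = false };
	assert(!model_waiter_wakes(&q));	/* writeback stays blocked */

	/* Marking the disk dead releases the waiter, and the bio fails. */
	q.disk_dead = true;
	assert(model_waiter_wakes(&q));
	assert(model_bio_fails(&q));

	/* Unfreezing a healthy disk releases the waiter, and the bio is issued. */
	q.disk_dead = false;
	q.freeze_depth = 0;
	assert(model_waiter_wakes(&q));
	assert(!model_bio_fails(&q));
}
```

Calling model_selfcheck() from a small main() exercises the three cases. The
first one is the state described above: nothing unfreezes the queue and
nothing sets GD_DEAD, so the writeback issued from fsync_bdev() never gets
past the wait.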

Thanks,
Ming