On Fri, Jun 16, 2023 at 09:27:21AM +0200, Christoph Hellwig wrote:
> On Fri, Jun 16, 2023 at 03:20:38PM +0800, Ming Lei wrote:
> > > > > Shouldn't those writebacks be unblocked by the existing check in
> > > > > bio_queue_enter, test_bit(GD_DEAD, &disk->state))? Or are we missing a
> > > > > disk state update or wakeup on this condition?
> > > >
> > > > GD_DEAD is only set if the device is really dead, then all pending IO
> > > > will be failed.
> > >
> > > del_gendisk also sets GD_DEAD early on.
> >
> > No.
> >
> > The hang happens in fsync_bdev() of del_gendisk(), and there are IOs pending on
> > bio_queue_enter().
>
> What is the workload here? If del_gendisk is called to remove a disk
> that is in perfectly fine state and can do I/O, fsync_bdev should write
> back data, which is what is exists for. If the disk is dead, we should
> have called blk_mark_disk_dead before.

It is basically that removing the ctrl breaks in-progress error recovery,
and the queues are then left quiesced and frozen.

https://lore.kernel.org/linux-nvme/CAHj4cs-4gQHnp5aiekvJmb6o8qAcb6nLV61uOGFiisCzM49_dg@xxxxxxxxxxxxxx/T/#u
https://lore.kernel.org/linux-nvme/cover.1685350577.git.chunguang.xu@xxxxxxxxxx/

Thanks,
Ming