Re: [PATCH 2/2] Revert "ublk_drv: fix request queue leak"

Ming Lei <ming.lei@xxxxxxxxxx> · Wed, 20 Jul 2022 15:47:27 +0800

On Wed, Jul 20, 2022 at 08:07:05AM +0200, Christoph Hellwig wrote:
> On Tue, Jul 19, 2022 at 08:37:23PM +0800, Ming Lei wrote:
> > This change will break START_DEV/STOP_DEV, which is supposed to run
> > multiple cycles after the device is added, especially this way can
> > help to implement error recovery from userside, such as one ubq_daemon
> > is crashed/hang, the device can be recovered by sending STOP_DEV/START_DEV
> > commands again after new ubq_daemon is setup.
> 
> What is broken in START_DEV/STOP_DEV?  Please explain the semantics you
> want and what doesn't work.  FYI, there is nothing in the test suite that
> complains.  And besides the obvious block layer bug that Jens found you
> seemed to be perfectly happy with the semantics.

START_DEV calls add_disk(), and STOP_DEV calls del_gendisk(). But if
GD_OWNS_QUEUE is set, blk_mq_exit_queue() is called from del_gendisk(),
so the following START_DEV will get stuck.

> 
> > So here we do need separated request_queue/disk, and the model is
> > similar with scsi's, in which disk rebind needs to be supported
> > and GD_OWNS_QUEUE can't be set.
> 
> SCSI needs it because it needs the request_queue to probe for what ULP
> to bind to, and it allows to unbind the ULP.  None of that is the case
> here.  And managing the lifetimes separately is a complete mess, so
> don't do it.  Especially not in a virtual driver where you don't have
> to cater to a long set protocol like SCSI.

If blk_mq_exit_queue() is called in del_gendisk() for scsi, how can
re-bind work as expected, since it needs a completely workable
request queue instead of a partially exited one?

Thanks,
Ming