On Wed, Jul 20, 2022 at 08:07:05AM +0200, Christoph Hellwig wrote:
> On Tue, Jul 19, 2022 at 08:37:23PM +0800, Ming Lei wrote:
> > This change will break START_DEV/STOP_DEV, which is supposed to run
> > multiple cycles after the device is added, especially this way can
> > help to implement error recovery from userside, such as one ubq_daemon
> > is crashed/hang, the device can be recovered by sending STOP_DEV/START_DEV
> > commands again after new ubq_daemon is setup.
>
> What is broken in START_DEV/STOP_DEV? Please explain the semantics you
> want and what doesn't work. FYI, there is nothing in the test suite the
> complains. And besides the obvious block layer bug that Jens found you
> seemed to be perfectly happy with the semantics.

START_DEV calls add_disk() and STOP_DEV calls del_gendisk(). But if
GD_OWNS_QUEUE is set, del_gendisk() also calls blk_mq_exit_queue(), so
the next START_DEV gets stuck.

>
> > So here we do need separated request_queue/disk, and the model is
> > similar with scsi's, in which disk rebind needs to be supported
> > and GD_OWNS_QUEUE can't be set.
>
> SCSI needs it because it needs the request_queue to probe for what ULP
> to bind to, and it allows to unbind the ULP. None of that is the case
> here. And managing the lifetimes separately is a complete mess, so
> don't do it. Especially not in a virtual driver where you don't have
> to cater to a long set protocol like SCSI.

If blk_mq_exit_queue() were called in del_gendisk() for SCSI, how could
re-bind work as expected? It needs a completely workable request queue,
not a partially exited one.

Thanks,
Ming