On 6/14/22 5:27 AM, Ming Lei wrote: > On Tue, Jun 14, 2022 at 10:34:53AM +0200, Christoph Hellwig wrote: >> On Tue, Jun 14, 2022 at 04:23:36PM +0800, Ming Lei wrote: >>>> blk_sync_queue(q); >>>> blk_flush_integrity(); >>>> + blk_mq_cancel_work_sync(q); >>>> + >>>> + blk_mq_quiesce_queue(q); >>> >>> quiesce queue adds a bit long delay in del_gendisk, not sure if this way may >>> cause regression in big machines with lots of disks. >> >> It does. But at least we remove a freeze in the queue teardown path. >> But either way I'd really like to get things correct first before >> looking into optimizations. > > The removed one works at atomic mode and it is super fast. > >> >>> >>>> + if (q->elevator) { >>>> + mutex_lock(&q->sysfs_lock); >>>> + elevator_exit(q); >>>> + mutex_unlock(&q->sysfs_lock); >>>> + } >>>> + rq_qos_exit(q); >>>> + blk_mq_unquiesce_queue(q); >>> >>> Also tearing down elevator here has to be carefully, that means any >>> elevator reference has to hold rcu read lock or .q_usage_counter, >>> meantime it has to be checked, otherwise use-after-free may be caused. >> >> This is not a new pattern. We have the same locking here as a >> sysfs-induced change of the elevator to none which also clears >> q->elevator under a queue that is frozen and quiesced. > > Then looks this pattern has problem in dealing with the examples I > mentioned. > > And the elevator usage in __blk_mq_update_nr_hw_queues() looks one > old problem, but easy to fix by protecting it via sysfs_lock. > > And fixing blk_mq_has_sqsched() should be easy too. > > I will send patches later. Just checking in on this series, Ming did you make any progress? -- Jens Axboe