On Thu, Sep 16, 2021 at 12:14:51PM +0200, Christoph Hellwig wrote:
> On Thu, Sep 16, 2021 at 09:36:23AM +0800, Ming Lei wrote:
> > From correctness viewpoint, we need to call blk_cleanup_queue
> > before releasing gendisk and after del_gendisk(). Now you have invented
> > blk_cleanup_disk(), do you plan to do the three in one helper? :-)
>
> No.  In retrospect blk_cleanup_disk wasn't the best idea for a few
> reasons.  But at least it consolidated some of the code.
>
> > We don't have to put del_gendisk & blk_cleanup_queue together,
>
> I don't want all of it together.  The important thing is that we have
> two different concepts:
>
>  - the gendisk is required to do file system style I/O
>  - a standalone request_queue can be used for passthrough I/O.

request_queue is also an abstraction in the block I/O implementation; it
can be thought of as a lower-level concept beneath gendisk too, IMO.

>
> > and it may cause other trouble at least for scsi disk since sd_shutdown()
> > follows del_gendisk() and has to be called before blk_cleanup_queue().
>
> Yes.  So we need to move the bits of blk_cleanup_queue that deal with
> the file system I/O state to del_gendisk, and keep blk_cleanup_queue
> for anything actually needed for the low-level queue.

Can you explain which bits in blk_cleanup_queue() deal with FS I/O
state? blk_cleanup_queue() basically drains and shuts down the queue;
none of that should be related to the gendisk, and it is fine to have a
queue with no gendisk involved, such as the nvme admin or connect queue.

Wrt. this reported issue, rq_qos_exit() needs to run before the gendisk
is released, but the queue has to be frozen before calling rq_qos_exit(),
so it looks like you are suggesting moving the following code into
del_gendisk()?

	WARN_ON_ONCE(blk_queue_registered(q));

	/* mark @q DYING, no new request or merges will be allowed afterwards */
	blk_set_queue_dying(q);

	blk_queue_flag_set(QUEUE_FLAG_NOMERGES, q);
	blk_queue_flag_set(QUEUE_FLAG_NOXMERGES, q);

	/*
	 * Drain all requests queued before DYING marking. Set DEAD flag to
	 * prevent that blk_mq_run_hw_queues() accesses the hardware queues
	 * after draining finished.
	 */
	blk_freeze_queue(q);

	rq_qos_exit(q);

If we move the above into del_gendisk(), some corner cases have to be
taken care of, such as a request queue with no disk involved.

>
> To take SCSI as the example.  We can unload the sd/sr drivers and the
> queue needs to still be around and work for use with the sg driver.
>
> > BTW, you asked the reproducer of the issue, I just observed the issue
> > one or two time when running blktests block/009, but my scsi lifetime
> > bpftrace script does show that gendisk is released before blk_cleanup_queue().
>
> Interesting.  What were the symptoms in this case?

It is the same as the recent report of 'general protection fault in
wb_timer_fn'.

Thanks,
Ming
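
For illustration only, here is a small userspace toy model of the ordering
constraint discussed above: del_gendisk() first, then freeze + rq_qos_exit(),
and only then the final release of the gendisk. It is a sketch, not kernel
code. Every toy_* name is hypothetical and invented for this example, and the
error returns merely stand in for the kind of crash reported in wb_timer_fn.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model; none of these types or functions are kernel APIs. */
struct toy_queue {
	bool frozen;		/* models blk_freeze_queue() having run */
	bool qos_exited;	/* models rq_qos_exit() having run */
};

struct toy_disk {
	struct toy_queue *q;
	bool deleted;		/* models del_gendisk() having run */
	bool released;		/* models the final release of the gendisk */
};

/* Models del_gendisk(): only valid while the disk still exists. */
int toy_del_disk(struct toy_disk *d)
{
	if (d->released)
		return -1;
	d->deleted = true;
	return 0;
}

/*
 * Models the freeze + rq_qos_exit() part of blk_cleanup_queue().
 * @d may be NULL for a queue that never had a disk (assumed to model
 * the passthrough-only case, e.g. an admin queue).
 */
int toy_cleanup_queue(struct toy_queue *q, const struct toy_disk *d)
{
	/* With a disk: must run after deletion and before release. */
	if (d && (!d->deleted || d->released))
		return -1;
	q->frozen = true;	/* freeze first ... */
	q->qos_exited = true;	/* ... then tear down rq_qos */
	return 0;
}

/* Models the final disk release: rq_qos must already be gone. */
int toy_release_disk(struct toy_disk *d)
{
	if (!d->q->qos_exited)
		return -1;	/* stands in for the reported fault */
	d->released = true;
	return 0;
}
```

Calling toy_del_disk(), toy_cleanup_queue() and toy_release_disk() in that
order succeeds in this model, while releasing the disk before the queue
cleanup is rejected. A queue with no disk can be cleaned up by passing NULL
for the disk.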