On Wed, Sep 15, 2021 at 03:40:08PM +0200, Christoph Hellwig wrote: > On Wed, Sep 15, 2021 at 05:25:47PM +0800, Ming Lei wrote: > > gendisk instance has to be released after request queue is cleaned up > > because bdi is referred from gendisk since commit edb0872f44ec ("block: > > move the bdi from the request_queue to the gendisk"). > > > > For sd and sr, gendisk can be removed in the release handler(sd_remove/ > > sr_remove) of sdev->sdev_gendev, which is triggered in device_del(sdev->sdev_gendev) > > in __scsi_remove_device(), when the request queue isn't cleaned up yet. > > > > So kernel oops could be triggered when referring bdi via gendisk. > > > > Fix the issue by moving blk_cleanup_queue() into sd_remove() and > > sr_remove(). > > This looks like a bit of a bandaid to me. I think the proper fix > is to move the parts of blk_cleanup_queue that need a disk or bdi > to del_gendisk. >From correctness viewpoint, we need to call blk_cleanup_queue before releasing gendisk and after del_gendisk(). Now you have invented blk_cleanup_disk(), do you plan to do the three in one helper? :-) We don't have to put del_gendisk & blk_cleanup_queue together, and it may cause other trouble at least for scsi disk since sd_shutdown() follows del_gendisk() and has to be called before blk_cleanup_queue(). BTW, you asked the reproducer of the issue, I just observed the issue one or two time when running blktests block/009, but my scsi lifetime bpftrace script does show that gendisk is released before blk_cleanup_queue(). Thanks, Ming