On 2020-04-14 23:16, Luis Chamberlain wrote:
> On Tue, Apr 14, 2020 at 08:40:44AM -0700, Christoph Hellwig wrote:
>> Hmm, where exactly does the race come in so that it can only happen
>> after where you take the reference, but not before it? I'm probably
>> missing something, but that just means it needs to be explained a little
>> better :)
>
> From the trace on patch 2/5:
>
> BLKTRACE_SETUP(loop0) #2
> [ 13.933961] == blk_trace_ioctl(2, BLKTRACESETUP) start
> [ 13.936758] === do_blk_trace_setup(2) start
> [ 13.938944] === do_blk_trace_setup(2) creating directory
> [ 13.941029] === do_blk_trace_setup(2) using what debugfs_lookup() gave
>
> ---> From LOOP_CTL_DEL(loop0) #2
> [ 13.971046] === blk_trace_cleanup(7) end
> [ 13.973175] == __blk_trace_remove(7) end
> [ 13.975352] == blk_trace_shutdown(7) end
> [ 13.977415] = __blk_release_queue(7) calling blk_mq_debugfs_unregister()
> [ 13.980645] ==== blk_mq_debugfs_unregister(7) begin
> [ 13.980696] ==== blk_mq_debugfs_unregister(7) debugfs_remove_recursive(q->debugfs_dir)
> [ 13.983118] ==== blk_mq_debugfs_unregister(7) end q->debugfs_dir is NULL
> [ 13.986945] = __blk_release_queue(7) blk_mq_debugfs_unregister() end
> [ 13.993155] = __blk_release_queue(7) end
>
> ---> From BLKTRACE_SETUP(loop0) #2
> [ 13.995928] === do_blk_trace_setup(2) end with ret: 0
> [ 13.997623] == blk_trace_ioctl(2, BLKTRACESETUP) end
>
> The BLKTRACESETUP above works on request_queue which later
> LOOP_CTL_DEL races on and sweeps the debugfs dir underneath us.
> If you use this commit alone though, this doesn't fix the race issue
> however, and that's because of both still the debugfs_lookup() use
> and that we're still using asynchronous removal at this point.
>
> refcounting will just ensure we don't take the request_queue underneath
> our noses.

I think the above trace reveals a bug in the loop driver. The loop driver
shouldn't allow the associated request queue to disappear while the loop
device is open.
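To make that concrete, here is a minimal userspace sketch (a toy model, not
kernel code) of an open path that pins the queue for as long as the device is
open. All names in it (toy_queue, toy_dev, toy_open, toy_close, toy_delete,
toy_demo) are hypothetical and invented for illustration; the real loop driver
and block layer code look different.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy userspace stand-ins; these are NOT the kernel's types or helpers. */
struct toy_queue {
	int refs;	/* number of current holders */
	bool released;	/* set once the last reference is dropped */
};

struct toy_dev {
	struct toy_queue *q;
	int openers;
};

static void toy_queue_put(struct toy_queue *q)
{
	assert(q->refs > 0);
	if (--q->refs == 0)
		q->released = true;	/* models the queue release path */
}

/* Open pins the queue in addition to counting the opener. */
static int toy_open(struct toy_dev *d)
{
	if (d->q->released)
		return -1;
	d->q->refs++;
	d->openers++;
	return 0;
}

/* Close drops the opener's pin; the last put releases the queue. */
static void toy_close(struct toy_dev *d)
{
	d->openers--;
	toy_queue_put(d->q);
}

/* Models device deletion: drops only the driver's own reference. */
static void toy_delete(struct toy_dev *d)
{
	toy_queue_put(d->q);
}

/* Returns true if the queue stayed alive until the last close. */
static bool toy_demo(void)
{
	struct toy_queue q = { .refs = 1, .released = false };
	struct toy_dev d = { .q = &q, .openers = 0 };
	bool alive_while_open;

	if (toy_open(&d) != 0)
		return false;
	toy_delete(&d);			/* delete races with the open device */
	alive_while_open = !q.released;	/* still pinned by the opener */
	toy_close(&d);
	return alive_while_open && q.released;
}
```

In this model a delete that arrives while the device is open only drops the
driver's reference, so the queue is released by the final close and not
underneath the opener.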
One may want to have a look at sd_open() in the sd driver. The
scsi_disk_get() call in that function not only increases the reference
count of the SCSI disk but also of the underlying SCSI device.

Thanks,

Bart.