On Wed, Apr 15, 2020 at 12:14:25AM -0700, Christoph Hellwig wrote:
> On Wed, Apr 15, 2020 at 06:16:49AM +0000, Luis Chamberlain wrote:
> > The BLKTRACESETUP above works on request_queue which later
> > LOOP_CTL_DEL races on and sweeps the debugfs dir underneath us.
> > If you use this commit alone though, this doesn't fix the race issue
> > however, and that's because of both still the debugfs_lookup() use
> > and that we're still using asynchronous removal at this point.
> >
> > refcounting will just ensure we don't take the request_queue underneath
> > our noses.
> >
> > Should I just add this to the commit log?
>
> That sounds much more useful than the trace.
>
> Btw, Isn't blk_get_queue racy as well?  Shouldn't we check
> blk_queue_dying after getting the reference and undo it if the queue is
> indeed dying?

Yes, that race should be possible:

bool blk_get_queue(struct request_queue *q)
{
	if (likely(!blk_queue_dying(q))) {
----------> we can get the queue to go dying here <---------
		__blk_get_queue(q);
		return true;
	}

	return false;
}
EXPORT_SYMBOL(blk_get_queue);

I'll pile up a fix.

I've also considered doing a full review of callers outside of the core
block layer using it, and maybe just unexporting this. It was originally
exported due to commit d86e0e83b ("block: export blk_{get,put}_queue()")
to fix a scsi bug, but I can't find the respective fix. I suspect that
bdgrab()/bdput() is more likely what drivers should be using. That would
allow us to keep this functionality internal.

Think that's worth a review?

  Luis
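
For illustration, the ordering Christoph suggests (take the reference
first, then check the dying flag and drop the reference again if it is
set) could be modelled in userspace roughly as below. This is only a
sketch under stated assumptions: struct model_queue and the model_*()
helpers are hypothetical stand-ins, not the kernel's struct
request_queue, blk_queue_dying() or blk_put_queue(), and it assumes the
caller already holds something that keeps the queue memory valid while
the temporary reference is taken.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct request_queue. */
struct model_queue {
	atomic_int refs;	/* models the queue reference count */
	atomic_bool dying;	/* models the "queue is dying" flag */
};

/* Take one reference unconditionally (models __blk_get_queue()). */
static void model_get(struct model_queue *q)
{
	atomic_fetch_add(&q->refs, 1);
}

/* Drop one reference; a real put would free the object at zero. */
static void model_put(struct model_queue *q)
{
	int old = atomic_fetch_sub(&q->refs, 1);

	assert(old > 0);	/* catch unbalanced puts in the model */
}

/*
 * Reference first, check second: if this returns true, the dying flag
 * was observed clear while the reference was already held, so the
 * check-then-get window from the original function is gone.
 */
static bool model_get_queue(struct model_queue *q)
{
	model_get(q);
	if (atomic_load(&q->dying)) {
		model_put(q);	/* undo: the queue is going away */
		return false;
	}
	return true;
}
```

The queue can still be marked dying right after model_get_queue()
returns true, but at that point the caller owns a reference, which is
the property the refcounting is meant to provide. Whether the real fix
should look like this is an open question in this thread.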