Hi Bart,
If I understand the race correctly, its not between the requests
completion and the queue pairs removal nor the timeout handler
necessarily, but rather it is between the async requests completion and
the tagset deallocation.
Think of surprise removal (or disconnect) during I/O, drivers
usually stop/quiesce/freeze the queues, terminate/abort inflight
I/Os and then teardown the hw queues and the tagset.
IIRC, the same race holds for srp if this happens during I/O:
1. srp_rport_delete() -> srp_remove_target() -> srp_stop_rport_timers() ->
__rport_fail_io_fast()
2. complete all I/Os (async remotely via smp)
Then continue..
3. scsi_host_put() -> scsi_host_dev_release() -> scsi_mq_destroy_tags()
What is preventing (3) from happening before (2) if its async? I would
think that scsi drivers need the exact same thing...
blk_cleanup_queue() will do that, but it can't be used in device recovery
obviously.
But in device recovery we never free the tagset... I might be missing
the race here then...
BTW, blk_mq_complete_request_sync() is a bit misleading, maybe
blk_mq_complete_request_locally() is better.
Not really...