On Thu, Apr 22, 2021 at 08:51:06AM -0700, Bart Van Assche wrote:
> On 4/22/21 12:13 AM, Ming Lei wrote:
> > On Wed, Apr 21, 2021 at 08:54:30PM -0700, Bart Van Assche wrote:
> >> On 4/21/21 8:15 PM, Ming Lei wrote:
> >>> On Tue, Apr 20, 2021 at 05:02:33PM -0700, Bart Van Assche wrote:
> >>>> +static bool bt_tags_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
> >>>> +{
> >>>> +	struct bt_tags_iter_data *iter_data = data;
> >>>> +	struct blk_mq_tags *tags = iter_data->tags;
> >>>> +	bool res;
> >>>> +
> >>>> +	if (iter_data->flags & BT_TAG_ITER_MAY_SLEEP) {
> >>>> +		down_read(&tags->iter_rwsem);
> >>>> +		res = __bt_tags_iter(bitmap, bitnr, data);
> >>>> +		up_read(&tags->iter_rwsem);
> >>>> +	} else {
> >>>> +		rcu_read_lock();
> >>>> +		res = __bt_tags_iter(bitmap, bitnr, data);
> >>>> +		rcu_read_unlock();
> >>>> +	}
> >>>> +
> >>>> +	return res;
> >>>> +}
> >>>
> >>> Holding the rwsem or the RCU read lock won't avoid the issue
> >>> completely, because the request may be completed remotely in
> >>> iter_data->fn(); see nbd_clear_req(), nvme_cancel_request(),
> >>> complete_all_cmds_iter() and mtip_no_dev_cleanup().
> >>> blk_mq_complete_request() may complete the request in softirq
> >>> context, via a remote IPI, or even in a wq, and the request is
> >>> still referenced in those contexts after bt_tags_iter() returns.
> >>
> >> The rwsem and RCU read lock are used to serialize iterating over
> >> requests against blk_mq_sched_free_requests() calls. I don't think it
> >> matters for this patch from which context requests are freed.
> >
> > Requests can still be referenced in another context after
> > blk_mq_wait_for_tag_iter() returns, and then the request pool is
> > freed, so a use-after-free exists too, doesn't it?
>
> The request pool should only be freed after it has been guaranteed that
> all pending requests have finished and also that no new requests will be
> started. This patch series adds two blk_mq_wait_for_tag_iter() calls.
> Both calls happen while the queue is frozen so I don't think that the
> issue mentioned in your email can happen.

For example, take scsi aacraid: normal completion racing with reset,
together with an elevator switch. aacraid is a single-queue HBA, and its
requests are completed asynchronously via IPI or softirq; that is, a
request isn't really completed when blk_mq_complete_request() returns.
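To make the asynchrony concrete: for a single hw queue,
blk_mq_complete_request() only parks the request on a per-cpu list and
raises BLOCK_SOFTIRQ; the request is dereferenced again later, when the
softirq (or the IPI handler) runs. A simplified sketch of the v5.12-era
path in block/blk-mq.c follows; the function names are real, but the
bodies are abridged (e.g. the polled REQ_HIPRI early return is omitted):

/*
 * Abridged sketch of the remote completion path, block/blk-mq.c (~v5.12).
 */
void blk_mq_complete_request(struct request *rq)
{
	if (!blk_mq_complete_request_remote(rq))
		rq->q->mq_ops->complete(rq);	/* completed in place */
}

bool blk_mq_complete_request_remote(struct request *rq)
{
	WRITE_ONCE(rq->state, MQ_RQ_COMPLETE);

	if (blk_mq_complete_need_ipi(rq)) {
		blk_mq_complete_send_ipi(rq);	/* queue rq, kick remote cpu */
		return true;
	}

	if (rq->q->nr_hw_queues == 1) {
		blk_mq_raise_softirq(rq);	/* queue rq for softirq */
		return true;
	}

	return false;
}

static void blk_mq_raise_softirq(struct request *rq)
{
	struct llist_head *list;

	preempt_disable();
	list = this_cpu_ptr(&blk_cpu_done);
	/* rq stays on the per-cpu list until BLOCK_SOFTIRQ dereferences it */
	if (llist_add(&rq->ipi_list, list))
		raise_softirq(BLOCK_SOFTIRQ);
	preempt_enable();
}

Nothing in this path pins the tag or the request memory, which is
exactly what the following race exploits: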
1) An interrupt arrives and request A is completed from aacraid's
interrupt handler via ->scsi_done(), which calls
blk_mq_complete_request().

2) _aac_reset_adapter() runs because of a reset event, which can be
triggered by a sysfs store or whatever. The irq is drained in
_aac_reset_adapter(), so the blk_mq_complete_request(request A) call
from aacraid's irq context has returned, but request A has merely been
scheduled for asynchronous completion via IPI or softirq; it isn't
really done yet.

3) scsi_host_complete_all_commands() is called from _aac_reset_adapter()
to fail all pending requests. Request A is still visible to
scsi_host_complete_all_commands() because its tag hasn't been freed yet,
but the tag & request A can be completed & freed right after
scsi_host_complete_all_commands() reads ->rqs[bitnr] in bt_tags_iter().
The iterator calls complete_all_cmds_iter() -> .scsi_done() ->
blk_mq_complete_request(), so the same request A is scheduled via IPI or
softirq a second time and added to the ipi or softirq list.

4) Meanwhile request A is freed by the normal completion triggered by
the interrupt; a pending elevator switch can move on since request A
drops the last reference, and bt_tags_iter() returns from the reset
path, so blk_mq_wait_for_tag_iter() can return too. Then the whole
scheduler request pool is freed.

5) Request A on the ipi/softirq list, scheduled from
_aac_reset_adapter(), is read, and the use-after-free is triggered.

Drivers are supposed to cover normal completion vs. error handling, but
wrt. remote completion I am not sure a driver is capable of covering
that.

Thanks,
Ming