Re: [PATCH v7 3/5] blk-mq: Fix races between iterating over requests and freeing requests

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Apr 23, 2021 at 10:52:43AM -0700, Bart Van Assche wrote:
> On 4/22/21 8:52 PM, Ming Lei wrote:
> > For example, scsi aacraid normal completion vs. reset together with elevator
> > switch, aacraid is one single queue HBA, and the request will be completed
> > via IPI or softirq asynchronously, that said request isn't really completed
> > after blk_mq_complete_request() returns.
> > 
> > 1) interrupt comes, and request A is completed via blk_mq_complete_request()
> > from aacraid's interrupt handler via ->scsi_done()
> > 
> > 2) _aac_reset_adapter() comes because of reset event which can be
> > triggered by sysfs store or whatever, irq is drained in 
> > _aac_reset_adpter(), so blk_mq_complete_request(request A) from aacraid irq
> > context is done, but request A is just scheduled to be completed via IPI
> > or softirq asynchronously, not really done yet.
> > 
> > 3) scsi_host_complete_all_commands() is called from _aac_reset_adapter() for
> > failing all pending requests. request A is still visible in
> > scsi_host_complete_all_commands, because its tag isn't freed yet. But the
> > tag & request A can be completed & freed exactly after scsi_host_complete_all_commands()
> > reads ->rqs[bitnr] in bt_tags_iter(), which calls complete_all_cmds_iter()
> > -> .scsi_done() -> blk_mq_complete_request(), and same request A is scheduled via
> > IPI or softirq, and request A is addded in ipi or softirq list.
> > 
> > 4) meantime request A is freed from normal completion triggered by interrupt, one
> > pending elevator switch can move on since request A drops the last reference; and
> > bt_tags_iter() returns from reset path, so blk_mq_wait_for_tag_iter() can return
> > too, then the whole scheduler request pool is freed now.
> > 
> > 5) request A in ipi/softirq list scheduled from _aac_reset_adapter is read , UAF
> > is triggered.
> > 
> > It is supposed that driver covers normal completion vs. error handling, but wrt.
> > remove completion, not sure driver is capable of covering that.
> 
> Hi Ming,
> 
> I agree that the scenario above can happen and also that a fix is
> needed. However, I do not agree that this should be fixed by modifying
> the tag iteration functions. I see scsi_host_complete_all_commands() as
> a workaround for broken storage controllers - storage controllers that
> do not have a facility for terminating all pending commands. NVMe
> controllers can be told to terminate all pending I/O commands by e.g.
> deleting all I/O submission queues. Many SCSI controllers also provide a
> facility for aborting all pending commands. For SCSI controllers that do
> not provide such a facility I propose to fix races like the race
> described above in the SCSI LLD instead of modifying the block layer tag
> iteration functions.

Hi Bart,

Terminating all pending commands can't avoid the issue wrt. request UAF,
so far blk_mq_tagset_wait_completed_request() is used for making sure
that all pending requests are really aborted.

However, blk_mq_wait_for_tag_iter() still may return before
blk_mq_wait_for_tag_iter() is done because blk_mq_wait_for_tag_iter()
supposes all request reference is just done inside bt_tags_iter(),
especially .iter_rwsem and read rcu lock is added in bt_tags_iter().

What I really meant is that .iter_rwsem/read rcu added or blk_mq_wait_for_tag_iter
can't do what is expected.


Thanks,
Ming




[Index of Archives]     [Linux RAID]     [Linux SCSI]     [Linux ATA RAID]     [IDE]     [Linux Wireless]     [Linux Kernel]     [ATH6KL]     [Linux Bluetooth]     [Linux Netdev]     [Kernel Newbies]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Device Mapper]

  Powered by Linux