Re: [PATCH v7 3/5] blk-mq: Fix races between iterating over requests and freeing requests

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 4/22/21 8:52 PM, Ming Lei wrote:
> For example, scsi aacraid normal completion vs. reset together with elevator
> switch, aacraid is one single queue HBA, and the request will be completed
> via IPI or softirq asynchronously, that said request isn't really completed
> after blk_mq_complete_request() returns.
> 
> 1) interrupt comes, and request A is completed via blk_mq_complete_request()
> from aacraid's interrupt handler via ->scsi_done()
> 
> 2) _aac_reset_adapter() comes because of reset event which can be
> triggered by sysfs store or whatever, irq is drained in 
> _aac_reset_adpter(), so blk_mq_complete_request(request A) from aacraid irq
> context is done, but request A is just scheduled to be completed via IPI
> or softirq asynchronously, not really done yet.
> 
> 3) scsi_host_complete_all_commands() is called from _aac_reset_adapter() for
> failing all pending requests. request A is still visible in
> scsi_host_complete_all_commands, because its tag isn't freed yet. But the
> tag & request A can be completed & freed exactly after scsi_host_complete_all_commands()
> reads ->rqs[bitnr] in bt_tags_iter(), which calls complete_all_cmds_iter()
> -> .scsi_done() -> blk_mq_complete_request(), and same request A is scheduled via
> IPI or softirq, and request A is addded in ipi or softirq list.
> 
> 4) meantime request A is freed from normal completion triggered by interrupt, one
> pending elevator switch can move on since request A drops the last reference; and
> bt_tags_iter() returns from reset path, so blk_mq_wait_for_tag_iter() can return
> too, then the whole scheduler request pool is freed now.
> 
> 5) request A in ipi/softirq list scheduled from _aac_reset_adapter is read , UAF
> is triggered.
> 
> It is supposed that driver covers normal completion vs. error handling, but wrt.
> remove completion, not sure driver is capable of covering that.

Hi Ming,

I agree that the scenario above can happen and also that a fix is
needed. However, I do not agree that this should be fixed by modifying
the tag iteration functions. I see scsi_host_complete_all_commands() as
a workaround for broken storage controllers - storage controllers that
do not have a facility for terminating all pending commands. NVMe
controllers can be told to terminate all pending I/O commands by e.g.
deleting all I/O submission queues. Many SCSI controllers also provide a
facility for aborting all pending commands. For SCSI controllers that do
not provide such a facility I propose to fix races like the race
described above in the SCSI LLD instead of modifying the block layer tag
iteration functions.

Thanks,

Bart.



[Index of Archives]     [Linux RAID]     [Linux SCSI]     [Linux ATA RAID]     [IDE]     [Linux Wireless]     [Linux Kernel]     [ATH6KL]     [Linux Bluetooth]     [Linux Netdev]     [Kernel Newbies]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Device Mapper]

  Powered by Linux