On 4/22/21 8:52 PM, Ming Lei wrote: > For example, scsi aacraid normal completion vs. reset together with elevator > switch, aacraid is one single queue HBA, and the request will be completed > via IPI or softirq asynchronously, that said request isn't really completed > after blk_mq_complete_request() returns. > > 1) interrupt comes, and request A is completed via blk_mq_complete_request() > from aacraid's interrupt handler via ->scsi_done() > > 2) _aac_reset_adapter() comes because of reset event which can be > triggered by sysfs store or whatever, irq is drained in > _aac_reset_adpter(), so blk_mq_complete_request(request A) from aacraid irq > context is done, but request A is just scheduled to be completed via IPI > or softirq asynchronously, not really done yet. > > 3) scsi_host_complete_all_commands() is called from _aac_reset_adapter() for > failing all pending requests. request A is still visible in > scsi_host_complete_all_commands, because its tag isn't freed yet. But the > tag & request A can be completed & freed exactly after scsi_host_complete_all_commands() > reads ->rqs[bitnr] in bt_tags_iter(), which calls complete_all_cmds_iter() > -> .scsi_done() -> blk_mq_complete_request(), and same request A is scheduled via > IPI or softirq, and request A is addded in ipi or softirq list. > > 4) meantime request A is freed from normal completion triggered by interrupt, one > pending elevator switch can move on since request A drops the last reference; and > bt_tags_iter() returns from reset path, so blk_mq_wait_for_tag_iter() can return > too, then the whole scheduler request pool is freed now. > > 5) request A in ipi/softirq list scheduled from _aac_reset_adapter is read , UAF > is triggered. > > It is supposed that driver covers normal completion vs. error handling, but wrt. > remove completion, not sure driver is capable of covering that. Hi Ming, I agree that the scenario above can happen and also that a fix is needed. However, I do not agree that this should be fixed by modifying the tag iteration functions. I see scsi_host_complete_all_commands() as a workaround for broken storage controllers - storage controllers that do not have a facility for terminating all pending commands. NVMe controllers can be told to terminate all pending I/O commands by e.g. deleting all I/O submission queues. Many SCSI controllers also provide a facility for aborting all pending commands. For SCSI controllers that do not provide such a facility I propose to fix races like the race described above in the SCSI LLD instead of modifying the block layer tag iteration functions. Thanks, Bart.