Would anyone please take a look at this?

Thanks in advance
Jianchao

On 05/23/2018 11:55 AM, jianchao.wang wrote:
> Hi all
>
> Our customer hit a panic triggered by the BUG_ON in blk_finish_request.
> From the dmesg log, the BUG_ON was triggered after command aborts had occurred many times.
> There is a race condition in the following scenario.
>
> cpu A                                   cpu B
> kworker                                 interrupt
>
> scmd_eh_abort_handler()
>   -> scsi_try_to_abort_cmd()
>     -> qla2xxx_eh_abort()
>       -> qla2x00_eh_wait_on_command()   qla2x00_status_entry()
>                                           -> qla2x00_sp_compl()
>                                             -> qla2x00_sp_free_dma()
>   -> scsi_queue_insert()
>     -> __scsi_queue_insert()
>       -> blk_requeue_request()
>         -> blk_clear_rq_complete()
>                                             -> scsi_done
>                                               -> blk_complete_request
>                                                 -> blk_mark_rq_complete
>         -> elv_requeue_request()                -> __blk_complete_request()
>           -> __elv_add_request()
>           // req will be queued here
>                                         BLK_SOFTIRQ
>                                         scsi_softirq_done()
>                                           -> scsi_finish_command()
>                                             -> scsi_io_completion()
>                                               -> scsi_end_request()
>                                                 -> blk_finish_request() // BUG_ON(blk_queued_rq(req)) !!!
>
> The issue is not triggered most of the time, because the request has already been marked as complete by the timeout path.
> So the scsi_done from qla2x00_sp_compl does nothing.
> But as in the scenario above, if the complete state has already been cleared by blk_requeue_request,
> the request ends up both requeued and completed, and then the BUG_ON(blk_queued_rq(req)) in blk_finish_request fires.
>
> Is there any solution for this on the qla2xxx driver side?
>
> Thanks
> Jianchao