From: Nicholas Bellinger <nab@xxxxxxxxxxxxxxx>

This patch fixes a bug where WRITEs aborted by LUN_RESET could be released before the hardware posted the CTIO interrupt acknowledging the aborted descriptor on the fabric.

The fix adds a check for TRANSPORT_WRITE_PENDING status in tcm_qla2xxx_write_pending_status(), so that the processing thread waits on a completion (with a timeout) while LUN_RESET is performed. It also adds the inverse check in tcm_qla2xxx_handle_data(): once the CTIO interrupt has been triggered with qla_tgt_cmd->write_data_transferred == 0 and se_cmd->t_transport_aborted != 0, it signals that completion instead of returning CHECK_CONDITION status.

This was first noticed as the following OOPS on 2.6.32 backports, for descriptor cmd: ffff88016709c300:

[ 609.414494] LUN_RESET: cmd: ffff88016709c300 task: ffff8801dea81b60 ITT/CmdSN: 0x00116e20/0x00000000, i_state: 0, t_state/def_t_state: 3/0 cdb: 0x2a
[ 609.414497] LUN_RESET: ITT[0x00116e20] - pr_res_key: 0x0000000000000000 t_task_cdbs: 8 t_task_cdbs_left: 8 t_task_cdbs_sent: 0 -- t_transport_active: 0 t_transport_stop: 0 t_transport_sent: 0
[ 609.414500] LUN_RESET: Got t_transport_active = 0 for task: ffff8801dea81b60, t_fe_count: 1 dev: ffff8801d33b66c0
[ 609.670008] LUN_RESET: from Device Queue: cmd: ffff8801df199c40 t_state: 9 t_fe_count: 0
[ 609.670017] LUN_RESET: from Device Queue: cmd: ffff8801df199440 t_state: 9 t_fe_count: 0
[ 609.670021] LUN_RESET: from Device Queue: cmd: ffff8801df199840 t_state: 9 t_fe_count: 0
[ 609.670025] LUN_RESET: from Device Queue: cmd: ffff8801df19a040 t_state: 9 t_fe_count: 0
[ 609.670030] LUN_RESET: TMR for [iblock] Complete
[ 609.670032] queue_tm_rsp: mcmd: ffff8801df199000 func: 0x05 response: 0x00
[ 609.676015] ------------[ cut here ]------------
[ 609.676017] kernel BUG at /usr/src/lio-core-backport.git/kernel/drivers/scsi/qla2xxx/qla_target.c:2818!
[ 609.676019] invalid opcode: 0000 [#1] SMP
[ 609.676021] last sysfs file: /sys/devices/pci0000:00/0000:00:05.0/0000:03:00.1/host14/rport-14:0-0/bsg/rport-14:0-0/uevent
[ 609.676023] CPU 1
[ 609.676024] Modules linked in: ib_srpt tcm_qla2xxx tcm_loop iscsi_target_mod target_core_pscsi target_core_file target_core_iblock target_core_mod qla2xxx ib_cm ib_sa ib_mad ib_core configfs loop snd_pcm snd_timer snd soundcore snd_page_alloc ioatdma i2c_i801 i2c_core pcspkr joydev evdev processor button ext3 jbd mbcache dm_mod sd_mod crc_t10dif usbhid hid ata_generic ata_piix libata uhci_hcd ehci_hcd scsi_transport_fc scsi_tgt usbcore igb thermal nls_base scsi_mod dca thermal_sys [last unloaded: qla2xxx]
[ 609.676047] Pid: 3825, comm: LIO_iblock Not tainted 2.6.32-5-amd64 #1 S5520HC
[ 609.676049] RIP: 0010:[<ffffffffa034698c>] [<ffffffffa034698c>] qla_tgt_free_cmd+0xd/0x32 [qla2xxx]
[ 609.676059] RSP: 0018:ffff880169cd9da0 EFLAGS: 00010202
[ 609.676061] RAX: ffff88016c71ada0 RBX: ffff88016709c280 RCX: 000000000000c2a0
[ 609.676063] RDX: ffff88016c71ada0 RSI: 0000000000000282 RDI: ffff88016709c280
[ 609.676064] RBP: ffff88016709c300 R08: 0000000000000000 R09: 000000000000005a
[ 609.676066] R10: 0000000000000002 R11: dead000000200200 R12: 0000000000000001
[ 609.676068] R13: ffff88016709c580 R14: ffff88016709c300 R15: 0000000000000286
[ 609.676070] FS: 0000000000000000(0000) GS:ffff880173c00000(0000) knlGS:0000000000000000
[ 609.676072] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[ 609.676073] CR2: 0000000000db4308 CR3: 0000000203f1e000 CR4: 00000000000006e0
[ 609.676075] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 609.676076] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 609.676078] Process LIO_iblock (pid: 3825, threadinfo ffff880169cd8000, task ffff88016da41d40)
[ 609.676080] Stack:
[ 609.676080] 0000000000000286 ffffffffa00cecbf ffff88026d8c3e58 ffff88026d8c3e5c
[ 609.676083] <0> ffff8801d33b66c0 ffffffffa00d0ca1 0000000000000001 ffff880173cb5780
[ 609.676085] <0> ffff88026d911e00 ffff880169cd9e70 0000000000015780 ffff880169cd8000
[ 609.676087] Call Trace:
[ 609.676095] [<ffffffffa00cecbf>] ? transport_generic_remove+0xeb/0x10c [target_core_mod]
[ 609.676101] [<ffffffffa00d0ca1>] ? transport_processing_thread+0x1104/0x13c8 [target_core_mod]
[ 609.676107] [<ffffffff812fae40>] ? thread_return+0x79/0xe0
[ 609.676113] [<ffffffff8103a453>] ? activate_task+0x22/0x28
[ 609.676116] [<ffffffff81064e96>] ? autoremove_wake_function+0x0/0x2e
[ 609.676121] [<ffffffffa00cfb9d>] ? transport_processing_thread+0x0/0x13c8 [target_core_mod]
[ 609.676124] [<ffffffff81064bc9>] ? kthread+0x79/0x81
[ 609.676127] [<ffffffff81011baa>] ? child_rip+0xa/0x20
[ 609.676129] [<ffffffff81064b50>] ? kthread+0x0/0x81
[ 609.676131] [<ffffffff81011ba0>] ? child_rip+0x0/0x20
[ 609.676132] Code: 41 f6 87 f3 00 00 00 08 74 98 e9 73 ff ff ff 48 83 c4 28 5b 5d 41 5c 41 5d 41 5e 41 5f c3 53 f6 87 a0 03 00 00 02 48 89 fb 74 04 <0f> 0b eb fe 48 8b bf d0 03 00 00 48 85 ff 74 05 e8 49 08 da e0
[ 609.676146] RIP [<ffffffffa034698c>] qla_tgt_free_cmd+0xd/0x32 [qla2xxx]
[ 609.676152] RSP <ffff880169cd9da0>
[ 609.676897] ---[ end trace 4bf3a7033fe8b551 ]---
[ 609.984043] qla_target(0): CTIO with status 0x2 received, state 4, se_cmd ffff88016709c300, (LIP_RESET=e, ABORTED=2, TARGET_RESET=17, TIMEOUT=b, INVALID_RX_ID=8)
[ 612.227869] qla2xxx 0000:03:00.0: LIP occurred (0).
Cc: Roland Dreier <roland@xxxxxxxxxxxxxxx>
Cc: Madhuranath Iyengar <mni@xxxxxxxxxxxxxxxxxxxxx>
Signed-off-by: Nicholas Bellinger <nab@xxxxxxxxxxxxxxxxxxxxx>
---
 drivers/target/tcm_qla2xxx/tcm_qla2xxx_fabric.c | 27 +++++++++++++++++++++++
 1 files changed, 27 insertions(+), 0 deletions(-)

diff --git a/drivers/target/tcm_qla2xxx/tcm_qla2xxx_fabric.c b/drivers/target/tcm_qla2xxx/tcm_qla2xxx_fabric.c
index 2ba34b4..8bc053e 100644
--- a/drivers/target/tcm_qla2xxx/tcm_qla2xxx_fabric.c
+++ b/drivers/target/tcm_qla2xxx/tcm_qla2xxx_fabric.c
@@ -646,6 +646,19 @@ int tcm_qla2xxx_write_pending(struct se_cmd *se_cmd)
 
 int tcm_qla2xxx_write_pending_status(struct se_cmd *se_cmd)
 {
+	unsigned long flags;
+	/*
+	 * Check for WRITE_PENDING status to determine if we need to wait for
+	 * CTIO aborts to be posted via hardware in tcm_qla2xxx_handle_data().
+	 */
+	spin_lock_irqsave(&se_cmd->t_state_lock, flags);
+	if (se_cmd->t_state == TRANSPORT_WRITE_PENDING) {
+		spin_unlock_irqrestore(&se_cmd->t_state_lock, flags);
+		wait_for_completion_timeout(&se_cmd->t_transport_stop_comp, 3000);
+		return 0;
+	}
+	spin_unlock_irqrestore(&se_cmd->t_state_lock, flags);
+
 	return 0;
 }
 
@@ -779,11 +792,25 @@ int tcm_qla2xxx_new_cmd_map(struct se_cmd *se_cmd)
  */
 int tcm_qla2xxx_handle_data(struct qla_tgt_cmd *cmd)
 {
+	struct se_cmd *se_cmd = &cmd->se_cmd;
+	unsigned long flags;
 	/*
 	 * Ensure that the complete FCP WRITE payload has been received.
 	 * Otherwise return an exception via CHECK_CONDITION status.
 	 */
 	if (!cmd->write_data_transferred) {
+		/*
+		 * Check if se_cmd has already been aborted via LUN_RESET, and is
+		 * waiting upon completion in tcm_qla2xxx_write_pending_status()..
+		 */
+		spin_lock_irqsave(&se_cmd->t_state_lock, flags);
+		if (atomic_read(&se_cmd->t_transport_aborted)) {
+			spin_unlock_irqrestore(&se_cmd->t_state_lock, flags);
+			complete(&se_cmd->t_transport_stop_comp);
+			return 0;
+		}
+		spin_unlock_irqrestore(&se_cmd->t_state_lock, flags);
+
 		cmd->locked_rsp = 0;
 		return transport_send_check_condition_and_sense(&cmd->se_cmd,
-- 
1.7.2.5