[PATCH] tcm_qla2xxx: Wait for LUN_RESET aborted WRITEs to post via CTIO interrupt

From: Nicholas Bellinger <nab@xxxxxxxxxxxxxxx>

This patch fixes a bug where WRITEs aborted by LUN_RESET could be released
before the hardware had posted the CTIO interrupt that acknowledges the
aborted descriptor on the fabric.

The fix adds a check to tcm_qla2xxx_write_pending_status(): when se_cmd->t_state
is TRANSPORT_WRITE_PENDING, the processing thread waits, with a timeout, on
se_cmd->t_transport_stop_comp while LUN_RESET is performed. The inverse check
in tcm_qla2xxx_handle_data() completes that wait once the CTIO interrupt has
been triggered with qla_tgt_cmd->write_data_transferred == 0 and
se_cmd->t_transport_aborted != 0.
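
For readers who want to experiment with the handshake outside the kernel, the
following is a minimal userspace sketch of the wait/complete pairing described
above. It is not driver code: struct completion_model, struct cmd_model and
every model_* function are hypothetical stand-ins built on pthreads, and the
sticky "done" flag is a simplification of the kernel's struct completion.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical userspace stand-in for the kernel's struct completion. */
struct completion_model {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
};

static void completion_model_init(struct completion_model *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cond, NULL);
	c->done = false;
}

/* Plays the role of complete(): mark done and wake the waiter. */
static void completion_model_complete(struct completion_model *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = true;
	pthread_cond_signal(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

/* Plays the role of wait_for_completion_timeout(); true means completed. */
static bool completion_model_wait_timeout(struct completion_model *c,
					  long timeout_ms)
{
	struct timespec deadline;
	bool done;
	int rc = 0;

	assert(timeout_ms >= 0);
	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&c->lock);
	while (!c->done && rc != ETIMEDOUT)
		rc = pthread_cond_timedwait(&c->cond, &c->lock, &deadline);
	done = c->done;
	pthread_mutex_unlock(&c->lock);
	return done;
}

/* Hypothetical model of the se_cmd / qla_tgt_cmd state the patch checks. */
struct cmd_model {
	bool write_pending;		/* t_state == TRANSPORT_WRITE_PENDING */
	bool aborted;			/* t_transport_aborted != 0 */
	bool write_data_transferred;
	struct completion_model stop_comp;
};

enum handle_result { HANDLE_ABORT_ACKED, HANDLE_CHECK_CONDITION, HANDLE_DATA_OK };

/* Processing-thread side: wait for the abort to be acknowledged. */
static bool model_write_pending_status(struct cmd_model *cmd, long timeout_ms)
{
	if (!cmd->write_pending)
		return true;	/* no WRITE outstanding, nothing to wait for */
	return completion_model_wait_timeout(&cmd->stop_comp, timeout_ms);
}

/* Interrupt side: the inverse check that finishes the completion. */
static enum handle_result model_handle_data(struct cmd_model *cmd)
{
	if (!cmd->write_data_transferred) {
		if (cmd->aborted) {
			completion_model_complete(&cmd->stop_comp);
			return HANDLE_ABORT_ACKED;
		}
		return HANDLE_CHECK_CONDITION;
	}
	return HANDLE_DATA_OK;
}

/* Thread entry point standing in for the CTIO interrupt. */
static void *model_ctio_interrupt(void *arg)
{
	model_handle_data(arg);
	return NULL;
}
```

Running model_ctio_interrupt() in a second thread while the first thread calls
model_write_pending_status() reproduces the intended ordering: the waiter only
returns once the abort has been acknowledged, or once the timeout expires.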

This was first noticed as the following OOPS on the .32 backports, for
descriptor cmd: ffff88016709c300:

[  609.414494] LUN_RESET:  cmd: ffff88016709c300 task: ffff8801dea81b60 ITT/CmdSN: 0x00116e20/0x00000000, i_state: 0, t_state/def_t_state: 3/0 cdb: 0x2a
[  609.414497] LUN_RESET: ITT[0x00116e20] - pr_res_key: 0x0000000000000000 t_task_cdbs: 8 t_task_cdbs_left: 8 t_task_cdbs_sent: 0 -- t_transport_active: 0 t_transport_stop: 0 t_transport_sent: 0
[  609.414500] LUN_RESET: Got t_transport_active = 0 for task: ffff8801dea81b60, t_fe_count: 1 dev: ffff8801d33b66c0
[  609.670008] LUN_RESET:  from Device Queue: cmd: ffff8801df199c40 t_state: 9 t_fe_count: 0
[  609.670017] LUN_RESET:  from Device Queue: cmd: ffff8801df199440 t_state: 9 t_fe_count: 0
[  609.670021] LUN_RESET:  from Device Queue: cmd: ffff8801df199840 t_state: 9 t_fe_count: 0
[  609.670025] LUN_RESET:  from Device Queue: cmd: ffff8801df19a040 t_state: 9 t_fe_count: 0
[  609.670030] LUN_RESET: TMR for [iblock] Complete
[  609.670032] queue_tm_rsp: mcmd: ffff8801df199000 func: 0x05 response: 0x00
[  609.676015] ------------[ cut here ]------------
[  609.676017] kernel BUG at /usr/src/lio-core-backport.git/kernel/drivers/scsi/qla2xxx/qla_target.c:2818!
[  609.676019] invalid opcode: 0000 [#1] SMP
[  609.676021] last sysfs file: /sys/devices/pci0000:00/0000:00:05.0/0000:03:00.1/host14/rport-14:0-0/bsg/rport-14:0-0/uevent
[  609.676023] CPU 1
[  609.676024] Modules linked in: ib_srpt tcm_qla2xxx tcm_loop iscsi_target_mod target_core_pscsi target_core_file target_core_iblock target_core_mod qla2xxx ib_cm ib_sa ib_mad ib_core configfs loop snd_pcm snd_timer snd soundcore snd_page_alloc ioatdma i2c_i801 i2c_core pcspkr joydev evdev processor button ext3 jbd mbcache dm_mod sd_mod crc_t10dif usbhid hid ata_generic ata_piix libata uhci_hcd ehci_hcd scsi_transport_fc scsi_tgt usbcore igb thermal nls_base scsi_mod dca thermal_sys [last unloaded: qla2xxx]
[  609.676047] Pid: 3825, comm: LIO_iblock Not tainted 2.6.32-5-amd64 #1 S5520HC
[  609.676049] RIP: 0010:[<ffffffffa034698c>]  [<ffffffffa034698c>] qla_tgt_free_cmd+0xd/0x32 [qla2xxx]
[  609.676059] RSP: 0018:ffff880169cd9da0  EFLAGS: 00010202
[  609.676061] RAX: ffff88016c71ada0 RBX: ffff88016709c280 RCX: 000000000000c2a0
[  609.676063] RDX: ffff88016c71ada0 RSI: 0000000000000282 RDI: ffff88016709c280
[  609.676064] RBP: ffff88016709c300 R08: 0000000000000000 R09: 000000000000005a
[  609.676066] R10: 0000000000000002 R11: dead000000200200 R12: 0000000000000001
[  609.676068] R13: ffff88016709c580 R14: ffff88016709c300 R15: 0000000000000286
[  609.676070] FS:  0000000000000000(0000) GS:ffff880173c00000(0000) knlGS:0000000000000000
[  609.676072] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[  609.676073] CR2: 0000000000db4308 CR3: 0000000203f1e000 CR4: 00000000000006e0
[  609.676075] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  609.676076] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  609.676078] Process LIO_iblock (pid: 3825, threadinfo ffff880169cd8000, task ffff88016da41d40)
[  609.676080] Stack:
[  609.676080]  0000000000000286 ffffffffa00cecbf ffff88026d8c3e58 ffff88026d8c3e5c
[  609.676083] <0> ffff8801d33b66c0 ffffffffa00d0ca1 0000000000000001 ffff880173cb5780
[  609.676085] <0> ffff88026d911e00 ffff880169cd9e70 0000000000015780 ffff880169cd8000
[  609.676087] Call Trace:
[  609.676095]  [<ffffffffa00cecbf>] ? transport_generic_remove+0xeb/0x10c [target_core_mod]
[  609.676101]  [<ffffffffa00d0ca1>] ? transport_processing_thread+0x1104/0x13c8 [target_core_mod]
[  609.676107]  [<ffffffff812fae40>] ? thread_return+0x79/0xe0
[  609.676113]  [<ffffffff8103a453>] ? activate_task+0x22/0x28
[  609.676116]  [<ffffffff81064e96>] ? autoremove_wake_function+0x0/0x2e
[  609.676121]  [<ffffffffa00cfb9d>] ? transport_processing_thread+0x0/0x13c8 [target_core_mod]
[  609.676124]  [<ffffffff81064bc9>] ? kthread+0x79/0x81
[  609.676127]  [<ffffffff81011baa>] ? child_rip+0xa/0x20
[  609.676129]  [<ffffffff81064b50>] ? kthread+0x0/0x81
[  609.676131]  [<ffffffff81011ba0>] ? child_rip+0x0/0x20
[  609.676132] Code: 41 f6 87 f3 00 00 00 08 74 98 e9 73 ff ff ff 48 83 c4 28 5b 5d 41 5c 41 5d 41 5e 41 5f c3 53 f6 87 a0 03 00 00 02 48 89 fb 74 04 <0f> 0b eb fe 48 8b bf d0 03 00 00 48 85 ff 74 05 e8 49 08 da e0
[  609.676146] RIP  [<ffffffffa034698c>] qla_tgt_free_cmd+0xd/0x32 [qla2xxx]
[  609.676152]  RSP <ffff880169cd9da0>
[  609.676897] ---[ end trace 4bf3a7033fe8b551 ]---
[  609.984043] qla_target(0): CTIO with status 0x2 received, state 4, se_cmd ffff88016709c300, (LIP_RESET=e, ABORTED=2, TARGET_RESET=17, TIMEOUT=b, INVALID_RX_ID=8)
[  612.227869] qla2xxx 0000:03:00.0: LIP occurred (0).

Cc: Roland Dreier <roland@xxxxxxxxxxxxxxx>
Cc: Madhuranath Iyengar <mni@xxxxxxxxxxxxxxxxxxxxx>
Signed-off-by: Nicholas Bellinger <nab@xxxxxxxxxxxxxxxxxxxxx>
---
 drivers/target/tcm_qla2xxx/tcm_qla2xxx_fabric.c |   27 +++++++++++++++++++++++
 1 files changed, 27 insertions(+), 0 deletions(-)

diff --git a/drivers/target/tcm_qla2xxx/tcm_qla2xxx_fabric.c b/drivers/target/tcm_qla2xxx/tcm_qla2xxx_fabric.c
index 2ba34b4..8bc053e 100644
--- a/drivers/target/tcm_qla2xxx/tcm_qla2xxx_fabric.c
+++ b/drivers/target/tcm_qla2xxx/tcm_qla2xxx_fabric.c
@@ -646,6 +646,19 @@ int tcm_qla2xxx_write_pending(struct se_cmd *se_cmd)
 
 int tcm_qla2xxx_write_pending_status(struct se_cmd *se_cmd)
 {
+	unsigned long flags;
+	/*
+	 * Check for WRITE_PENDING status to determine if we need to wait for
+	 * CTIO aborts to be posted via hardware in tcm_qla2xxx_handle_data().
+	 */
+	spin_lock_irqsave(&se_cmd->t_state_lock, flags);
+	if (se_cmd->t_state == TRANSPORT_WRITE_PENDING) {
+		spin_unlock_irqrestore(&se_cmd->t_state_lock, flags);
+		wait_for_completion_timeout(&se_cmd->t_transport_stop_comp, 3000);
+		return 0;
+	}
+	spin_unlock_irqrestore(&se_cmd->t_state_lock, flags);
+
 	return 0;
 }
 
@@ -779,11 +792,25 @@ int tcm_qla2xxx_new_cmd_map(struct se_cmd *se_cmd)
  */
 int tcm_qla2xxx_handle_data(struct qla_tgt_cmd *cmd)
 {
+	struct se_cmd *se_cmd = &cmd->se_cmd;
+	unsigned long flags;
 	/*
 	 * Ensure that the complete FCP WRITE payload has been received.
 	 * Otherwise return an exception via CHECK_CONDITION status.
 	 */
 	if (!cmd->write_data_transferred) {
+		/*
+		 * Check if se_cmd has already been aborted via LUN_RESET, and is
+		 * waiting upon completion in tcm_qla2xxx_write_pending_status().
+		 */
+		spin_lock_irqsave(&se_cmd->t_state_lock, flags);
+		if (atomic_read(&se_cmd->t_transport_aborted)) {
+			spin_unlock_irqrestore(&se_cmd->t_state_lock, flags);
+			complete(&se_cmd->t_transport_stop_comp);
+			return 0;
+		}
+		spin_unlock_irqrestore(&se_cmd->t_state_lock, flags);
+
 		cmd->locked_rsp = 0;		
 
 		return transport_send_check_condition_and_sense(&cmd->se_cmd,
-- 
1.7.2.5
