On 6/26/2014 2:06 PM, Arshad Hussain wrote: > Hi Nab, > On 6/25/2014 6:21 PM, Arshad Hussain wrote: >> On 6/20/2014 11:17 PM, Nicholas A. Bellinger wrote: >>> On Fri, 2014-06-20 at 15:54 +0530, Arshad Hussain wrote: >>>> Hi Nab, >>>> On 6/20/2014 12:32 PM, Nicholas A. Bellinger wrote: >>>>> On Thu, 2014-06-19 at 23:43 -0700, Nicholas A. Bellinger wrote: >>>>>> On Fri, 2014-06-20 at 11:30 +0530, Santosh Kulkarni wrote: >>> <SNIP> >>> >>>>> One other quick thing to try is the following patch to dump the Data-out >>>>> payload when a ITT received is not associated with a WRITE (following >>>>> the log), instead of generating a Reject for the existing ITT. >>>>> >>>>> I can't quite tell from wireshark output why your test is sending >>>>> Data-out with an ITT that is associated with a READ that has not been >>>>> acknowledged by StatSN, but the following should at least leave the READ >>>>> ITT alone. >>>>> >>>>> --nab >>>>> >>>>> diff --git a/drivers/target/iscsi/iscsi_target.c b/drivers/target/iscsi/iscsi_target.c >>>>> index b87721a..049fe14 100644 >>>>> --- a/drivers/target/iscsi/iscsi_target.c >>>>> +++ b/drivers/target/iscsi/iscsi_target.c >>>>> @@ -1308,7 +1308,7 @@ iscsit_check_dataout_hdr(struct iscsi_conn *conn, unsigned char *buf, >>>>> if (cmd->data_direction != DMA_TO_DEVICE) { >>>>> pr_err("Command ITT: 0x%08x received DataOUT for a" >>>>> " NON-WRITE command.\n", cmd->init_task_tag); >>>>> - return iscsit_reject_cmd(cmd, ISCSI_REASON_PROTOCOL_ERROR, buf); >>>>> + return iscsit_dump_data_payload(conn, payload_length, 1); >>>>> } >>>>> se_cmd = &cmd->se_cmd; >>>>> iscsit_mod_dataout_timer(cmd); >>>>> >>>>> >>>> Santosh is traveling and I will help out with any answers you need. >>>> >>>> The ITT is unique within a session, should READ here be a problem.? And >>>> what we have observed that our the crash is intermittent as we have >>>> mention before. >>>> >>> That is not what is indicated in the logs so far. The output (appears) >>> to show that an ITT that was previously used by a READ is being reused >>> by a WRITE + associated Data-Out, before the original READ is explicitly >>> acknowledged via StatSN. >>> >>> This results in the lookup for the Data-out ITT to locate the original >>> READ, and correctly fail due to the incorrect data_direction type of the >>> original command. >>> >>>> Tried out your patch on 3.14.0-rc6+ and 3.15.0-rc3+. With 3.15.0-rc3 >>>> the target is not crashing. The dmesg shows the line below. >>>> >>>> [root@wfsc iscsi]# dmesg -c >>>> [ 210.623207] Command ITT: 0xa3a3a3a3 received DataOUT for a NON-WRITE >>>> command. >>>> >>> Thanks for confirming. I think this is the correct fix, but would like >>> to verify some more on my end. Please send me the test off-list. > This is what we are trying to achieve. Read from socket1 and socket2 and > then write to socket1 and socket2. I have detailed the flow below. The > code is modified to read and write with different ITT. And this work > just fine. It is only when we read & write on two different sockets with > same ITT it fails. > > Please notice after READ we are receiving stat_sn and incrementing the > counters. This indicates that the stat_sn is being acknowledge. IMHO, > the reads and writes on multiple sockets with same ITT should work. > While we can change the test case to make it pass, by reading and > writing on different ITT. This however will shadow a potential bug in > target. As always your valuable comments. > > READ10 CMD [socket1,ITT=A3A3A3A3] -> [T] > /* Update the counters. At this point stat_sn is ack. */ > next_cmdsn = recvpdu->exp_cmd_sn; > next_expstatsn = (recvpdu->stat_sn) + 0x01; > maxcmdsn = recvpdu->max_cmd_sn; > > READ10 CMD [socket2,ITT=A3A3A3A3] -> [T] > /* Update counters. At this point stat_sn is ack.*/ > next_cmdsn = recvpdu->exp_cmd_sn; > next_expstatsn = (recvpdu->stat_sn) + 0x01; > maxcmdsn = recvpdu->max_cmd_sn; > > WRITE10 CMD [socket1,ITT=A5A5A5A5] -> [T] > R2T <- [T] > DATA OUT [socket1,ITT=A5A5A5A5] -> [T] > > WRITE10 CMD [socket2,ITT=A5A5A5A5] -> [T] > R2T <- [T] > DATA OUT [socket1,ITT=A5A5A5A5] -> [T] > > Thanks! > Arshad Just tried another set of cases with multiple connections. // First Write to socket1 WRITE10 CMD [socket1,ITT=A5A5A5A5] -> [T] R2T <- [T] DATA OUT [socket1,ITT=A5A5A5A5] -> [T] // Second write to socked2, with different ITT. This works. WRITE10 CMD [socket2,ITT=A3A3A3A3] -> [T] R2T <- [T] DATA OUT [socket1,ITT=A3A3A3A3] -> [T] // Third write again to socket1. ITT reuse. Please Note the writes are ack at this point. WRITE10 CMD [socket1,ITT=A5A5A5A5] -> [T] R2T <- [T] Observation: First and second write works. The third write fails (No crash). Printing below the dmesg. [16066.334410] Command ITT: 0xa5a5a5a5 received DataOUT after last DataOUT received, dumping payload. Thanks Arshad PS: Can you help me where to put BUG()/panic() as I am not being able to provide you with bt. I cannot also get vmcore on crash (previous 3.14 without patch) since it is locking I am unable to enter into dump-capture kernel. Overall, let me know what all more I can get you to help you debug. >>>> [root@wfsc iscsi]# uname -a >>>> Linux wfsc 3.15.0-rc3+ #3 SMP Fri May 30 06:11:42 EDT 2014 x86_64 x86_64 >>>> x86_64 GNU/Linux >>>> >>>> With 3.14.0-rc6+. It is crashing with the below output. >>>> >>> This output for v3.14-rc6 would indicate that the patch is not applied. >>> Eg: >>> >>> "ISCSI_FLAG_CMD_WRITE & ISCSI_FLAG_CMD_FINAL not set." >>> >>> means that iscsi_dump_data_payload() is not being called, because the >>> subsequent Data-Out payload (still in the TCP stream) is processed, and >>> fails at the WRITE + FINAL bits sanity checks. >>> >>> Please confirm this patch is applied with a printk() ahead of the call >>> to iscsi_dump_data_payload(). >> Apologies for the delay. >> >> I double check your patch on 3.14.0-rc6+. It is applied and I have >> added debug statement as you have advised to see if we are hitting the >> correct area. I have also attached a .png, as I was getting a soft lock. >> This is on same kernel (3.14.0-rc6+), but without your patch. I will >> also revisit our test plan and get back. >> >> <Dmesg output With your patch on 3.14.0-rc6+> >> root@wfsc target-pending]# dmesg >> [ 8915.579866] Debug#1 >> [ 8915.607790] Debug#1 >> [ 8915.607862] Debug#4 >> [ 8915.607923] Command ITT: 0xa3a3a3a3 received DataOUT for a NON-WRITE >> command. >> [root@wfsc target-pending]# >> >> diff --git a/drivers/target/iscsi/iscsi_target.c >> b/drivers/target/iscsi/iscsi_target.c >> index 7e5469a..f941bac 100644 >> --- a/drivers/target/iscsi/iscsi_target.c >> +++ b/drivers/target/iscsi/iscsi_target.c >> @@ -1252,6 +1252,8 @@ iscsit_check_dataout_hdr(struct iscsi_conn *conn, >> unsigned char *buf, >> struct se_cmd *se_cmd; >> u32 payload_length = ntoh24(hdr->dlength); >> int rc; >> + >> + pr_err("Debug#1\n"); >> >> if (!payload_length) { >> pr_warn("DataOUT payload is ZERO, ignoring.\n"); >> @@ -1262,6 +1264,7 @@ iscsit_check_dataout_hdr(struct iscsi_conn *conn, >> unsigned char *buf, >> atomic_long_add(payload_length, &conn->sess->rx_data_octets); >> >> if (payload_length > conn->conn_ops->MaxXmitDataSegmentLength) { >> + pr_err("Debug#2\n"); >> pr_err("DataSegmentLength: %u is greater than" >> " MaxXmitDataSegmentLength: %u\n", payload_length, >> conn->conn_ops->MaxXmitDataSegmentLength); >> @@ -1280,6 +1283,7 @@ iscsit_check_dataout_hdr(struct iscsi_conn *conn, >> unsigned char *buf, >> payload_length, conn->cid); >> >> if (cmd->cmd_flags & ICF_GOT_LAST_DATAOUT) { >> + pr_err("Debug#3\n"); >> pr_err("Command ITT: 0x%08x received DataOUT after" >> " last DataOUT received, dumping payload\n", >> cmd->init_task_tag); >> @@ -1287,9 +1291,11 @@ iscsit_check_dataout_hdr(struct iscsi_conn *conn, >> unsigned char *buf, >> } >> >> if (cmd->data_direction != DMA_TO_DEVICE) { >> + pr_err("Debug#4\n"); >> pr_err("Command ITT: 0x%08x received DataOUT for a" >> " NON-WRITE command.\n", cmd->init_task_tag); >> - return iscsit_reject_cmd(cmd, >> ISCSI_REASON_PROTOCOL_ERROR, buf); >> + return iscsit_dump_data_payload(conn, payload_length, 1); >> } >> se_cmd = &cmd->se_cmd; >> iscsit_mod_dataout_timer(cmd); >> @@ -1564,7 +1570,10 @@ int iscsit_process_nop_out(struct iscsi_conn >> *conn, struct iscsi_cmd *cmd, >> * Initiator is expecting a NopIN ping reply.. >> */ >> if (hdr->itt != RESERVED_ITT) { >> - BUG_ON(!cmd); >> + if (!cmd) >> + return iscsit_add_reject(conn, >> ISCSI_REASON_PROTOCOL_ERROR, >> + >> (unsigned char *)hdr); >> >> spin_lock_bh(&conn->cmd_lock); >> list_add_tail(&cmd->i_conn_node, &conn->conn_cmd_list); >> [root@wfsc target-pending]# >> >> Thanks, >> Arshad >>> Thanks, >>> >>> --nab >>> >>>> [ 124.571009] Command ITT: 0xa3a3a3a3 received DataOUT for a NON-WRITE >>>> command. >>>> [ 124.623127] ISCSI_FLAG_CMD_WRITE & ISCSI_FLAG_CMD_FINAL not set. Bad >>>> iSCSI Initiator. >>>> [ 124.632252] ISCSI_FLAG_CMD_WRITE & ISCSI_FLAG_CMD_FINAL not set. Bad >>>> iSCSI Initiator. >>>> [ 124.676209] ISCSI_FLAG_CMD_WRITE & ISCSI_FLAG_CMD_FINAL not set. Bad >>>> iSCSI Initiator. >>>> [ 124.736480] ISCSI_FLAG_CMD_WRITE & ISCSI_FLAG_CMD_FINAL not set. Bad >>>> iSCSI Initiator. >>>> [ 124.924880] ISCSI_FLAG_CMD_WRITE & ISCSI_FLAG_CMD_FINAL not set. Bad >>>> iSCSI Initiator. >>>> [ 124.940144] ISCSI_FLAG_CMD_WRITE & ISCSI_FLAG_CMD_FINAL not set. Bad >>>> iSCSI Initiator. >>>> [ 125.040862] ISCSI_FLAG_CMD_WRITE & ISCSI_FLAG_CMD_FINAL not set. Bad >>>> iSCSI Initiator. >>>> [ 125.050377] ISCSI_FLAG_CMD_WRITE & ISCSI_FLAG_CMD_FINAL not set. Bad >>>> iSCSI Initiator. >>>> [ 125.066584] ISCSI_FLAG_CMD_WRITE & ISCSI_FLAG_CMD_FINAL not set. Bad >>>> iSCSI Initiator. >>>> [ 125.078853] ISCSI_FLAG_CMD_WRITE & ISCSI_FLAG_CMD_FINAL not set. Bad >>>> iSCSI Initiator. >>>> [ 125.081819] ISCSI_FLAG_CMD_WRITE & ISCSI_FLAG_CMD_FINAL not set. Bad >>>> iSCSI Initiator. >>>> [ 127.576143] general protection fault: 0000 [#1] SMP >>>> [ 127.580095] Modules linked in: nls_utf8 isofs vboxsf(O) tcm_loop >>>> tcm_fc libfc scsi_transport_fc scsi_tgt iscsi_target_mod >>>> target_core_pscsi target_core_file target_core_iblock target_core_mod >>>> configfs ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr >>>> iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi bnep rfcomm >>>> bluetooth snd_intel8x0 snd_ac97_codec 6lowpan_iphc ac97_bus snd_pcm >>>> snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq snd_timer >>>> snd_seq_device snd joydev vboxguest(O) psmouse mac_hid i2c_piix4 ppdev >>>> serio_raw soundcore parport_pc lp parport hid_generic usbhid hid e1000 >>>> [ 127.580095] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G O >>>> 3.14.0-rc6+ #4 >>>> [ 127.580095] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS >>>> VirtualBox 12/01/2006 >>>> [ 127.580095] task: ffffffff81c104a0 ti: ffffffff81c00000 task.ti: >>>> ffffffff81c00000 >>>> [ 127.580095] RIP: 0010:[<ffffffffa02f7c10>] [<ffffffffa02f7c10>] >>>> iscsit_tpg_get_node_attrib+0x10/0x20 [iscsi_target_mod] >>>> [ 127.580095] RSP: 0018:ffff88003fc03dd8 EFLAGS: 00010246 >>>> [ 127.580095] RAX: 617941580000000d RBX: ffff880037baa640 RCX: >>>> 0000000000000000 >>>> [ 127.580095] RDX: 0000000000000002 RSI: 0000000000000200 RDI: >>>> ffff88001a1d2800 >>>> [ 127.580095] RBP: ffff88003fc03dd8 R08: 0000000000016ab1 R09: >>>> 0000000081e5b240 >>>> [ 127.580095] R10: 0000000000000020 R11: 0000000000000000 R12: >>>> ffff880037b65800 >>>> [ 127.580095] R13: ffff880037baa724 R14: ffff88001a1d2800 R15: >>>> ffffffffa02f17d0 >>>> [ 127.580095] FS: 0000000000000000(0000) GS:ffff88003fc00000(0000) >>>> knlGS:0000000000000000 >>>> [ 127.580095] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b >>>> [ 127.580095] CR2: 00007fe77c00b208 CR3: 000000002fdad000 CR4: >>>> 00000000000006f0 >>>> [ 127.580095] Stack: >>>> [ 127.580095] ffff88003fc03e28 ffffffffa02f183b 0000000000000000 >>>> 0000000000000000 >>>> [ 127.580095] ffff88003fc0d1a0 0000000000000000 0000000000000100 >>>> ffffffffa02f17d0 >>>> [ 127.580095] ffff880037baa758 ffffffff81e5c268 ffff88003fc03e78 >>>> ffffffff8105cc15 >>>> [ 127.580095] Call Trace: >>>> [ 127.580095] <IRQ> >>>> [ 127.580095] [<ffffffffa02f183b>] >>>> iscsit_handle_dataout_timeout+0x6b/0x250 [iscsi_target_mod] >>>> [ 127.580095] [<ffffffffa02f17d0>] ? >>>> iscsit_start_dataout_timer+0xa0/0xa0 [iscsi_target_mod] >>>> [ 127.580095] [<ffffffff8105cc15>] call_timer_fn+0x45/0x160 >>>> [ 127.580095] [<ffffffffa02f17d0>] ? >>>> iscsit_start_dataout_timer+0xa0/0xa0 [iscsi_target_mod] >>>> [ 127.580095] [<ffffffff8105dfa4>] run_timer_softirq+0x244/0x290 >>>> [ 127.580095] [<ffffffff810bc4e4>] ? clockevents_program_event+0x74/0x100 >>>> [ 127.580095] [<ffffffff8105591f>] __do_softirq+0xef/0x2e0 >>>> [ 127.580095] [<ffffffff81055d0e>] irq_exit+0x7e/0xa0 >>>> [ 127.580095] [<ffffffff816c2cda>] smp_apic_timer_interrupt+0x4a/0x60 >>>> [ 127.580095] [<ffffffff816c1a8a>] apic_timer_interrupt+0x6a/0x70 >>>> [ 127.580095] <EOI> >>>> [ 127.580095] [<ffffffff8100b830>] ? default_idle+0x20/0xe0 >>>> [ 127.580095] [<ffffffff8100c02e>] arch_cpu_idle+0x1e/0x30 >>>> [ 127.580095] [<ffffffff810a8770>] cpu_startup_entry+0xa0/0x260 >>>> [ 127.580095] [<ffffffff816a3357>] rest_init+0x77/0x80 >>>> [ 127.580095] [<ffffffff81d05e68>] start_kernel+0x3c9/0x3d6 >>>> [ 127.580095] [<ffffffff81d0589f>] ? repair_env_string+0x5a/0x5a >>>> [ 127.580095] [<ffffffff81d055b2>] x86_64_start_reservations+0x2a/0x2c >>>> [ 127.580095] [<ffffffff81d056a4>] x86_64_start_kernel+0xf0/0xf7 >>>> [ 127.580095] Code: 01 0f b7 73 08 48 c7 c7 e8 df 30 a0 31 c0 41 83 cd >>>> ff e8 7d 58 3b e1 eb 9f 66 90 0f 1f 44 00 00 48 8b 87 f8 01 00 00 55 48 >>>> 89 e5 <48> 8b 40 10 5d 48 2d 20 05 00 00 c3 0f 1f 40 00 0f 1f 44 00 00 >>>> [ 127.580095] RIP [<ffffffffa02f7c10>] >>>> iscsit_tpg_get_node_attrib+0x10/0x20 [iscsi_target_mod] >>>> [ 127.580095] RSP <ffff88003fc03dd8> >>>> [ 127.857183] ---[ end trace 4185931fc757e117 ]--- >>>> [ 127.857838] Kernel panic - not syncing: Fatal exception in interrupt >>>> [ 127.858779] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation >>>> range: 0xffffffff80000000-0xffffffff9fffffff) >>>> >>>> Thanks, >>>> Arshad >>>> -- >>>> To unsubscribe from this list: send the line "unsubscribe target-devel" in >>>> the body of a message to majordomo@xxxxxxxxxxxxxxx >>>> More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe target-devel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html