Re: Kernel OOP - Writes across Multiple Connections within Session

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 6/26/2014 2:06 PM, Arshad Hussain wrote:
> Hi Nab,
> On 6/25/2014 6:21 PM, Arshad Hussain wrote:
>> On 6/20/2014 11:17 PM, Nicholas A. Bellinger wrote:
>>> On Fri, 2014-06-20 at 15:54 +0530, Arshad Hussain wrote:
>>>> Hi Nab,
>>>> On 6/20/2014 12:32 PM, Nicholas A. Bellinger wrote:
>>>>> On Thu, 2014-06-19 at 23:43 -0700, Nicholas A. Bellinger wrote:
>>>>>> On Fri, 2014-06-20 at 11:30 +0530, Santosh Kulkarni wrote:
>>> <SNIP>
>>>
>>>>> One other quick thing to try is the following patch to dump the Data-out
>>>>> payload when a ITT received is not associated with a WRITE (following
>>>>> the log), instead of generating a Reject for the existing ITT.
>>>>>
>>>>> I can't quite tell from wireshark output why your test is sending
>>>>> Data-out with an ITT that is associated with a READ that has not been
>>>>> acknowledged by StatSN, but the following should at least leave the READ
>>>>> ITT alone.
>>>>>
>>>>> --nab
>>>>>
>>>>> diff --git a/drivers/target/iscsi/iscsi_target.c b/drivers/target/iscsi/iscsi_target.c
>>>>> index b87721a..049fe14 100644
>>>>> --- a/drivers/target/iscsi/iscsi_target.c
>>>>> +++ b/drivers/target/iscsi/iscsi_target.c
>>>>> @@ -1308,7 +1308,7 @@ iscsit_check_dataout_hdr(struct iscsi_conn *conn, unsigned char *buf,
>>>>>  	if (cmd->data_direction != DMA_TO_DEVICE) {
>>>>>  		pr_err("Command ITT: 0x%08x received DataOUT for a"
>>>>>  			" NON-WRITE command.\n", cmd->init_task_tag);
>>>>> -		return iscsit_reject_cmd(cmd, ISCSI_REASON_PROTOCOL_ERROR, buf);
>>>>> +		return iscsit_dump_data_payload(conn, payload_length, 1);
>>>>>  	}
>>>>>  	se_cmd = &cmd->se_cmd;
>>>>>  	iscsit_mod_dataout_timer(cmd); 
>>>>>
>>>>>
>>>> Santosh is traveling and I will help out with any answers you need.
>>>>
>>>> The ITT is unique within a session, should READ here be a problem.?  And
>>>> what we have observed that our the crash is intermittent as we have
>>>> mention before.
>>>>
>>> That is not what is indicated in the logs so far.  The output (appears)
>>> to show that an ITT that was previously used by a READ is being reused
>>> by a WRITE + associated Data-Out, before the original READ is explicitly
>>> acknowledged via StatSN.
>>>
>>> This results in the lookup for the Data-out ITT to locate the original
>>> READ, and correctly fail due to the incorrect data_direction type of the
>>> original command.
>>>
>>>> Tried out your patch on 3.14.0-rc6+ and 3.15.0-rc3+.  With 3.15.0-rc3
>>>> the target is not crashing. The dmesg shows the line below.
>>>>
>>>> [root@wfsc iscsi]# dmesg -c
>>>> [  210.623207] Command ITT: 0xa3a3a3a3 received DataOUT for a NON-WRITE
>>>> command.
>>>>
>>> Thanks for confirming.  I think this is the correct fix, but would like
>>> to verify some more on my end.  Please send me the test off-list.
> This is what we are trying to achieve. Read from socket1 and socket2 and
> then write to socket1 and socket2. I have detailed the flow below. The
> code is modified to read and write with different ITT. And this work
> just fine. It is only when we read & write on two different sockets with
> same ITT it fails.
>
> Please notice after READ we are receiving stat_sn and incrementing the
> counters. This indicates that the stat_sn is being acknowledge. IMHO,
> the reads and writes on multiple sockets with same ITT should work. 
> While we can change the test case to make it pass, by reading and
> writing on different ITT. This however will shadow a potential bug in
> target. As always your valuable comments.
>
> READ10 CMD [socket1,ITT=A3A3A3A3] -> [T]
> /* Update the counters. At this point stat_sn is ack. */
> next_cmdsn = recvpdu->exp_cmd_sn;
> next_expstatsn = (recvpdu->stat_sn) + 0x01;
> maxcmdsn = recvpdu->max_cmd_sn;
>
> READ10 CMD [socket2,ITT=A3A3A3A3] -> [T]
> /* Update counters. At this point stat_sn is ack.*/
> next_cmdsn = recvpdu->exp_cmd_sn;
> next_expstatsn = (recvpdu->stat_sn) + 0x01;
> maxcmdsn = recvpdu->max_cmd_sn;
>
> WRITE10 CMD [socket1,ITT=A5A5A5A5] -> [T]
> R2T <- [T]
> DATA OUT [socket1,ITT=A5A5A5A5] -> [T]
>
> WRITE10 CMD [socket2,ITT=A5A5A5A5] -> [T]
> R2T <- [T]
> DATA OUT [socket1,ITT=A5A5A5A5] -> [T]
>
> Thanks!
> Arshad

Just tried another set of cases with multiple connections.

// First Write to socket1
WRITE10 CMD [socket1,ITT=A5A5A5A5] -> [T]
R2T <- [T]
DATA OUT [socket1,ITT=A5A5A5A5] -> [T]

// Second write to socked2, with different ITT. This works.
WRITE10 CMD [socket2,ITT=A3A3A3A3] -> [T]
R2T <- [T]
DATA OUT [socket1,ITT=A3A3A3A3] -> [T]

// Third write again to socket1. ITT reuse. Please Note the writes are
ack at this point.
WRITE10 CMD [socket1,ITT=A5A5A5A5] -> [T]
R2T <- [T]

Observation: First and second write works. The third write fails (No
crash). Printing below the dmesg.

[16066.334410] Command ITT: 0xa5a5a5a5 received DataOUT after last
DataOUT received, dumping payload.

Thanks
Arshad

PS: Can you help me where to put BUG()/panic() as I am not being able to
provide you with bt. I cannot also get vmcore on crash (previous 3.14
without patch) since it is locking I am unable to enter into
dump-capture kernel. Overall, let me know what all more I can get you to
help you debug.


>>>> [root@wfsc iscsi]# uname -a
>>>> Linux wfsc 3.15.0-rc3+ #3 SMP Fri May 30 06:11:42 EDT 2014 x86_64 x86_64
>>>> x86_64 GNU/Linux
>>>>
>>>> With 3.14.0-rc6+. It is crashing with the below output.
>>>>
>>> This output for v3.14-rc6 would indicate that the patch is not applied.
>>> Eg:
>>>
>>>    "ISCSI_FLAG_CMD_WRITE & ISCSI_FLAG_CMD_FINAL not set."
>>>
>>> means that iscsi_dump_data_payload() is not being called, because the
>>> subsequent Data-Out payload (still in the TCP stream) is processed, and
>>> fails at the WRITE + FINAL bits sanity checks.
>>>
>>> Please confirm this patch is applied with a printk() ahead of the call
>>> to iscsi_dump_data_payload().
>> Apologies for the delay.
>>
>> I double check your patch on  3.14.0-rc6+. It is applied and I have
>> added debug statement as you have advised to see if we are hitting the
>> correct area. I have also attached a .png, as I was getting a soft lock.
>> This is on same kernel (3.14.0-rc6+), but without your patch.  I will
>> also revisit our test plan and get back.
>>
>> <Dmesg output With your patch on 3.14.0-rc6+>
>> root@wfsc target-pending]# dmesg
>> [ 8915.579866] Debug#1
>> [ 8915.607790] Debug#1
>> [ 8915.607862] Debug#4
>> [ 8915.607923] Command ITT: 0xa3a3a3a3 received DataOUT for a NON-WRITE
>> command.
>> [root@wfsc target-pending]#
>>
>> diff --git a/drivers/target/iscsi/iscsi_target.c
>> b/drivers/target/iscsi/iscsi_target.c
>> index 7e5469a..f941bac 100644
>> --- a/drivers/target/iscsi/iscsi_target.c
>> +++ b/drivers/target/iscsi/iscsi_target.c
>> @@ -1252,6 +1252,8 @@ iscsit_check_dataout_hdr(struct iscsi_conn *conn,
>> unsigned char *buf,
>>         struct se_cmd *se_cmd;
>>         u32 payload_length = ntoh24(hdr->dlength);
>>         int rc;
>> +
>> +       pr_err("Debug#1\n");
>>
>>         if (!payload_length) {
>>                 pr_warn("DataOUT payload is ZERO, ignoring.\n");
>> @@ -1262,6 +1264,7 @@ iscsit_check_dataout_hdr(struct iscsi_conn *conn,
>> unsigned char *buf,
>>         atomic_long_add(payload_length, &conn->sess->rx_data_octets);
>>
>>         if (payload_length > conn->conn_ops->MaxXmitDataSegmentLength) {
>> +       pr_err("Debug#2\n");
>>                 pr_err("DataSegmentLength: %u is greater than"
>>                         " MaxXmitDataSegmentLength: %u\n", payload_length,
>>                         conn->conn_ops->MaxXmitDataSegmentLength);
>> @@ -1280,6 +1283,7 @@ iscsit_check_dataout_hdr(struct iscsi_conn *conn,
>> unsigned char *buf,
>>                 payload_length, conn->cid);
>>
>>         if (cmd->cmd_flags & ICF_GOT_LAST_DATAOUT) {
>> +       pr_err("Debug#3\n");
>>                 pr_err("Command ITT: 0x%08x received DataOUT after"
>>                         " last DataOUT received, dumping payload\n",
>>                         cmd->init_task_tag);
>> @@ -1287,9 +1291,11 @@ iscsit_check_dataout_hdr(struct iscsi_conn *conn,
>> unsigned char *buf,
>>         }
>>
>>         if (cmd->data_direction != DMA_TO_DEVICE) {
>> +       pr_err("Debug#4\n");
>>                 pr_err("Command ITT: 0x%08x received DataOUT for a"
>>                         " NON-WRITE command.\n", cmd->init_task_tag);
>> -               return iscsit_reject_cmd(cmd,
>> ISCSI_REASON_PROTOCOL_ERROR, buf);
>> +               return iscsit_dump_data_payload(conn, payload_length, 1);
>>         }
>>         se_cmd = &cmd->se_cmd;
>>         iscsit_mod_dataout_timer(cmd);
>> @@ -1564,7 +1570,10 @@ int iscsit_process_nop_out(struct iscsi_conn
>> *conn, struct iscsi_cmd *cmd,
>>          * Initiator is expecting a NopIN ping reply..
>>          */
>>         if (hdr->itt != RESERVED_ITT) {
>> -               BUG_ON(!cmd);
>> +               if (!cmd)
>> +                       return iscsit_add_reject(conn,
>> ISCSI_REASON_PROTOCOL_ERROR,
>> +                                                                      
>> (unsigned char *)hdr);
>>
>>                 spin_lock_bh(&conn->cmd_lock);
>>                 list_add_tail(&cmd->i_conn_node, &conn->conn_cmd_list);
>> [root@wfsc target-pending]#
>>
>> Thanks,
>> Arshad
>>> Thanks,
>>>
>>> --nab
>>>
>>>> [  124.571009] Command ITT: 0xa3a3a3a3 received DataOUT for a NON-WRITE
>>>> command.
>>>> [  124.623127] ISCSI_FLAG_CMD_WRITE & ISCSI_FLAG_CMD_FINAL not set. Bad
>>>> iSCSI Initiator.
>>>> [  124.632252] ISCSI_FLAG_CMD_WRITE & ISCSI_FLAG_CMD_FINAL not set. Bad
>>>> iSCSI Initiator.
>>>> [  124.676209] ISCSI_FLAG_CMD_WRITE & ISCSI_FLAG_CMD_FINAL not set. Bad
>>>> iSCSI Initiator.
>>>> [  124.736480] ISCSI_FLAG_CMD_WRITE & ISCSI_FLAG_CMD_FINAL not set. Bad
>>>> iSCSI Initiator.
>>>> [  124.924880] ISCSI_FLAG_CMD_WRITE & ISCSI_FLAG_CMD_FINAL not set. Bad
>>>> iSCSI Initiator.
>>>> [  124.940144] ISCSI_FLAG_CMD_WRITE & ISCSI_FLAG_CMD_FINAL not set. Bad
>>>> iSCSI Initiator.
>>>> [  125.040862] ISCSI_FLAG_CMD_WRITE & ISCSI_FLAG_CMD_FINAL not set. Bad
>>>> iSCSI Initiator.
>>>> [  125.050377] ISCSI_FLAG_CMD_WRITE & ISCSI_FLAG_CMD_FINAL not set. Bad
>>>> iSCSI Initiator.
>>>> [  125.066584] ISCSI_FLAG_CMD_WRITE & ISCSI_FLAG_CMD_FINAL not set. Bad
>>>> iSCSI Initiator.
>>>> [  125.078853] ISCSI_FLAG_CMD_WRITE & ISCSI_FLAG_CMD_FINAL not set. Bad
>>>> iSCSI Initiator.
>>>> [  125.081819] ISCSI_FLAG_CMD_WRITE & ISCSI_FLAG_CMD_FINAL not set. Bad
>>>> iSCSI Initiator.
>>>> [  127.576143] general protection fault: 0000 [#1] SMP
>>>> [  127.580095] Modules linked in: nls_utf8 isofs vboxsf(O) tcm_loop
>>>> tcm_fc libfc scsi_transport_fc scsi_tgt iscsi_target_mod
>>>> target_core_pscsi target_core_file target_core_iblock target_core_mod
>>>> configfs ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr
>>>> iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi bnep rfcomm
>>>> bluetooth snd_intel8x0 snd_ac97_codec 6lowpan_iphc ac97_bus snd_pcm
>>>> snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq snd_timer
>>>> snd_seq_device snd joydev vboxguest(O) psmouse mac_hid i2c_piix4 ppdev
>>>> serio_raw soundcore parport_pc lp parport hid_generic usbhid hid e1000
>>>> [  127.580095] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G           O
>>>> 3.14.0-rc6+ #4
>>>> [  127.580095] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS
>>>> VirtualBox 12/01/2006
>>>> [  127.580095] task: ffffffff81c104a0 ti: ffffffff81c00000 task.ti:
>>>> ffffffff81c00000
>>>> [  127.580095] RIP: 0010:[<ffffffffa02f7c10>]  [<ffffffffa02f7c10>]
>>>> iscsit_tpg_get_node_attrib+0x10/0x20 [iscsi_target_mod]
>>>> [  127.580095] RSP: 0018:ffff88003fc03dd8  EFLAGS: 00010246
>>>> [  127.580095] RAX: 617941580000000d RBX: ffff880037baa640 RCX:
>>>> 0000000000000000
>>>> [  127.580095] RDX: 0000000000000002 RSI: 0000000000000200 RDI:
>>>> ffff88001a1d2800
>>>> [  127.580095] RBP: ffff88003fc03dd8 R08: 0000000000016ab1 R09:
>>>> 0000000081e5b240
>>>> [  127.580095] R10: 0000000000000020 R11: 0000000000000000 R12:
>>>> ffff880037b65800
>>>> [  127.580095] R13: ffff880037baa724 R14: ffff88001a1d2800 R15:
>>>> ffffffffa02f17d0
>>>> [  127.580095] FS:  0000000000000000(0000) GS:ffff88003fc00000(0000)
>>>> knlGS:0000000000000000
>>>> [  127.580095] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>>>> [  127.580095] CR2: 00007fe77c00b208 CR3: 000000002fdad000 CR4:
>>>> 00000000000006f0
>>>> [  127.580095] Stack:
>>>> [  127.580095]  ffff88003fc03e28 ffffffffa02f183b 0000000000000000
>>>> 0000000000000000
>>>> [  127.580095]  ffff88003fc0d1a0 0000000000000000 0000000000000100
>>>> ffffffffa02f17d0
>>>> [  127.580095]  ffff880037baa758 ffffffff81e5c268 ffff88003fc03e78
>>>> ffffffff8105cc15
>>>> [  127.580095] Call Trace:
>>>> [  127.580095]  <IRQ>
>>>> [  127.580095]  [<ffffffffa02f183b>]
>>>> iscsit_handle_dataout_timeout+0x6b/0x250 [iscsi_target_mod]
>>>> [  127.580095]  [<ffffffffa02f17d0>] ?
>>>> iscsit_start_dataout_timer+0xa0/0xa0 [iscsi_target_mod]
>>>> [  127.580095]  [<ffffffff8105cc15>] call_timer_fn+0x45/0x160
>>>> [  127.580095]  [<ffffffffa02f17d0>] ?
>>>> iscsit_start_dataout_timer+0xa0/0xa0 [iscsi_target_mod]
>>>> [  127.580095]  [<ffffffff8105dfa4>] run_timer_softirq+0x244/0x290
>>>> [  127.580095]  [<ffffffff810bc4e4>] ? clockevents_program_event+0x74/0x100
>>>> [  127.580095]  [<ffffffff8105591f>] __do_softirq+0xef/0x2e0
>>>> [  127.580095]  [<ffffffff81055d0e>] irq_exit+0x7e/0xa0
>>>> [  127.580095]  [<ffffffff816c2cda>] smp_apic_timer_interrupt+0x4a/0x60
>>>> [  127.580095]  [<ffffffff816c1a8a>] apic_timer_interrupt+0x6a/0x70
>>>> [  127.580095]  <EOI>
>>>> [  127.580095]  [<ffffffff8100b830>] ? default_idle+0x20/0xe0
>>>> [  127.580095]  [<ffffffff8100c02e>] arch_cpu_idle+0x1e/0x30
>>>> [  127.580095]  [<ffffffff810a8770>] cpu_startup_entry+0xa0/0x260
>>>> [  127.580095]  [<ffffffff816a3357>] rest_init+0x77/0x80
>>>> [  127.580095]  [<ffffffff81d05e68>] start_kernel+0x3c9/0x3d6
>>>> [  127.580095]  [<ffffffff81d0589f>] ? repair_env_string+0x5a/0x5a
>>>> [  127.580095]  [<ffffffff81d055b2>] x86_64_start_reservations+0x2a/0x2c
>>>> [  127.580095]  [<ffffffff81d056a4>] x86_64_start_kernel+0xf0/0xf7
>>>> [  127.580095] Code: 01 0f b7 73 08 48 c7 c7 e8 df 30 a0 31 c0 41 83 cd
>>>> ff e8 7d 58 3b e1 eb 9f 66 90 0f 1f 44 00 00 48 8b 87 f8 01 00 00 55 48
>>>> 89 e5 <48> 8b 40 10 5d 48 2d 20 05 00 00 c3 0f 1f 40 00 0f 1f 44 00 00
>>>> [  127.580095] RIP  [<ffffffffa02f7c10>]
>>>> iscsit_tpg_get_node_attrib+0x10/0x20 [iscsi_target_mod]
>>>> [  127.580095]  RSP <ffff88003fc03dd8>
>>>> [  127.857183] ---[ end trace 4185931fc757e117 ]---
>>>> [  127.857838] Kernel panic - not syncing: Fatal exception in interrupt
>>>> [  127.858779] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation
>>>> range: 0xffffffff80000000-0xffffffff9fffffff)
>>>>
>>>> Thanks,
>>>> Arshad
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe target-devel" in
>>>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe target-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux SCSI]     [Kernel Newbies]     [Linux SCSI Target Infrastructure]     [Share Photos]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Linux IIO]     [Device Mapper]

  Powered by Linux