Re: Kernel OOP - Writes across Multiple Connections within Session

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Nab,
On 6/25/2014 6:21 PM, Arshad Hussain wrote:
> On 6/20/2014 11:17 PM, Nicholas A. Bellinger wrote:
>> On Fri, 2014-06-20 at 15:54 +0530, Arshad Hussain wrote:
>>> Hi Nab,
>>> On 6/20/2014 12:32 PM, Nicholas A. Bellinger wrote:
>>>> On Thu, 2014-06-19 at 23:43 -0700, Nicholas A. Bellinger wrote:
>>>>> On Fri, 2014-06-20 at 11:30 +0530, Santosh Kulkarni wrote:
>> <SNIP>
>>
>>>> One other quick thing to try is the following patch to dump the Data-out
>>>> payload when a ITT received is not associated with a WRITE (following
>>>> the log), instead of generating a Reject for the existing ITT.
>>>>
>>>> I can't quite tell from wireshark output why your test is sending
>>>> Data-out with an ITT that is associated with a READ that has not been
>>>> acknowledged by StatSN, but the following should at least leave the READ
>>>> ITT alone.
>>>>
>>>> --nab
>>>>
>>>> diff --git a/drivers/target/iscsi/iscsi_target.c b/drivers/target/iscsi/iscsi_target.c
>>>> index b87721a..049fe14 100644
>>>> --- a/drivers/target/iscsi/iscsi_target.c
>>>> +++ b/drivers/target/iscsi/iscsi_target.c
>>>> @@ -1308,7 +1308,7 @@ iscsit_check_dataout_hdr(struct iscsi_conn *conn, unsigned char *buf,
>>>>  	if (cmd->data_direction != DMA_TO_DEVICE) {
>>>>  		pr_err("Command ITT: 0x%08x received DataOUT for a"
>>>>  			" NON-WRITE command.\n", cmd->init_task_tag);
>>>> -		return iscsit_reject_cmd(cmd, ISCSI_REASON_PROTOCOL_ERROR, buf);
>>>> +		return iscsit_dump_data_payload(conn, payload_length, 1);
>>>>  	}
>>>>  	se_cmd = &cmd->se_cmd;
>>>>  	iscsit_mod_dataout_timer(cmd); 
>>>>
>>>>
>>> Santosh is traveling and I will help out with any answers you need.
>>>
>>> The ITT is unique within a session, should READ here be a problem.?  And
>>> what we have observed that our the crash is intermittent as we have
>>> mention before.
>>>
>> That is not what is indicated in the logs so far.  The output (appears)
>> to show that an ITT that was previously used by a READ is being reused
>> by a WRITE + associated Data-Out, before the original READ is explicitly
>> acknowledged via StatSN.
>>
>> This results in the lookup for the Data-out ITT to locate the original
>> READ, and correctly fail due to the incorrect data_direction type of the
>> original command.
>>
>>> Tried out your patch on 3.14.0-rc6+ and 3.15.0-rc3+.  With 3.15.0-rc3
>>> the target is not crashing. The dmesg shows the line below.
>>>
>>> [root@wfsc iscsi]# dmesg -c
>>> [  210.623207] Command ITT: 0xa3a3a3a3 received DataOUT for a NON-WRITE
>>> command.
>>>
>> Thanks for confirming.  I think this is the correct fix, but would like
>> to verify some more on my end.  Please send me the test off-list.

This is what we are trying to achieve. Read from socket1 and socket2 and
then write to socket1 and socket2. I have detailed the flow below. The
code is modified to read and write with different ITT. And this work
just fine. It is only when we read & write on two different sockets with
same ITT it fails.

Please notice after READ we are receiving stat_sn and incrementing the
counters. This indicates that the stat_sn is being acknowledge. IMHO,
the reads and writes on multiple sockets with same ITT should work. 
While we can change the test case to make it pass, by reading and
writing on different ITT. This however will shadow a potential bug in
target. As always your valuable comments.

READ10 CMD [socket1,ITT=A3A3A3A3] -> [T]
/* Update the counters. At this point stat_sn is ack. */
next_cmdsn = recvpdu->exp_cmd_sn;
next_expstatsn = (recvpdu->stat_sn) + 0x01;
maxcmdsn = recvpdu->max_cmd_sn;

READ10 CMD [socket2,ITT=A3A3A3A3] -> [T]
/* Update counters. At this point stat_sn is ack.*/
next_cmdsn = recvpdu->exp_cmd_sn;
next_expstatsn = (recvpdu->stat_sn) + 0x01;
maxcmdsn = recvpdu->max_cmd_sn;

WRITE10 CMD [socket1,ITT=A5A5A5A5] -> [T]
R2T <- [T]
DATA OUT [socket1,ITT=A5A5A5A5] -> [T]

WRITE10 CMD [socket2,ITT=A5A5A5A5] -> [T]
R2T <- [T]
DATA OUT [socket1,ITT=A5A5A5A5] -> [T]

Thanks!
Arshad
>>
>>> [root@wfsc iscsi]# uname -a
>>> Linux wfsc 3.15.0-rc3+ #3 SMP Fri May 30 06:11:42 EDT 2014 x86_64 x86_64
>>> x86_64 GNU/Linux
>>>
>>> With 3.14.0-rc6+. It is crashing with the below output.
>>>
>> This output for v3.14-rc6 would indicate that the patch is not applied.
>> Eg:
>>
>>    "ISCSI_FLAG_CMD_WRITE & ISCSI_FLAG_CMD_FINAL not set."
>>
>> means that iscsi_dump_data_payload() is not being called, because the
>> subsequent Data-Out payload (still in the TCP stream) is processed, and
>> fails at the WRITE + FINAL bits sanity checks.
>>
>> Please confirm this patch is applied with a printk() ahead of the call
>> to iscsi_dump_data_payload().
> Apologies for the delay.
>
> I double check your patch on  3.14.0-rc6+. It is applied and I have
> added debug statement as you have advised to see if we are hitting the
> correct area. I have also attached a .png, as I was getting a soft lock.
> This is on same kernel (3.14.0-rc6+), but without your patch.  I will
> also revisit our test plan and get back.
>
> <Dmesg output With your patch on 3.14.0-rc6+>
> root@wfsc target-pending]# dmesg
> [ 8915.579866] Debug#1
> [ 8915.607790] Debug#1
> [ 8915.607862] Debug#4
> [ 8915.607923] Command ITT: 0xa3a3a3a3 received DataOUT for a NON-WRITE
> command.
> [root@wfsc target-pending]#
>
> diff --git a/drivers/target/iscsi/iscsi_target.c
> b/drivers/target/iscsi/iscsi_target.c
> index 7e5469a..f941bac 100644
> --- a/drivers/target/iscsi/iscsi_target.c
> +++ b/drivers/target/iscsi/iscsi_target.c
> @@ -1252,6 +1252,8 @@ iscsit_check_dataout_hdr(struct iscsi_conn *conn,
> unsigned char *buf,
>         struct se_cmd *se_cmd;
>         u32 payload_length = ntoh24(hdr->dlength);
>         int rc;
> +
> +       pr_err("Debug#1\n");
>
>         if (!payload_length) {
>                 pr_warn("DataOUT payload is ZERO, ignoring.\n");
> @@ -1262,6 +1264,7 @@ iscsit_check_dataout_hdr(struct iscsi_conn *conn,
> unsigned char *buf,
>         atomic_long_add(payload_length, &conn->sess->rx_data_octets);
>
>         if (payload_length > conn->conn_ops->MaxXmitDataSegmentLength) {
> +       pr_err("Debug#2\n");
>                 pr_err("DataSegmentLength: %u is greater than"
>                         " MaxXmitDataSegmentLength: %u\n", payload_length,
>                         conn->conn_ops->MaxXmitDataSegmentLength);
> @@ -1280,6 +1283,7 @@ iscsit_check_dataout_hdr(struct iscsi_conn *conn,
> unsigned char *buf,
>                 payload_length, conn->cid);
>
>         if (cmd->cmd_flags & ICF_GOT_LAST_DATAOUT) {
> +       pr_err("Debug#3\n");
>                 pr_err("Command ITT: 0x%08x received DataOUT after"
>                         " last DataOUT received, dumping payload\n",
>                         cmd->init_task_tag);
> @@ -1287,9 +1291,11 @@ iscsit_check_dataout_hdr(struct iscsi_conn *conn,
> unsigned char *buf,
>         }
>
>         if (cmd->data_direction != DMA_TO_DEVICE) {
> +       pr_err("Debug#4\n");
>                 pr_err("Command ITT: 0x%08x received DataOUT for a"
>                         " NON-WRITE command.\n", cmd->init_task_tag);
> -               return iscsit_reject_cmd(cmd,
> ISCSI_REASON_PROTOCOL_ERROR, buf);
> +               return iscsit_dump_data_payload(conn, payload_length, 1);
>         }
>         se_cmd = &cmd->se_cmd;
>         iscsit_mod_dataout_timer(cmd);
> @@ -1564,7 +1570,10 @@ int iscsit_process_nop_out(struct iscsi_conn
> *conn, struct iscsi_cmd *cmd,
>          * Initiator is expecting a NopIN ping reply..
>          */
>         if (hdr->itt != RESERVED_ITT) {
> -               BUG_ON(!cmd);
> +               if (!cmd)
> +                       return iscsit_add_reject(conn,
> ISCSI_REASON_PROTOCOL_ERROR,
> +                                                                      
> (unsigned char *)hdr);
>
>                 spin_lock_bh(&conn->cmd_lock);
>                 list_add_tail(&cmd->i_conn_node, &conn->conn_cmd_list);
> [root@wfsc target-pending]#
>
> Thanks,
> Arshad
>> Thanks,
>>
>> --nab
>>
>>> [  124.571009] Command ITT: 0xa3a3a3a3 received DataOUT for a NON-WRITE
>>> command.
>>> [  124.623127] ISCSI_FLAG_CMD_WRITE & ISCSI_FLAG_CMD_FINAL not set. Bad
>>> iSCSI Initiator.
>>> [  124.632252] ISCSI_FLAG_CMD_WRITE & ISCSI_FLAG_CMD_FINAL not set. Bad
>>> iSCSI Initiator.
>>> [  124.676209] ISCSI_FLAG_CMD_WRITE & ISCSI_FLAG_CMD_FINAL not set. Bad
>>> iSCSI Initiator.
>>> [  124.736480] ISCSI_FLAG_CMD_WRITE & ISCSI_FLAG_CMD_FINAL not set. Bad
>>> iSCSI Initiator.
>>> [  124.924880] ISCSI_FLAG_CMD_WRITE & ISCSI_FLAG_CMD_FINAL not set. Bad
>>> iSCSI Initiator.
>>> [  124.940144] ISCSI_FLAG_CMD_WRITE & ISCSI_FLAG_CMD_FINAL not set. Bad
>>> iSCSI Initiator.
>>> [  125.040862] ISCSI_FLAG_CMD_WRITE & ISCSI_FLAG_CMD_FINAL not set. Bad
>>> iSCSI Initiator.
>>> [  125.050377] ISCSI_FLAG_CMD_WRITE & ISCSI_FLAG_CMD_FINAL not set. Bad
>>> iSCSI Initiator.
>>> [  125.066584] ISCSI_FLAG_CMD_WRITE & ISCSI_FLAG_CMD_FINAL not set. Bad
>>> iSCSI Initiator.
>>> [  125.078853] ISCSI_FLAG_CMD_WRITE & ISCSI_FLAG_CMD_FINAL not set. Bad
>>> iSCSI Initiator.
>>> [  125.081819] ISCSI_FLAG_CMD_WRITE & ISCSI_FLAG_CMD_FINAL not set. Bad
>>> iSCSI Initiator.
>>> [  127.576143] general protection fault: 0000 [#1] SMP
>>> [  127.580095] Modules linked in: nls_utf8 isofs vboxsf(O) tcm_loop
>>> tcm_fc libfc scsi_transport_fc scsi_tgt iscsi_target_mod
>>> target_core_pscsi target_core_file target_core_iblock target_core_mod
>>> configfs ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr
>>> iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi bnep rfcomm
>>> bluetooth snd_intel8x0 snd_ac97_codec 6lowpan_iphc ac97_bus snd_pcm
>>> snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq snd_timer
>>> snd_seq_device snd joydev vboxguest(O) psmouse mac_hid i2c_piix4 ppdev
>>> serio_raw soundcore parport_pc lp parport hid_generic usbhid hid e1000
>>> [  127.580095] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G           O
>>> 3.14.0-rc6+ #4
>>> [  127.580095] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS
>>> VirtualBox 12/01/2006
>>> [  127.580095] task: ffffffff81c104a0 ti: ffffffff81c00000 task.ti:
>>> ffffffff81c00000
>>> [  127.580095] RIP: 0010:[<ffffffffa02f7c10>]  [<ffffffffa02f7c10>]
>>> iscsit_tpg_get_node_attrib+0x10/0x20 [iscsi_target_mod]
>>> [  127.580095] RSP: 0018:ffff88003fc03dd8  EFLAGS: 00010246
>>> [  127.580095] RAX: 617941580000000d RBX: ffff880037baa640 RCX:
>>> 0000000000000000
>>> [  127.580095] RDX: 0000000000000002 RSI: 0000000000000200 RDI:
>>> ffff88001a1d2800
>>> [  127.580095] RBP: ffff88003fc03dd8 R08: 0000000000016ab1 R09:
>>> 0000000081e5b240
>>> [  127.580095] R10: 0000000000000020 R11: 0000000000000000 R12:
>>> ffff880037b65800
>>> [  127.580095] R13: ffff880037baa724 R14: ffff88001a1d2800 R15:
>>> ffffffffa02f17d0
>>> [  127.580095] FS:  0000000000000000(0000) GS:ffff88003fc00000(0000)
>>> knlGS:0000000000000000
>>> [  127.580095] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>>> [  127.580095] CR2: 00007fe77c00b208 CR3: 000000002fdad000 CR4:
>>> 00000000000006f0
>>> [  127.580095] Stack:
>>> [  127.580095]  ffff88003fc03e28 ffffffffa02f183b 0000000000000000
>>> 0000000000000000
>>> [  127.580095]  ffff88003fc0d1a0 0000000000000000 0000000000000100
>>> ffffffffa02f17d0
>>> [  127.580095]  ffff880037baa758 ffffffff81e5c268 ffff88003fc03e78
>>> ffffffff8105cc15
>>> [  127.580095] Call Trace:
>>> [  127.580095]  <IRQ>
>>> [  127.580095]  [<ffffffffa02f183b>]
>>> iscsit_handle_dataout_timeout+0x6b/0x250 [iscsi_target_mod]
>>> [  127.580095]  [<ffffffffa02f17d0>] ?
>>> iscsit_start_dataout_timer+0xa0/0xa0 [iscsi_target_mod]
>>> [  127.580095]  [<ffffffff8105cc15>] call_timer_fn+0x45/0x160
>>> [  127.580095]  [<ffffffffa02f17d0>] ?
>>> iscsit_start_dataout_timer+0xa0/0xa0 [iscsi_target_mod]
>>> [  127.580095]  [<ffffffff8105dfa4>] run_timer_softirq+0x244/0x290
>>> [  127.580095]  [<ffffffff810bc4e4>] ? clockevents_program_event+0x74/0x100
>>> [  127.580095]  [<ffffffff8105591f>] __do_softirq+0xef/0x2e0
>>> [  127.580095]  [<ffffffff81055d0e>] irq_exit+0x7e/0xa0
>>> [  127.580095]  [<ffffffff816c2cda>] smp_apic_timer_interrupt+0x4a/0x60
>>> [  127.580095]  [<ffffffff816c1a8a>] apic_timer_interrupt+0x6a/0x70
>>> [  127.580095]  <EOI>
>>> [  127.580095]  [<ffffffff8100b830>] ? default_idle+0x20/0xe0
>>> [  127.580095]  [<ffffffff8100c02e>] arch_cpu_idle+0x1e/0x30
>>> [  127.580095]  [<ffffffff810a8770>] cpu_startup_entry+0xa0/0x260
>>> [  127.580095]  [<ffffffff816a3357>] rest_init+0x77/0x80
>>> [  127.580095]  [<ffffffff81d05e68>] start_kernel+0x3c9/0x3d6
>>> [  127.580095]  [<ffffffff81d0589f>] ? repair_env_string+0x5a/0x5a
>>> [  127.580095]  [<ffffffff81d055b2>] x86_64_start_reservations+0x2a/0x2c
>>> [  127.580095]  [<ffffffff81d056a4>] x86_64_start_kernel+0xf0/0xf7
>>> [  127.580095] Code: 01 0f b7 73 08 48 c7 c7 e8 df 30 a0 31 c0 41 83 cd
>>> ff e8 7d 58 3b e1 eb 9f 66 90 0f 1f 44 00 00 48 8b 87 f8 01 00 00 55 48
>>> 89 e5 <48> 8b 40 10 5d 48 2d 20 05 00 00 c3 0f 1f 40 00 0f 1f 44 00 00
>>> [  127.580095] RIP  [<ffffffffa02f7c10>]
>>> iscsit_tpg_get_node_attrib+0x10/0x20 [iscsi_target_mod]
>>> [  127.580095]  RSP <ffff88003fc03dd8>
>>> [  127.857183] ---[ end trace 4185931fc757e117 ]---
>>> [  127.857838] Kernel panic - not syncing: Fatal exception in interrupt
>>> [  127.858779] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation
>>> range: 0xffffffff80000000-0xffffffff9fffffff)
>>>
>>> Thanks,
>>> Arshad
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe target-devel" in
>>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>

--
To unsubscribe from this list: send the line "unsubscribe target-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux SCSI]     [Kernel Newbies]     [Linux SCSI Target Infrastructure]     [Share Photos]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Linux IIO]     [Device Mapper]

  Powered by Linux