Re: SNACK-R2T - RunLength > 0 Crashing Target

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 1/29/2014 2:44 PM, Nicholas A. Bellinger wrote:
On Tue, 2014-01-28 at 12:22 -0800, Nicholas A. Bellinger wrote:
On Tue, 2014-01-28 at 13:01 +0530, Arshad Hussain wrote:
Hi Nicholas,

<SNIP>

I tried out you patch. I ran 3 cases with it. The case with runlength>0
and without
htonl() conversion still crashes the machine.

1) Case 1 : begrun=0,runlength=0. It works.
Jan 28 01:35:10 wfs kernel: [  136.416906] Debug #1 begrun: 0 runlength:
0 cmd->r2t_sn: 1
Jan 28 01:35:10 wfs kernel: [  136.416907] Debug #2 last_r2tsn: 1

2) Case 2: begrun=0 and (RunLength >0 ) However, it is converted via
htonl(1). It works.
Jan 28 01:49:02 wfs kernel: [  968.100236] Debug #1 begrun: 0 runlength:
1 cmd->r2t_sn: 1
Jan 28 01:49:02 wfs kernel: [  968.100237] Debug #2 last_r2tsn: 1

3) Case 3: begrun=0 and Runlength=1. No conversion done with htonl. It
dump_stack().
Attached it the screen shot of the dump_stack().  It looks like is
hitting the new case (last_r2tsn>0x0000FFFF)
you have put in.

So without full dmesg output it's kinda hard to tell.  ;)

In any event, I'm pretty sure it's hitting the same original sanity
check in iscsit_handle_r2t_snack() mentioned above for bogus RunLengths,
and it's the generation of the REJECT that is triggering the bug which
is likely a >= v3.10 regression.

I'd really like to get this resolved, and the quickest way to do that
would be to send a copy of your test-case code (offlist).  Otherwise,
I'll have to put together something similar.

So after generating a few large SNACK RunLengths within
iscsit_handle_recovery_datain_or_r2t() code, this OOPs is reproducible
for me in ERL=2 operation.

It's occurring when iscsit_build_conn_drop_async_message() is called
during connection recovery to send a ASYNC PDU w/ error status for the
failed CID on (any available) active connection.  This is hitting a
iscsi_session NULL pointer dereference in iscsit_allocate_cmd(), which
has been added for >= v3.12 to obtain a pre-allocated percpu_ida tag.

The following fix avoids NULL session pointers in
iscsit_build_conn_drop_async_message() and should handle all SNACK
errors generating REJECT PDUs + causing connection recovery to occur.

Btw, I'm still looking at this code, but please go ahead and give this
first patch a shot.

Thanks,

--nab

diff --git a/drivers/target/iscsi/iscsi_target.c b/drivers/target/iscsi/iscsi_target.c
index 1ab9b18..4123893 100644
--- a/drivers/target/iscsi/iscsi_target.c
+++ b/drivers/target/iscsi/iscsi_target.c
@@ -2484,6 +2498,12 @@ static void iscsit_build_conn_drop_async_message(struct iscsi_conn *conn)
         if (!conn_p)
                 return;
+ if (!conn_p->sess) {
+               dump_stack();
+               iscsit_dec_conn_usage_count(conn_p);
+               return;
+       }
+
         cmd = iscsit_allocate_cmd(conn_p, TASK_RUNNING);
         if (!cmd) {
                 iscsit_dec_conn_usage_count(conn_p);




Your latest patch does not crash the machine now. Converting from host to n/w order is the developer's onus, and this is fine. However , I would request you to
leave this message in you final merge as it is informative.

---snip--
[ 489.937479] Command ITT: 0xa3a3a3a3 received R2T SNACK with BegRun: 0x00000000, RunLength: 0x01000000, exceeds current R2TSN: 0x00000001, protocol error.
--snip end ---


Below are the result of the run against your latest patch. (messages and dmesg)

Thanks,
Arshad

1) Output from your earlier patch...
Message from syslogd@wfs at Jan 29 05:05:38 ...
 kernel:[  146.317394] Debug #0.1 begrun: 0 runlength: 16777216

Message from syslogd@wfs at Jan 29 05:05:38 ...
kernel:[ 146.317733] Debug #1 begrun: 0 runlength: 16777216 cmd->r2t_sn: 1

2) <snip> /var/log/messages
Jan 29 05:05:39 wfs iscsi[1020]: Starting iscsi: iscsiadm: Could not login to [iface: default, target: iqn.2003-01.org.linux-iscsi.wfs.x8664:sn.a6fd093081c0, portal: 10.0.2.25,3260]. Jan 29 05:05:39 wfs iscsi[1020]: iscsiadm: initiator reported error (8 - connection timed out)
Jan 29 05:05:39 wfs iscsi[1020]: iscsiadm: Could not log into all portals
Jan 29 05:05:39 wfs systemd[1]: Started LSB: Starts and stops login and scanning of iSCSI devices..
Jan 29 05:05:39 wfs systemd[1]: Starting Multi-User System.
Jan 29 05:05:39 wfs systemd[1]: Reached target Multi-User System.
Jan 29 05:05:39 wfs systemd[1]: Starting Update UTMP about System Runlevel Changes...
Jan 29 05:05:39 wfs iscsi[1020]: [  OK  ]
Jan 29 05:05:39 wfs systemd[1]: Started Update UTMP about System Runlevel Changes. Jan 29 05:05:39 wfs systemd[1]: Startup finished in 739ms (kernel) + 2.236s (initrd) + 2min 24.135s (userspace) = 2min 27.110s. Jan 29 05:05:39 wfs iscsid: Connection-1:0 to [target: iqn.2003-01.org.linux-iscsi.wfs.x8664:sn.a6fd093081c0, portal: 10.0.2.25,3260] through [iface: default] is shutdown. Jan 29 05:05:58 wfs kernel: [ 166.368068] Time2Retain timer expired for SID: 2, cleaning up iSCSI session. Jan 29 05:05:58 wfs kernel: [ 166.369007] BUG: scheduling while atomic: swapper/0/0/0x10000100 Jan 29 05:05:58 wfs kernel: [ 166.369874] Modules linked in: target_core_pscsi target_core_file target_core_iblock iscsi_target_mod(F) target_core_mod nf_conntrack_netbios_ns nf_conntrack_broadcast ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 bnep bluetooth nf_conntrack_ipv4 nf_defrag_ipv4 rfkill xt_conntrack nf_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_mangle iptable_security iptable_raw be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i cxgb3 mdio libcxgbi ib_iser rdma_cm ib_addr iw_cm ib_cm ib_sa ib_mad ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi vboxsf(OF) snd_intel8x0 snd_ac97_codec ppdev ac97_bus joydev vboxvideo(OF) drm pcspkr serio_raw snd_seq snd_seq_device snd_pcm microcode parport_pc snd_page_alloc parport snd_timer ata_generic pata_acpi i2c_piix4 i2c_core e1000 vboxguest(OF) snd soundcore uinput Jan 29 05:05:58 wfs kernel: [ 166.370081] CPU: 0 PID: 0 Comm: swapper/0 Tainted: GF O 3.12.6 #1 Jan 29 05:05:58 wfs kernel: [ 166.370261] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 Jan 29 05:05:58 wfs kernel: [ 166.370316] ffff88003fc13000 ffff88003fc03c80 ffffffff8164dc00 00000000000400f6 Jan 29 05:05:58 wfs kernel: [ 166.370372] ffffffff81c00000 ffff88003fc03c90 ffffffff81649d5e ffff88003fc03d00 Jan 29 05:05:58 wfs kernel: [ 166.370428] ffffffff81652985 ffffffff81c10480 ffffffff81c01fd8 ffffffff81c01fd8
Jan 29 05:05:58 wfs kernel: [  166.370483] Call Trace:
Jan 29 05:05:58 wfs kernel: [ 166.370538] <IRQ> [<ffffffff8164dc00>] dump_stack+0x46/0x58 Jan 29 05:05:58 wfs kernel: [ 166.370602] [<ffffffff81649d5e>] __schedule_bug+0x48/0x56 Jan 29 05:05:58 wfs kernel: [ 166.370660] [<ffffffff81652985>] __schedule+0x785/0x800 Jan 29 05:05:58 wfs kernel: [ 166.370719] [<ffffffff8108449a>] __cond_resched+0x2a/0x40 Jan 29 05:05:58 wfs kernel: [ 166.370776] [<ffffffff81652a7f>] _cond_resched+0x2f/0x40 Jan 29 05:05:58 wfs kernel: [ 166.370833] [<ffffffff81651a66>] mutex_lock+0x16/0x40 Jan 29 05:05:58 wfs kernel: [ 166.370898] [<ffffffffa049660f>] iscsit_free_all_ooo_cmdsns+0x1f/0x80 [iscsi_target_mod] Jan 29 05:05:58 wfs kernel: [ 166.370961] [<ffffffffa0493f90>] ? iscsit_dataout_update_r2t.part.0+0x70/0x70 [iscsi_target_mod] Jan 29 05:05:58 wfs kernel: [ 166.371065] [<ffffffffa04a80c8>] iscsit_close_session+0xb8/0x230 [iscsi_target_mod] Jan 29 05:05:58 wfs kernel: [ 166.371130] [<ffffffffa04ac7e2>] lio_tpg_close_session+0x12/0x20 [iscsi_target_mod] Jan 29 05:05:58 wfs kernel: [ 166.371217] [<ffffffffa0439aae>] target_release_session+0x1e/0x20 [target_core_mod] Jan 29 05:05:58 wfs kernel: [ 166.371308] [<ffffffffa043b790>] target_put_session+0x40/0x50 [target_core_mod] Jan 29 05:05:58 wfs kernel: [ 166.371375] [<ffffffffa0494057>] iscsit_handle_time2retain_timeout+0xc7/0x100 [iscsi_target_mod] Jan 29 05:05:58 wfs kernel: [ 166.371434] [<ffffffff8106097a>] call_timer_fn+0x3a/0x110 Jan 29 05:05:58 wfs kernel: [ 166.371504] [<ffffffffa0493f90>] ? iscsit_dataout_update_r2t.part.0+0x70/0x70 [iscsi_target_mod] Jan 29 05:05:58 wfs kernel: [ 166.371563] [<ffffffff81061e30>] run_timer_softirq+0x1f0/0x2a0 Jan 29 05:05:58 wfs kernel: [ 166.371621] [<ffffffff81059ad0>] __do_softirq+0xe0/0x220 Jan 29 05:05:58 wfs kernel: [ 166.371678] [<ffffffff8165e35c>] call_softirq+0x1c/0x30 Jan 29 05:05:58 wfs kernel: [ 166.371744] [<ffffffff810045f5>] do_softirq+0x55/0x90 Jan 29 05:05:58 wfs kernel: [ 166.371801] [<ffffffff81059d85>] irq_exit+0xa5/0xb0 Jan 29 05:05:58 wfs kernel: [ 166.371858] [<ffffffff8165eaaa>] smp_apic_timer_interrupt+0x4a/0x60 Jan 29 05:05:58 wfs kernel: [ 166.371914] [<ffffffff8165d88a>] apic_timer_interrupt+0x6a/0x70 Jan 29 05:05:58 wfs kernel: [ 166.371968] <EOI> [<ffffffff8100afac>] ? default_idle+0x1c/0xb0 Jan 29 05:05:58 wfs kernel: [ 166.372060] [<ffffffff8100b7be>] arch_cpu_idle+0x1e/0x30 Jan 29 05:05:58 wfs kernel: [ 166.372127] [<ffffffff810a6cd1>] cpu_startup_entry+0xe1/0x260 Jan 29 05:05:58 wfs kernel: [ 166.372191] [<ffffffff81644b47>] rest_init+0x77/0x80 Jan 29 05:05:58 wfs kernel: [ 166.372253] [<ffffffff81d0ae8e>] start_kernel+0x3ee/0x3fb Jan 29 05:05:58 wfs kernel: [ 166.372310] [<ffffffff81d0a89f>] ? repair_env_string+0x5e/0x5e Jan 29 05:05:58 wfs kernel: [ 166.372378] [<ffffffff81d0a5a5>] x86_64_start_reservations+0x2a/0x2c Jan 29 05:05:58 wfs kernel: [ 166.372464] [<ffffffff81d0a69f>] x86_64_start_kernel+0xf8/0xfc
Jan 29 05:05:59 wfs sh[416]: abrt-dump-oops: Found oopses: 1
Jan 29 05:05:59 wfs sh[416]: abrt-dump-oops: Creating problem directories
Jan 29 05:06:00 wfs abrt-server[1342]: Looking for kernel package
Jan 29 05:06:00 wfs abrt-dump-oops: Reported 1 kernel oopses to Abrt
Jan 29 05:06:00 wfs abrt-server[1342]: Can't find kernel package corresponding to '3.12.6' Jan 29 05:06:00 wfs abrt-server[1342]: Can't record package version data (pkg_version, pkg_release, ...). Jan 29 05:06:00 wfs abrt-server[1342]: 'post-create' on '/var/tmp/abrt/oops-2014-01-29-05:05:59-1341-0' exited with 1 Jan 29 05:06:00 wfs abrt-server[1342]: Deleting problem directory '/var/tmp/abrt/oops-2014-01-29-05:05:59-1341-0'
^C
[root@wfs ~]#


3)dmesg

[  489.936888] Debug #1 begrun: 0 runlength: 16777216 cmd->r2t_sn: 1
[ 489.937479] Command ITT: 0xa3a3a3a3 received R2T SNACK with BegRun: 0x00000000, RunLength: 0x01000000, exceeds current R2TSN: 0x00000001, protocol error. [ 489.938421] CPU: 0 PID: 1112 Comm: iscsi_trx Tainted: GF W O 3.12.6 #1 [ 489.938423] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 489.938425] ffff88003c83fef8 ffff88003c7dbd08 ffffffff8164dc00 ffff88003c83fec8 [ 489.938428] ffff88003c7af800 ffff88003c7dbd68 ffffffffa04a888d ffff88003c7dbd48 [ 489.938429] ffffffffa04a2ae7 08ff88003c7dbd58 0000000000000000 ffff88003c85e480
[  489.938431] Call Trace:
[  489.938439]  [<ffffffff8164dc00>] dump_stack+0x46/0x58
[ 489.938452] [<ffffffffa04a888d>] iscsit_close_connection+0x64d/0x6a0 [iscsi_target_mod] [ 489.938457] [<ffffffffa04a2ae7>] ? iscsit_add_reject_from_cmd+0xa7/0x140 [iscsi_target_mod] [ 489.938462] [<ffffffffa04978a8>] iscsit_connection_recovery_transport_reset+0x18/0x20 [iscsi_target_mod] [ 489.938467] [<ffffffffa04950bc>] iscsit_take_action_for_connection_exit+0xbc/0x110 [iscsi_target_mod] [ 489.938472] [<ffffffffa04a59c5>] iscsi_target_rx_thread+0x1f5/0xf60 [iscsi_target_mod]
[  489.938475]  [<ffffffff8108a9cf>] ? __dequeue_entity+0x2f/0x50
[  489.938477]  [<ffffffff8108c180>] ? set_next_entity+0x80/0x90
[  489.938481]  [<ffffffff816525ce>] ? __schedule+0x3ce/0x800
[ 489.938485] [<ffffffffa04a57d0>] ? iscsi_target_tx_thread+0x230/0x230 [iscsi_target_mod]
[  489.938488]  [<ffffffff81076e80>] kthread+0xc0/0xd0
[  489.938490]  [<ffffffff81076dc0>] ? kthread_create_on_node+0x120/0x120
[  489.938492]  [<ffffffff8165cc2c>] ret_from_fork+0x7c/0xb0
[  489.938493]  [<ffffffff81076dc0>] ? kthread_create_on_node+0x120/0x120


--
To unsubscribe from this list: send the line "unsubscribe target-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux SCSI]     [Kernel Newbies]     [Linux SCSI Target Infrastructure]     [Share Photos]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Linux IIO]     [Device Mapper]

  Powered by Linux