On 1/29/2014 2:44 PM, Nicholas A. Bellinger wrote:
On Tue, 2014-01-28 at 12:22 -0800, Nicholas A. Bellinger wrote:
On Tue, 2014-01-28 at 13:01 +0530, Arshad Hussain wrote:
Hi Nicholas,
<SNIP>
I tried out you patch. I ran 3 cases with it. The case with runlength>0
and without
htonl() conversion still crashes the machine.
1) Case 1 : begrun=0,runlength=0. It works.
Jan 28 01:35:10 wfs kernel: [ 136.416906] Debug #1 begrun: 0 runlength:
0 cmd->r2t_sn: 1
Jan 28 01:35:10 wfs kernel: [ 136.416907] Debug #2 last_r2tsn: 1
2) Case 2: begrun=0 and (RunLength >0 ) However, it is converted via
htonl(1). It works.
Jan 28 01:49:02 wfs kernel: [ 968.100236] Debug #1 begrun: 0 runlength:
1 cmd->r2t_sn: 1
Jan 28 01:49:02 wfs kernel: [ 968.100237] Debug #2 last_r2tsn: 1
3) Case 3: begrun=0 and Runlength=1. No conversion done with htonl. It
dump_stack().
Attached it the screen shot of the dump_stack(). It looks like is
hitting the new case (last_r2tsn>0x0000FFFF)
you have put in.
So without full dmesg output it's kinda hard to tell. ;)
In any event, I'm pretty sure it's hitting the same original sanity
check in iscsit_handle_r2t_snack() mentioned above for bogus RunLengths,
and it's the generation of the REJECT that is triggering the bug which
is likely a >= v3.10 regression.
I'd really like to get this resolved, and the quickest way to do that
would be to send a copy of your test-case code (offlist). Otherwise,
I'll have to put together something similar.
So after generating a few large SNACK RunLengths within
iscsit_handle_recovery_datain_or_r2t() code, this OOPs is reproducible
for me in ERL=2 operation.
It's occurring when iscsit_build_conn_drop_async_message() is called
during connection recovery to send a ASYNC PDU w/ error status for the
failed CID on (any available) active connection. This is hitting a
iscsi_session NULL pointer dereference in iscsit_allocate_cmd(), which
has been added for >= v3.12 to obtain a pre-allocated percpu_ida tag.
The following fix avoids NULL session pointers in
iscsit_build_conn_drop_async_message() and should handle all SNACK
errors generating REJECT PDUs + causing connection recovery to occur.
Btw, I'm still looking at this code, but please go ahead and give this
first patch a shot.
Thanks,
--nab
diff --git a/drivers/target/iscsi/iscsi_target.c b/drivers/target/iscsi/iscsi_target.c
index 1ab9b18..4123893 100644
--- a/drivers/target/iscsi/iscsi_target.c
+++ b/drivers/target/iscsi/iscsi_target.c
@@ -2484,6 +2498,12 @@ static void iscsit_build_conn_drop_async_message(struct iscsi_conn *conn)
if (!conn_p)
return;
+ if (!conn_p->sess) {
+ dump_stack();
+ iscsit_dec_conn_usage_count(conn_p);
+ return;
+ }
+
cmd = iscsit_allocate_cmd(conn_p, TASK_RUNNING);
if (!cmd) {
iscsit_dec_conn_usage_count(conn_p);
Your latest patch does not crash the machine now. Converting from host
to n/w
order is the developer's onus, and this is fine. However , I would
request you to
leave this message in you final merge as it is informative.
---snip--
[ 489.937479] Command ITT: 0xa3a3a3a3 received R2T SNACK with BegRun:
0x00000000, RunLength: 0x01000000, exceeds current R2TSN: 0x00000001,
protocol error.
--snip end ---
Below are the result of the run against your latest patch. (messages and
dmesg)
Thanks,
Arshad
1) Output from your earlier patch...
Message from syslogd@wfs at Jan 29 05:05:38 ...
kernel:[ 146.317394] Debug #0.1 begrun: 0 runlength: 16777216
Message from syslogd@wfs at Jan 29 05:05:38 ...
kernel:[ 146.317733] Debug #1 begrun: 0 runlength: 16777216
cmd->r2t_sn: 1
2) <snip> /var/log/messages
Jan 29 05:05:39 wfs iscsi[1020]: Starting iscsi: iscsiadm: Could not
login to [iface: default, target:
iqn.2003-01.org.linux-iscsi.wfs.x8664:sn.a6fd093081c0, portal:
10.0.2.25,3260].
Jan 29 05:05:39 wfs iscsi[1020]: iscsiadm: initiator reported error (8 -
connection timed out)
Jan 29 05:05:39 wfs iscsi[1020]: iscsiadm: Could not log into all portals
Jan 29 05:05:39 wfs systemd[1]: Started LSB: Starts and stops login and
scanning of iSCSI devices..
Jan 29 05:05:39 wfs systemd[1]: Starting Multi-User System.
Jan 29 05:05:39 wfs systemd[1]: Reached target Multi-User System.
Jan 29 05:05:39 wfs systemd[1]: Starting Update UTMP about System
Runlevel Changes...
Jan 29 05:05:39 wfs iscsi[1020]: [ OK ]
Jan 29 05:05:39 wfs systemd[1]: Started Update UTMP about System
Runlevel Changes.
Jan 29 05:05:39 wfs systemd[1]: Startup finished in 739ms (kernel) +
2.236s (initrd) + 2min 24.135s (userspace) = 2min 27.110s.
Jan 29 05:05:39 wfs iscsid: Connection-1:0 to [target:
iqn.2003-01.org.linux-iscsi.wfs.x8664:sn.a6fd093081c0, portal:
10.0.2.25,3260] through [iface: default] is shutdown.
Jan 29 05:05:58 wfs kernel: [ 166.368068] Time2Retain timer expired for
SID: 2, cleaning up iSCSI session.
Jan 29 05:05:58 wfs kernel: [ 166.369007] BUG: scheduling while atomic:
swapper/0/0/0x10000100
Jan 29 05:05:58 wfs kernel: [ 166.369874] Modules linked in:
target_core_pscsi target_core_file target_core_iblock
iscsi_target_mod(F) target_core_mod nf_conntrack_netbios_ns
nf_conntrack_broadcast ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 bnep
bluetooth nf_conntrack_ipv4 nf_defrag_ipv4 rfkill xt_conntrack
nf_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter
ebtables ip6table_mangle ip6table_security ip6table_raw ip6table_filter
ip6_tables iptable_mangle iptable_security iptable_raw be2iscsi
iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i cxgb3 mdio libcxgbi
ib_iser rdma_cm ib_addr iw_cm ib_cm ib_sa ib_mad ib_core iscsi_tcp
libiscsi_tcp libiscsi scsi_transport_iscsi vboxsf(OF) snd_intel8x0
snd_ac97_codec ppdev ac97_bus joydev vboxvideo(OF) drm pcspkr serio_raw
snd_seq snd_seq_device snd_pcm microcode parport_pc snd_page_alloc
parport snd_timer ata_generic pata_acpi i2c_piix4 i2c_core e1000
vboxguest(OF) snd soundcore uinput
Jan 29 05:05:58 wfs kernel: [ 166.370081] CPU: 0 PID: 0 Comm: swapper/0
Tainted: GF O 3.12.6 #1
Jan 29 05:05:58 wfs kernel: [ 166.370261] Hardware name: innotek GmbH
VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
Jan 29 05:05:58 wfs kernel: [ 166.370316] ffff88003fc13000
ffff88003fc03c80 ffffffff8164dc00 00000000000400f6
Jan 29 05:05:58 wfs kernel: [ 166.370372] ffffffff81c00000
ffff88003fc03c90 ffffffff81649d5e ffff88003fc03d00
Jan 29 05:05:58 wfs kernel: [ 166.370428] ffffffff81652985
ffffffff81c10480 ffffffff81c01fd8 ffffffff81c01fd8
Jan 29 05:05:58 wfs kernel: [ 166.370483] Call Trace:
Jan 29 05:05:58 wfs kernel: [ 166.370538] <IRQ> [<ffffffff8164dc00>]
dump_stack+0x46/0x58
Jan 29 05:05:58 wfs kernel: [ 166.370602] [<ffffffff81649d5e>]
__schedule_bug+0x48/0x56
Jan 29 05:05:58 wfs kernel: [ 166.370660] [<ffffffff81652985>]
__schedule+0x785/0x800
Jan 29 05:05:58 wfs kernel: [ 166.370719] [<ffffffff8108449a>]
__cond_resched+0x2a/0x40
Jan 29 05:05:58 wfs kernel: [ 166.370776] [<ffffffff81652a7f>]
_cond_resched+0x2f/0x40
Jan 29 05:05:58 wfs kernel: [ 166.370833] [<ffffffff81651a66>]
mutex_lock+0x16/0x40
Jan 29 05:05:58 wfs kernel: [ 166.370898] [<ffffffffa049660f>]
iscsit_free_all_ooo_cmdsns+0x1f/0x80 [iscsi_target_mod]
Jan 29 05:05:58 wfs kernel: [ 166.370961] [<ffffffffa0493f90>] ?
iscsit_dataout_update_r2t.part.0+0x70/0x70 [iscsi_target_mod]
Jan 29 05:05:58 wfs kernel: [ 166.371065] [<ffffffffa04a80c8>]
iscsit_close_session+0xb8/0x230 [iscsi_target_mod]
Jan 29 05:05:58 wfs kernel: [ 166.371130] [<ffffffffa04ac7e2>]
lio_tpg_close_session+0x12/0x20 [iscsi_target_mod]
Jan 29 05:05:58 wfs kernel: [ 166.371217] [<ffffffffa0439aae>]
target_release_session+0x1e/0x20 [target_core_mod]
Jan 29 05:05:58 wfs kernel: [ 166.371308] [<ffffffffa043b790>]
target_put_session+0x40/0x50 [target_core_mod]
Jan 29 05:05:58 wfs kernel: [ 166.371375] [<ffffffffa0494057>]
iscsit_handle_time2retain_timeout+0xc7/0x100 [iscsi_target_mod]
Jan 29 05:05:58 wfs kernel: [ 166.371434] [<ffffffff8106097a>]
call_timer_fn+0x3a/0x110
Jan 29 05:05:58 wfs kernel: [ 166.371504] [<ffffffffa0493f90>] ?
iscsit_dataout_update_r2t.part.0+0x70/0x70 [iscsi_target_mod]
Jan 29 05:05:58 wfs kernel: [ 166.371563] [<ffffffff81061e30>]
run_timer_softirq+0x1f0/0x2a0
Jan 29 05:05:58 wfs kernel: [ 166.371621] [<ffffffff81059ad0>]
__do_softirq+0xe0/0x220
Jan 29 05:05:58 wfs kernel: [ 166.371678] [<ffffffff8165e35c>]
call_softirq+0x1c/0x30
Jan 29 05:05:58 wfs kernel: [ 166.371744] [<ffffffff810045f5>]
do_softirq+0x55/0x90
Jan 29 05:05:58 wfs kernel: [ 166.371801] [<ffffffff81059d85>]
irq_exit+0xa5/0xb0
Jan 29 05:05:58 wfs kernel: [ 166.371858] [<ffffffff8165eaaa>]
smp_apic_timer_interrupt+0x4a/0x60
Jan 29 05:05:58 wfs kernel: [ 166.371914] [<ffffffff8165d88a>]
apic_timer_interrupt+0x6a/0x70
Jan 29 05:05:58 wfs kernel: [ 166.371968] <EOI> [<ffffffff8100afac>] ?
default_idle+0x1c/0xb0
Jan 29 05:05:58 wfs kernel: [ 166.372060] [<ffffffff8100b7be>]
arch_cpu_idle+0x1e/0x30
Jan 29 05:05:58 wfs kernel: [ 166.372127] [<ffffffff810a6cd1>]
cpu_startup_entry+0xe1/0x260
Jan 29 05:05:58 wfs kernel: [ 166.372191] [<ffffffff81644b47>]
rest_init+0x77/0x80
Jan 29 05:05:58 wfs kernel: [ 166.372253] [<ffffffff81d0ae8e>]
start_kernel+0x3ee/0x3fb
Jan 29 05:05:58 wfs kernel: [ 166.372310] [<ffffffff81d0a89f>] ?
repair_env_string+0x5e/0x5e
Jan 29 05:05:58 wfs kernel: [ 166.372378] [<ffffffff81d0a5a5>]
x86_64_start_reservations+0x2a/0x2c
Jan 29 05:05:58 wfs kernel: [ 166.372464] [<ffffffff81d0a69f>]
x86_64_start_kernel+0xf8/0xfc
Jan 29 05:05:59 wfs sh[416]: abrt-dump-oops: Found oopses: 1
Jan 29 05:05:59 wfs sh[416]: abrt-dump-oops: Creating problem directories
Jan 29 05:06:00 wfs abrt-server[1342]: Looking for kernel package
Jan 29 05:06:00 wfs abrt-dump-oops: Reported 1 kernel oopses to Abrt
Jan 29 05:06:00 wfs abrt-server[1342]: Can't find kernel package
corresponding to '3.12.6'
Jan 29 05:06:00 wfs abrt-server[1342]: Can't record package version data
(pkg_version, pkg_release, ...).
Jan 29 05:06:00 wfs abrt-server[1342]: 'post-create' on
'/var/tmp/abrt/oops-2014-01-29-05:05:59-1341-0' exited with 1
Jan 29 05:06:00 wfs abrt-server[1342]: Deleting problem directory
'/var/tmp/abrt/oops-2014-01-29-05:05:59-1341-0'
^C
[root@wfs ~]#
3)dmesg
[ 489.936888] Debug #1 begrun: 0 runlength: 16777216 cmd->r2t_sn: 1
[ 489.937479] Command ITT: 0xa3a3a3a3 received R2T SNACK with BegRun:
0x00000000, RunLength: 0x01000000, exceeds current R2TSN: 0x00000001,
protocol error.
[ 489.938421] CPU: 0 PID: 1112 Comm: iscsi_trx Tainted: GF W O
3.12.6 #1
[ 489.938423] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS
VirtualBox 12/01/2006
[ 489.938425] ffff88003c83fef8 ffff88003c7dbd08 ffffffff8164dc00
ffff88003c83fec8
[ 489.938428] ffff88003c7af800 ffff88003c7dbd68 ffffffffa04a888d
ffff88003c7dbd48
[ 489.938429] ffffffffa04a2ae7 08ff88003c7dbd58 0000000000000000
ffff88003c85e480
[ 489.938431] Call Trace:
[ 489.938439] [<ffffffff8164dc00>] dump_stack+0x46/0x58
[ 489.938452] [<ffffffffa04a888d>] iscsit_close_connection+0x64d/0x6a0
[iscsi_target_mod]
[ 489.938457] [<ffffffffa04a2ae7>] ?
iscsit_add_reject_from_cmd+0xa7/0x140 [iscsi_target_mod]
[ 489.938462] [<ffffffffa04978a8>]
iscsit_connection_recovery_transport_reset+0x18/0x20 [iscsi_target_mod]
[ 489.938467] [<ffffffffa04950bc>]
iscsit_take_action_for_connection_exit+0xbc/0x110 [iscsi_target_mod]
[ 489.938472] [<ffffffffa04a59c5>] iscsi_target_rx_thread+0x1f5/0xf60
[iscsi_target_mod]
[ 489.938475] [<ffffffff8108a9cf>] ? __dequeue_entity+0x2f/0x50
[ 489.938477] [<ffffffff8108c180>] ? set_next_entity+0x80/0x90
[ 489.938481] [<ffffffff816525ce>] ? __schedule+0x3ce/0x800
[ 489.938485] [<ffffffffa04a57d0>] ?
iscsi_target_tx_thread+0x230/0x230 [iscsi_target_mod]
[ 489.938488] [<ffffffff81076e80>] kthread+0xc0/0xd0
[ 489.938490] [<ffffffff81076dc0>] ? kthread_create_on_node+0x120/0x120
[ 489.938492] [<ffffffff8165cc2c>] ret_from_fork+0x7c/0xb0
[ 489.938493] [<ffffffff81076dc0>] ? kthread_create_on_node+0x120/0x120
--
To unsubscribe from this list: send the line "unsubscribe target-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html