Re: System crashes with increased drive count

On Thu, 2014-05-08 at 19:17 -0700, Jun Wu wrote:
> We are running into system crashes as the number of drives under test
> increases. The test configuration is one initiator server running
> fio sessions to remote drives on a target server via FCoE VN2VN. Both
> servers are running Fedora 20 (kernel 3.14.2-200). Running fio sessions
> to up to 7 remote drives works, but the target machine hangs when the
> drive count is increased to 8. The crashes are very repeatable and were
> also reproduced on RHEL 6.5. The following error messages appear on the
> target server:
> 
> 
> [ 1503.737314] BUG: unable to handle kernel NULL pointer dereference
> at 0000000000000048
> [ 1503.737442] IP: [<ffffffffa0610885>] ft_sess_put+0x5/0x30 [tcm_fc]
> [ 1503.737540] PGD 0
> [ 1503.737575] Oops: 0000 [#1] SMP
> [ 1503.737631] Modules linked in: tcm_fc target_core_pscsi
> target_core_file target_core_iblock iscsi_target_mod target_core_mod
> fcoe libfcoe libfc scsi_transport_fc scsi_tgt 8021q garp mrp fuse
> ip6t_rpfilter ip6t_REJECT xt_conntrack ebtable_nat ebtable_broute
> bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6
> nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security
> ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4
> nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle
> iptable_security iptable_raw coretemp iTCO_wdt kvm_intel kvm gpio_ich
> iTCO_vendor_support ses crc32c_intel tpm_tis enclosure i7core_edac
> ioatdma edac_core shpchp serio_raw tpm lpc_ich mfd_core i2c_i801
> microcode acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd sunrpc radeon
> drm_kms_helper ttm
> [ 1503.738933]  igb ixgbe drm ata_generic mdio ptp pata_acpi pps_core
> pata_jmicron i2c_algo_bit aacraid dca i2c_core
> [ 1503.739118] CPU: 5 PID: 6537 Comm: kworker/5:4 Not tainted
> 3.14.2-200.fc20.x86_64 #1
> [ 1503.739225] Hardware name: Supermicro X8DTN/X8DTN, BIOS 2.1c       10/28/2011
> [ 1503.739338] Workqueue: target_completion target_complete_ok_work
> [target_core_mod]
> [ 1503.739449] task: ffff88062071d580 ti: ffff88061a322000 task.ti:
> ffff88061a322000
> [ 1503.739553] RIP: 0010:[<ffffffffa0610885>]  [<ffffffffa0610885>]
> ft_sess_put+0x5/0x30 [tcm_fc]
> [ 1503.739681] RSP: 0018:ffff88061a323ce8  EFLAGS: 00010016
> [ 1503.739755] RAX: 0000000000000000 RBX: ffff880304a23498 RCX: 0000000000009010
> [ 1503.739853] RDX: 0000000000009010 RSI: 00000000000000cb RDI: 0000000000000000
> [ 1503.739953] RBP: ffff88061a323d08 R08: ffff88031c4c6500 R09: 000000018020000f
> [ 1503.740051] R10: ffffffff815cfe87 R11: ffffea000c713180 R12: ffff88031c4c6500
> [ 1503.740150] R13: ffff88031f7c1f80 R14: ffff88031f7c1fe8 R15: 0000000000000000
> [ 1503.740250] FS:  0000000000000000(0000) GS:ffff88063fc20000(0000)
> knlGS:0000000000000000
> [ 1503.740363] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [ 1503.740443] CR2: 0000000000000048 CR3: 0000000001c0c000 CR4: 00000000000007e0
> [ 1503.740541] Stack:
> [ 1503.740572]  ffffffffa060e058 ffff880304a23568 ffff88031f7c1f80
> ffff880304a234a8
> [ 1503.740692]  ffff88061a323d18 ffffffffa060e5e2 ffff88061a323d40
> ffffffffa05cff42
> [ 1503.740812]  ffff880304a234a8 ffff880304a23568 0000000000000246
> ffff88061a323d70
> [ 1503.740931] Call Trace:
> [ 1503.740969]  [<ffffffffa060e058>] ? ft_free_cmd+0x58/0x60 [tcm_fc]
> [ 1503.741057]  [<ffffffffa060e5e2>] ft_release_cmd+0x12/0x20 [tcm_fc]
> [ 1503.741150]  [<ffffffffa05cff42>] target_release_cmd_kref+0x52/0x80
> [target_core_mod]
> [ 1503.741264]  [<ffffffffa05d1bd3>] transport_release_cmd+0xd3/0xf0
> [target_core_mod]
> [ 1503.741377]  [<ffffffffa05d1c28>]
> transport_generic_free_cmd+0x38/0x250 [target_core_mod]
> [ 1503.741491]  [<ffffffffa060e600>] ft_check_stop_free+0x10/0x20 [tcm_fc]
> [ 1503.741590]  [<ffffffffa05cfe32>]
> transport_cmd_check_stop+0xc2/0x140 [target_core_mod]
> [ 1503.741708]  [<ffffffffa05d3a97>]
> target_complete_ok_work+0xe7/0x2d0 [target_core_mod]
> [ 1503.741824]  [<ffffffff810a6886>] process_one_work+0x176/0x430
> [ 1503.741907]  [<ffffffff810a74db>] worker_thread+0x11b/0x3a0
> [ 1503.741985]  [<ffffffff810a73c0>] ? rescuer_thread+0x370/0x370
> [ 1503.742069]  [<ffffffff810ae211>] kthread+0xe1/0x100
> [ 1503.742138]  [<ffffffff810ae130>] ? insert_kthread_work+0x40/0x40
> [ 1503.742227]  [<ffffffff816fef7c>] ret_from_fork+0x7c/0xb0
> [ 1503.746668]  [<ffffffff810ae130>] ? insert_kthread_work+0x40/0x40
> [ 1503.751104] Code: 48 89 f0 48 89 c7 89 d6 48 89 e5 48 8b 49 10 48
> 89 ca e8 4f ed ff ff 5d c3 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 66
> 66 66 66 90 <8b> 47 48 85 c0 74 22 48 8d 47 48 f0 83 6f 48 01 74 09 c3
> 0f 1f
> [ 1503.760558] RIP  [<ffffffffa0610885>] ft_sess_put+0x5/0x30 [tcm_fc]
> [ 1503.765145]  RSP <ffff88061a323ce8>
> [ 1503.769687] CR2: 0000000000000048
> [ 1503.789003] ---[ end trace c7457ccb45bf0bc9 ]---
> 
> 
> 

The v3.14 Oops above looks like a free-after-use regression from the
v3.13 conversion to use percpu-ida for pre-allocation of ft_cmd
descriptors.

Here's the patch that I'm applying to address this specific bug in
tcm_fc.  Please apply it and verify the fix on your end.

From 1d8dc8a29cfa6d66e5068ab6dad3216fe218cc53 Mon Sep 17 00:00:00 2001
From: Nicholas Bellinger <nab@xxxxxxxxxxxxxxx>
Date: Mon, 12 May 2014 12:18:32 -0700
Subject: [PATCH] tcm_fc: Fix free-after-use regression in ft_free_cmd

This patch fixes a free-after-use regression in ft_free_cmd(), where
percpu_ida_free() was incorrectly called to release the tag before
ft_sess_put() is called to drop the session reference.

Fix this bug by moving the percpu_ida_free() call after ft_sess_put().

The regression was originally introduced in v3.13-rc1 commit:

  commit 5f544cfac956971099e906f94568bc3fd1a7108a
  Author: Nicholas Bellinger <nab@xxxxxxxxxxxxx>
  Date:   Mon Sep 23 12:12:42 2013 -0700

      tcm_fc: Convert to per-cpu command map pre-allocation of ft_cmd

Reported-by: Jun Wu <jwu@xxxxxxxxxxxx>
Cc: Mark Rustad <mark.d.rustad@xxxxxxxxx>
Cc: Robert Love <robert.w.love@xxxxxxxxx>
Cc: <stable@xxxxxxxxxxxxxxx> #3.13+
Signed-off-by: Nicholas Bellinger <nab@xxxxxxxxxxxxxxx>
---
 drivers/target/tcm_fc/tfc_cmd.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/target/tcm_fc/tfc_cmd.c b/drivers/target/tcm_fc/tfc_cmd.c
index 01cf37f..28fce39 100644
--- a/drivers/target/tcm_fc/tfc_cmd.c
+++ b/drivers/target/tcm_fc/tfc_cmd.c
@@ -100,8 +100,8 @@ static void ft_free_cmd(struct ft_cmd *cmd)
 	if (fr_seq(fp))
 		lport->tt.seq_release(fr_seq(fp));
 	fc_frame_free(fp);
-	percpu_ida_free(&se_sess->sess_tag_pool, cmd->se_cmd.map_tag);
 	ft_sess_put(cmd->sess);	/* undo get from lookup at recv */
+	percpu_ida_free(&se_sess->sess_tag_pool, cmd->se_cmd.map_tag);
 }
 
 void ft_release_cmd(struct se_cmd *se_cmd)
-- 
1.7.10.4

> Before the target hangs, many messages like the following are printed
> on the initiator:
> 
> fio: io_u error on file /dev/sdl: Input/output error
>      read offset=1030152192, buflen=4096
> 
> [ 3787.971900] sd 8:0:0:0: [sdl] Unhandled error code
> [ 3787.971907] sd 8:0:0:0: [sdl]
> [ 3787.971910] Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
> [ 3787.971913] sd 8:0:0:0: [sdl] CDB:
> [ 3787.971915] Read(10): 28 00 00 1e b3 70 00 00 08 00
> [ 3787.971924] end_request: I/O error, dev sdl, sector 2012016
> 
> 

Not sure what's going on here without more information.  

> Installation steps used:
> yum install lldpad
> yum install fcoe-utils
> modprobe fcoe
> yum install targetcli
> 
> Before these tests, we also installed Red Hat 6.5 and followed the
> instructions on https://www.open-fcoe.org/. On Red Hat, I was only
> able to run fio against 3 target drives. Using 4 target drives crashed
> the target machine.
> 

No idea without more info wrt RHEL 6.5, but it certainly doesn't have
the v3.13+ specific percpu-ida regression from above.

--nab




