Hi Nicholas,

We upgraded from Ubuntu 13.10 to the latest 14.04, which has a
3.13.0-24-generic kernel. I reproduced the bug with v3.13. v3.14 doesn't
compile on this kernel, but tfc_io.c is the same in v3.14.

The initiator I am using is open-fcoe-3.11.

root@poc2:~# modinfo fcoe
filename:       /lib/modules/3.13.0-24-generic/kernel/drivers/scsi/fcoe/fcoe.ko
license:        GPL v2
description:    FCoE
author:         Open-FCoE.org
srcversion:     6DA44562FC66B71637941E8
depends:        libfcoe,libfc,scsi_transport_fc
intree:         Y
vermagic:       3.13.0-24-generic SMP mod_unload modversions
signer:         Magrathea: Glacier signing key
sig_key:        00:A5:A6:57:59:DE:47:4B:C5:C4:31:20:88:0C:1B:94:A5:39:F4:31
sig_hashalgo:   sha512
parm:           ddp_min:Minimum I/O size in bytes for Direct Data Placement (DDP). (uint)
parm:           debug_logging:a bit mask of logging levels (int)

root@poc2:~# fcoeadm -v
1.0.29

After issuing "echo eth2 > /sys/module/libfcoe/parameters/create_vn2vn",
the initiator can see the target drives. I always run fio with 4KB
sequential reads on all the drives to reproduce the bug.

Here is the crash dump with your debug information:

[ 2883.272203] TARGET_CORE[fc]: Unsupported SCSI Opcode 0x85, sending CHECK_CONDITION.
[ 3473.092860] CDB: 0x28 data_length: 4096 t_data_sg: (null) t_data_nents: 0se_cmd_flags: 0x00000109
[ 3473.092887] ------------[ cut here ]------------
[ 3473.092958] kernel BUG at /home/zb/target3.13/target/tcm_fc/tfc_io.c:100!
[ 3473.093056] invalid opcode: 0000 [#1] SMP
[ 3473.093123] Modules linked in: ib_srpt tcm_qla2xxx qla2xxx tcm_loop(OF) tcm_fc(OF) iscsi_target_mod(OF) target_core_pscsi(OF) target_core_file(OF) target_core_iblock(OF) target_core_mod(OF) configfs 8021q garp stp mrp llc fcoe libfcoe libfc scsi_transport_fc scsi_tgt ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi radeon ttm drm_kms_helper drm gpio_ich intel_powerclamp coretemp kvm_intel kvm ioatdma psmouse serio_raw lpc_ich i7core_edac edac_core shpchp mac_hid lp parport hid_generic ses enclosure pata_acpi igb ixgbe usbhid i2c_algo_bit hid dca pata_jmicron ptp mdio aacraid pps_core
[ 3473.094223] CPU: 9 PID: 183 Comm: kworker/9:1 Tainted: GF O 3.13.0-24-generic #46-Ubuntu
[ 3473.094347] Hardware name: Supermicro X8DTN/X8DTN, BIOS 2.1c 10/28/2011
[ 3473.094462] Workqueue: target_completion target_complete_ok_work [target_core_mod]
[ 3473.094573] task: ffff88061b82dfc0 ti: ffff88061b966000 task.ti: ffff88061b966000
[ 3473.094678] RIP: 0010:[<ffffffffa056905b>] [<ffffffffa056905b>] ft_queue_data_in+0x57b/0x580 [tcm_fc]
[ 3473.094817] RSP: 0018:ffff88061b967d78 EFLAGS: 00010286
[ 3473.094892] RAX: 000000000000005f RBX: ffff88060a5b6790 RCX: 0000000000000000
[ 3473.094990] RDX: ffff880627caffe0 RSI: ffff880627cae3c8 RDI: 0000000000000246
[ 3473.095090] RBP: ffff88061b967df8 R08: 0000000000000092 R09: 00000000000006ef
[ 3473.095189] R10: 0000000000000000 R11: ffff88061b967aa6 R12: ffff88060a5b6790
[ 3473.095288] R13: ffff88060a5b68d8 R14: 0000000000001000 R15: 0000000000000240
[ 3473.095388] FS: 0000000000000000(0000) GS:ffff880627ca0000(0000) knlGS:0000000000000000
[ 3473.095502] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 3473.095581] CR2: 00007f8588b45000 CR3: 0000000001c0e000 CR4: 00000000000007e0
[ 3473.095680] Stack:
[ 3473.095710] ffff88061b82dfc0 0000000000000000 ffff88060a5b68b0 ffff88061bae9bdc
[ 3473.095831] 0000000000000000 0000000000000240 ffff88061b967db8 ffff8806111a76e8
[ 3473.095951] ffff88061b967de0 ffff88060a5b6790 ffff88061b967df8 ffff88060a5b68d8
[ 3473.096070] Call Trace:
[ 3473.096114] [<ffffffffa052112c>] target_complete_ok_work+0x16c/0x2d0 [target_core_mod]
[ 3473.096230] [<ffffffff810838a2>] process_one_work+0x182/0x450
[ 3473.096315] [<ffffffff81084641>] worker_thread+0x121/0x410
[ 3473.096393] [<ffffffff81084520>] ? rescuer_thread+0x3e0/0x3e0
[ 3473.096478] [<ffffffff8108b312>] kthread+0xd2/0xf0
[ 3473.096546] [<ffffffff8108b240>] ? kthread_create_on_node+0x1d0/0x1d0
[ 3473.096641] [<ffffffff8172637c>] ret_from_fork+0x7c/0xb0
[ 3473.096718] [<ffffffff8108b240>] ? kthread_create_on_node+0x1d0/0x1d0
[ 3473.096810] Code: 0f 0b 48 8b 5d c8 31 c9 48 c7 c7 40 a7 56 a0 48 8b 83 e8 00 00 00 44 8b 4b 20 44 8b 83 78 01 00 00 0f b6 30 31 c0 e8 c4 66 1a e1 <0f> 0b 0f 1f 00 66 66 66 66 90 55 48 85 ff 48 89 e5 53 48 89 fb
[ 3473.097355] RIP [<ffffffffa056905b>] ft_queue_data_in+0x57b/0x580 [tcm_fc]
[ 3473.097457] RSP <ffff88061b967d78>

crash>
crash> bt
PID: 183 TASK: ffff88061b82dfc0 CPU: 9 COMMAND: "kworker/9:1"
 #0 [ffff88061b967a58] machine_kexec at ffffffff8104a732
 #1 [ffff88061b967aa8] crash_kexec at ffffffff810e6ab3
 #2 [ffff88061b967b70] oops_end at ffffffff8171ef68
 #3 [ffff88061b967b98] die at ffffffff810171cb
 #4 [ffff88061b967bc8] do_trap at ffffffff8171e660
 #5 [ffff88061b967c18] do_invalid_op at ffffffff81014512
 #6 [ffff88061b967cc0] invalid_op at ffffffff81727c5e
    [exception RIP: ft_queue_data_in+1403]
    RIP: ffffffffa056905b RSP: ffff88061b967d78 RFLAGS: 00010286
    RAX: 000000000000005f RBX: ffff88060a5b6790 RCX: 0000000000000000
    RDX: ffff880627caffe0 RSI: ffff880627cae3c8 RDI: 0000000000000246
    RBP: ffff88061b967df8 R8: 0000000000000092 R9: 00000000000006ef
    R10: 0000000000000000 R11: ffff88061b967aa6 R12: ffff88060a5b6790
    R13: ffff88060a5b68d8 R14: 0000000000001000 R15: 0000000000000240
    ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
 #7 [ffff88061b967d70] ft_queue_data_in at ffffffffa056905b [tcm_fc]
 #8 [ffff88061b967e00] target_complete_ok_work at ffffffffa052112c [target_core_
 #9 [ffff88061b967e28] process_one_work at ffffffff810838a2
#10 [ffff88061b967e70] worker_thread at ffffffff81084641
#11 [ffff88061b967ed0] kthread at ffffffff8108b312
#12 [ffff88061b967f50] ret_from_fork at ffffffff8172637c

Thanks,
Jun

On Fri, Apr 25, 2014 at 3:29 PM, Nicholas A. Bellinger <nab@xxxxxxxxxxxxxxx> wrote:
> On Fri, 2014-04-25 at 10:43 -0700, Jun Wu wrote:
>> Hi Nicholas,
>>
>> Sorry to respond to you late. I have collected the information you want.
>>
>> Kernel version:
>> root@poc1:~# uname -a
>> Linux poc1 3.11.0-18-generic #32-Ubuntu SMP Tue Feb 18 21:11:14 UTC
>> 2014 x86_64 x86_64 x86_64 GNU/Linux
>>
>> NIC:
>> root@poc1:~# lspci | grep 82599
>> 08:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit
>> SFI/SFP+ Network Connection (rev 01)
>>
>
> Thanks for the additional info. Please also provide the specifics of
> the FCoE initiator setup as well.
>
>> Backstores:
>> Here is the targetcli output of the target machine. It has 6 hard
>> drives exported to 2 initiators.
>> /> ls
>> o- / ..................................................................... [...]
>>   o- backstores .......................................................... [...]
>>   | o- fileio ............................................... [0 Storage Object]
>>   | o- iblock .............................................. [6 Storage Objects]
>>   | | o- diskb ............................................ [/dev/sdb activated]
>>   | | o- diskc ............................................ [/dev/sdc activated]
>>   | | o- diskd ............................................ [/dev/sdd activated]
>>   | | o- diske ............................................ [/dev/sde activated]
>>   | | o- diskf ............................................ [/dev/sdf activated]
>>   | | o- diskg ............................................ [/dev/sdg activated]
>>   | o- pscsi ................................................ [0 Storage Object]
>>   | o- rd_dr ................................................ [0 Storage Object]
>>   | o- rd_mcp ............................................... [0 Storage Object]
>>   o- ib_srpt ........................................................ [0 Target]
>>   o- iscsi .......................................................... [0 Target]
>>   o- loopback ....................................................... [0 Target]
>>   o- qla2xxx ........................................................ [0 Target]
>>   o- tcm_fc ......................................................... [1 Target]
>>     o- 20:00:00:25:90:ef:03:ec ....................................... [enabled]
>>       o- acls ......................................................... [2 ACLs]
>>       | o- 20:00:00:25:90:ef:06:1e ............................. [6 Mapped LUNs]
>>       | | o- mapped_lun0 ........................................... [lun0 (rw)]
>>       | | o- mapped_lun1 ........................................... [lun1 (rw)]
>>       | | o- mapped_lun2 ........................................... [lun2 (rw)]
>>       | | o- mapped_lun3 ........................................... [lun3 (rw)]
>>       | | o- mapped_lun4 ........................................... [lun4 (rw)]
>>       | | o- mapped_lun5 ........................................... [lun5 (rw)]
>>       | o- 20:00:00:25:90:ef:06:2a ............................. [6 Mapped LUNs]
>>       |   o- mapped_lun0 ........................................... [lun0 (rw)]
>>       |   o- mapped_lun1 ........................................... [lun1 (rw)]
>>       |   o- mapped_lun2 ........................................... [lun2 (rw)]
>>       |   o- mapped_lun3 ........................................... [lun3 (rw)]
>>       |   o- mapped_lun4 ........................................... [lun4 (rw)]
>>       |   o- mapped_lun5 ........................................... [lun5 (rw)]
>>       o- luns ......................................................... [6 LUNs]
>>         o- lun0 ...................................... [iblock/diskc (/dev/sdc)]
>>         o- lun1 ...................................... [iblock/diskd (/dev/sdd)]
>>         o- lun2 ...................................... [iblock/diske (/dev/sde)]
>>         o- lun3 ...................................... [iblock/diskf (/dev/sdf)]
>>         o- lun4 ...................................... [iblock/diskg (/dev/sdg)]
>>         o- lun5 ...................................... [iblock/diskb (/dev/sdb)]
>>
>> By compiling tcm_fc, we found the RIP (ft_queue_data_in+1386) points
>> to tfc_io.c:94.
>>  91	/*
>>  92	 * Setup to use first mem list entry, unless no data.
>>  93	 */
>>  94	BUG_ON(remaining && !se_cmd->t_data_sg);
>>  95	if (remaining) {
>>  96		sg = se_cmd->t_data_sg;
>>  97		mem_len = sg->length;
>>  98		mem_off = sg->offset;
>>  99		page = sg_page(sg);
>> 100	}
>>
>> That is BUG_ON(remaining && !se_cmd->t_data_sg).
>>
>
> So let's find out a little more about the CDB that is triggering the
> bug.
>
> Please apply the following patch to your v3.11 tree to dump the se_cmd
> in question when the bug is triggered in ft_queue_data_in():
>
> diff --git a/drivers/target/tcm_fc/tfc_io.c b/drivers/target/tcm_fc/tfc_io.c
> index e415af3..8009407 100644
> --- a/drivers/target/tcm_fc/tfc_io.c
> +++ b/drivers/target/tcm_fc/tfc_io.c
> @@ -91,7 +91,13 @@ int ft_queue_data_in(struct se_cmd *se_cmd)
>  	/*
>  	 * Setup to use first mem list entry, unless no data.
>  	 */
> -	BUG_ON(remaining && !se_cmd->t_data_sg);
> +	if (remaining && !se_cmd->t_data_sg) {
> +		printk("CDB: 0x%02x data_length: %u t_data_sg: %p t_data_nents: %u"
> +			"se_cmd_flags: 0x%08x\n", se_cmd->t_task_cdb[0],
> +			se_cmd->data_length, se_cmd->t_data_sg,
> +			se_cmd->t_data_nents, se_cmd->se_cmd_flags);
> +		BUG();
> +	}
>  	if (remaining) {
>  		sg = se_cmd->t_data_sg;
>  		mem_len = sg->length;
>
>
>> root@poc1:~# modinfo tcm_fc
>> filename:
>> /lib/modules/3.11.0-18-generic/kernel/drivers/target/tcm_fc/tcm_fc.ko
>> license:        GPL
>> description:    FC TCM fabric driver 0.4
>> srcversion:     68B468A9E0DB43CC9653984
>> depends:        target_core_mod,libfc
>> vermagic:       3.11.0-18-generic SMP mod_unload modversions
>> parm:           debug_logging:a bit mask of logging levels (int)
>>
>> On the 2 initiators, run fio to all the 6 hard drives on the target at
>> the same time. The target crashes within a few seconds every time at
>> the same RIP.
>>
>
> So I don't see any tcm_fc specific changes in v3.11 code that would be
> causing such a bug, nor any v3.11.y bugfixes in this area that would
> apply.
>
> Also since the bug is easy to reproduce with multiple initiators, it
> might be worthwhile to try to reproduce with v3.14.y as well.
>
> Thanks,
>
> --nab
>