Hi Nicholas,

Sorry for the late reply. I have collected the information you asked for.

Kernel version:

root@poc1:~# uname -a
Linux poc1 3.11.0-18-generic #32-Ubuntu SMP Tue Feb 18 21:11:14 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

NIC:

root@poc1:~# lspci | grep 82599
08:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)

Backstores:

Here is the targetcli output from the target machine. It has 6 hard
drives exported to 2 initiators.

/> ls
o- / ..................................................................... [...]
  o- backstores .......................................................... [...]
  | o- fileio ............................................... [0 Storage Object]
  | o- iblock .............................................. [6 Storage Objects]
  | | o- diskb ............................................ [/dev/sdb activated]
  | | o- diskc ............................................ [/dev/sdc activated]
  | | o- diskd ............................................ [/dev/sdd activated]
  | | o- diske ............................................ [/dev/sde activated]
  | | o- diskf ............................................ [/dev/sdf activated]
  | | o- diskg ............................................ [/dev/sdg activated]
  | o- pscsi ................................................ [0 Storage Object]
  | o- rd_dr ................................................ [0 Storage Object]
  | o- rd_mcp ............................................... [0 Storage Object]
  o- ib_srpt ........................................................ [0 Target]
  o- iscsi .......................................................... [0 Target]
  o- loopback ....................................................... [0 Target]
  o- qla2xxx ........................................................ [0 Target]
  o- tcm_fc ......................................................... [1 Target]
    o- 20:00:00:25:90:ef:03:ec ....................................... [enabled]
      o- acls ......................................................... [2 ACLs]
      | o- 20:00:00:25:90:ef:06:1e ............................. [6 Mapped LUNs]
      | | o- mapped_lun0 ........................................... [lun0 (rw)]
      | | o- mapped_lun1 ........................................... [lun1 (rw)]
      | | o- mapped_lun2 ........................................... [lun2 (rw)]
      | | o- mapped_lun3 ........................................... [lun3 (rw)]
      | | o- mapped_lun4 ........................................... [lun4 (rw)]
      | | o- mapped_lun5 ........................................... [lun5 (rw)]
      | o- 20:00:00:25:90:ef:06:2a ............................. [6 Mapped LUNs]
      |   o- mapped_lun0 ........................................... [lun0 (rw)]
      |   o- mapped_lun1 ........................................... [lun1 (rw)]
      |   o- mapped_lun2 ........................................... [lun2 (rw)]
      |   o- mapped_lun3 ........................................... [lun3 (rw)]
      |   o- mapped_lun4 ........................................... [lun4 (rw)]
      |   o- mapped_lun5 ........................................... [lun5 (rw)]
      o- luns ......................................................... [6 LUNs]
        o- lun0 ...................................... [iblock/diskc (/dev/sdc)]
        o- lun1 ...................................... [iblock/diskd (/dev/sdd)]
        o- lun2 ...................................... [iblock/diske (/dev/sde)]
        o- lun3 ...................................... [iblock/diskf (/dev/sdf)]
        o- lun4 ...................................... [iblock/diskg (/dev/sdg)]
        o- lun5 ...................................... [iblock/diskb (/dev/sdb)]

By compiling tcm_fc, we found that the RIP (ft_queue_data_in+1386)
points to tfc_io.c:94:

 91         /*
 92          * Setup to use first mem list entry, unless no data.
 93          */
 94         BUG_ON(remaining && !se_cmd->t_data_sg);
 95         if (remaining) {
 96                 sg = se_cmd->t_data_sg;
 97                 mem_len = sg->length;
 98                 mem_off = sg->offset;
 99                 page = sg_page(sg);
100         }

That is, BUG_ON(remaining && !se_cmd->t_data_sg) fires: there is still
data remaining to send, but se_cmd->t_data_sg is NULL.

root@poc1:~# modinfo tcm_fc
filename:       /lib/modules/3.11.0-18-generic/kernel/drivers/target/tcm_fc/tcm_fc.ko
license:        GPL
description:    FC TCM fabric driver 0.4
srcversion:     68B468A9E0DB43CC9653984
depends:        target_core_mod,libfc
vermagic:       3.11.0-18-generic SMP mod_unload modversions
parm:           debug_logging:a bit mask of logging levels (int)

To reproduce, we run fio on both initiators against all 6 hard drives
on the target at the same time. The target crashes within a few seconds
every time, at the same RIP.

Thanks,
Jun

On Thu, Apr 17, 2014 at 5:03 PM, Nicholas A. Bellinger <nab@xxxxxxxxxxxxxxx> wrote:
> Hi Jun,
>
> On Tue, 2014-04-15 at 09:15 -0700, Jun Wu wrote:
>> Hello,
>>
>> We are working on a cluster file system using fcoe vn2vn. Multiple
>> initiators can see the same set of target hard drives exported by
>> targetcli tcm_fc. When the initiators run IO to these target hard
>> drives at the same time, target system crashes no matter using iblock
>> backstore or pscsi backstore. See the following dump.
>>
>> crash> bt
>> PID: 318    TASK: ffff880c1a05aee0  CPU: 5   COMMAND: "kworker/5:1"
>>  #0 [ffff880c1a895a48] machine_kexec at ffffffff810485e2
>>  #1 [ffff880c1a895a98] crash_kexec at ffffffff810d09d3
>>  #2 [ffff880c1a895b60] oops_end at ffffffff816f0c98
>>  #3 [ffff880c1a895b88] die at ffffffff8101616b
>>  #4 [ffff880c1a895bb8] do_trap at ffffffff816f04b0
>>  #5 [ffff880c1a895c08] do_invalid_op at ffffffff810134a8
>>  #6 [ffff880c1a895cb0] invalid_op at ffffffff816f9c1e
>>     [exception RIP: ft_queue_data_in+1386]
>>     RIP: ffffffffa0641eda  RSP: ffff880c1a895d68  RFLAGS: 00010246
>>     RAX: 0000000000001000  RBX: ffff880c17a6dc10  RCX: 0000000000000002
>>     RDX: 0000000000000000  RSI: ffff880c1afa36d8  RDI: 0000000000000000
>>     RBP: ffff880c1a895df8   R8: ffff880c1667e45c   R9: dfcf2970a166dd90
>>     R10: dfcf2970a166dd90  R11: 0000000000000000  R12: ffff880c17a6dc10
>>     R13: ffff880c3fc33e00  R14: 0000000000001000  R15: 0000000000000140
>>     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>>  #7 [ffff880c1a895d60] ft_queue_data_in at ffffffffa06419c7 [tcm_fc]
>>  #8 [ffff880c1a895e00] target_complete_ok_work at ffffffffa04ded21 [target_core_mod]
>>  #9 [ffff880c1a895e28] process_one_work at ffffffff8107d0ec
>> #10 [ffff880c1a895e70] worker_thread at ffffffff8107dd3c
>> #11 [ffff880c1a895ed0] kthread at ffffffff810848d0
>> #12 [ffff880c1a895f50] ret_from_fork at ffffffff816f836c
>>
>> Is there any way to avoid this problem?
>
> Can you be a bit more specific on the setup..? Eg: kernel version on
> the target, NICs, backstores, etcs.
>
> Also, it might be useful if you can run the RIP (ft_queue_data_in+1386)
> through gdb with your kernel source to see where the bug is actually
> pointing.
>
> (Also, CC'ing some of the Intel FCoE folks)
>
> --nab
>
--
To unsubscribe from this list: send the line "unsubscribe target-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html