On the initiator side, I run fio and see the following messages in dmesg:

[ 3294.893951] BUG: soft lockup - CPU#2 stuck for 22s! [systemd-udevd:4665]
[ 3294.895491] Modules linked in: target_core_pscsi target_core_file target_core_iblock ipmi_devintf ipmi_si ipmi_msghandler ib_srpt tcm_qla2xxx qla2xxx tcm_loop tcm_fc iscsi_target_mod target_core_mod configfs 8021q garp stp mrp llc fcoe libfcoe libfc scsi_transport_fc scsi_tgt ib_iser rdma_cm ib_addr iw_cm ib_cm ib_sa ib_mad ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi radeon ttm drm_kms_helper drm intel_powerclamp coretemp kvm_intel kvm gpio_ich microcode psmouse serio_raw lpc_ich ioatdma i7core_edac edac_core shpchp mac_hid lp parport ext2 ses enclosure pata_acpi hid_generic igb ixgbe usbhid i2c_algo_bit dca hid pata_jmicron ptp aacraid mdio pps_core
[ 3294.895533] CPU: 2 PID: 4665 Comm: systemd-udevd Tainted: GF W 3.11.0-18-generic #32-Ubuntu
[ 3294.895534] Hardware name: Supermicro X8DTN/X8DTN, BIOS 2.1c 10/28/2011
[ 3294.895536] task: ffff880628592ee0 ti: ffff88062a52e000 task.ti: ffff88062a52e000
[ 3294.895538] RIP: 0010:[<ffffffff810c64ae>] [<ffffffff810c64ae>] smp_call_function_many+0x26e/0x2d0
[ 3294.895542] RSP: 0018:ffff88062a52fad8 EFLAGS: 00000202
[ 3294.895544] RAX: 0000000000000007 RBX: ffffffff81d04dc0 RCX: ffff88063fc77fd0
[ 3294.895545] RDX: 0000000000000007 RSI: 0000000000000100 RDI: 0000000000000000
[ 3294.895547] RBP: ffff88062a52fb28 R08: ffff880333c550c8 R09: 0000000000000004
[ 3294.895549] R10: ffff880333c550c8 R11: 0000000000000005 R12: ffff88062a52fb10
[ 3294.895550] R13: 0000000000000282 R14: ffff88062a52fa78 R15: ffff880333c54580
[ 3294.895553] FS: 00007f2603f63880(0000) GS:ffff880333c40000(0000) knlGS:0000000000000000
[ 3294.895555] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3294.895556] CR2: 00007f260308b212 CR3: 0000000001c0e000 CR4: 00000000000007e0
[ 3294.895558] Stack:
[ 3294.895559]  ffff880333c550e8 0000000000015080 0000000000000000 ffffffff811d8450
[ 3294.895564]  0000010000000001 ffff88062a52fb78 ffffffff811d8450 0000000000000000
[ 3294.895567]  0000000000000002 0000000000000100 ffff88062a52fb58 ffffffff810c659a
[ 3294.895571] Call Trace:
[ 3294.895575]  [<ffffffff811d8450>] ? __brelse+0x40/0x40
[ 3294.895579]  [<ffffffff811d8450>] ? __brelse+0x40/0x40
[ 3294.895582]  [<ffffffff810c659a>] on_each_cpu_mask+0x2a/0x60
[ 3294.895585]  [<ffffffff811d7690>] ? mark_buffer_async_write+0x20/0x20
[ 3294.895588]  [<ffffffff810c6684>] on_each_cpu_cond+0xb4/0xe0
[ 3294.895591]  [<ffffffff811d8450>] ? __brelse+0x40/0x40
[ 3294.895594]  [<ffffffff811d8009>] invalidate_bh_lrus+0x29/0x30
[ 3294.895597]  [<ffffffff811dec0e>] kill_bdev+0x1e/0x30
[ 3294.895600]  [<ffffffff811e0206>] __blkdev_put+0x66/0x1b0
[ 3294.895603]  [<ffffffff811e0c6e>] blkdev_put+0x4e/0x140
[ 3294.895606]  [<ffffffff811e0e15>] blkdev_close+0x25/0x30
[ 3294.895610]  [<ffffffff811a9821>] __fput+0xe1/0x230
[ 3294.895613]  [<ffffffff811a99be>] ____fput+0xe/0x10
[ 3294.895616]  [<ffffffff81081554>] task_work_run+0xc4/0xe0
[ 3294.895620]  [<ffffffff810642b7>] do_exit+0x2b7/0xa40
[ 3294.895623]  [<ffffffff811c1991>] ? touch_atime+0x71/0x140
[ 3294.895627]  [<ffffffff81141098>] ? generic_file_aio_read+0x588/0x700
[ 3294.895630]  [<ffffffff81064abf>] do_group_exit+0x3f/0xa0
[ 3294.895633]  [<ffffffff810743c0>] get_signal_to_deliver+0x1d0/0x5e0
[ 3294.895636]  [<ffffffff811df6dc>] ? blkdev_aio_read+0x4c/0x70
[ 3294.895640]  [<ffffffff81012438>] do_signal+0x48/0x960
[ 3294.895644]  [<ffffffff81012dc8>] do_notify_resume+0x78/0xa0
[ 3294.895647]  [<ffffffff816f86da>] int_signal+0x12/0x17
[ 3294.895649] Code: 3b 05 bf fa c3 00 89 c2 0f 8d 20 fe ff ff 48 98 49 8b 4d 00 48 03 0c c5 80 40 d0 81 f6 41 20 01 74 cb 0f 1f 00 f3 90 f6 41 20 01 <75> f8 eb be 0f b6 4d d0 48 8b 55 c0 89 df 48 8b 75 c8 e8 fb fa

On Tue, Apr 15, 2014 at 9:03 AM, Jun Wu <jwu@xxxxxxxxxxxx> wrote:
> Hello,
>
> We are working on a cluster file system using fcoe vn2vn.
> Multiple initiators can see the same set of target hard drives exported
> by targetcli tcm_fc. When the initiators run I/O against these target
> hard drives at the same time, the target system crashes regardless of
> whether the backstore is iblock or pscsi. See the following dump.
>
> crash> bt
> PID: 318    TASK: ffff880c1a05aee0  CPU: 5   COMMAND: "kworker/5:1"
>  #0 [ffff880c1a895a48] machine_kexec at ffffffff810485e2
>  #1 [ffff880c1a895a98] crash_kexec at ffffffff810d09d3
>  #2 [ffff880c1a895b60] oops_end at ffffffff816f0c98
>  #3 [ffff880c1a895b88] die at ffffffff8101616b
>  #4 [ffff880c1a895bb8] do_trap at ffffffff816f04b0
>  #5 [ffff880c1a895c08] do_invalid_op at ffffffff810134a8
>  #6 [ffff880c1a895cb0] invalid_op at ffffffff816f9c1e
>     [exception RIP: ft_queue_data_in+1386]
>     RIP: ffffffffa0641eda  RSP: ffff880c1a895d68  RFLAGS: 00010246
>     RAX: 0000000000001000  RBX: ffff880c17a6dc10  RCX: 0000000000000002
>     RDX: 0000000000000000  RSI: ffff880c1afa36d8  RDI: 0000000000000000
>     RBP: ffff880c1a895df8   R8: ffff880c1667e45c   R9: dfcf2970a166dd90
>     R10: dfcf2970a166dd90  R11: 0000000000000000  R12: ffff880c17a6dc10
>     R13: ffff880c3fc33e00  R14: 0000000000001000  R15: 0000000000000140
>     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>  #7 [ffff880c1a895d60] ft_queue_data_in at ffffffffa06419c7 [tcm_fc]
>  #8 [ffff880c1a895e00] target_complete_ok_work at ffffffffa04ded21 [target_core_mod]
>  #9 [ffff880c1a895e28] process_one_work at ffffffff8107d0ec
> #10 [ffff880c1a895e70] worker_thread at ffffffff8107dd3c
> #11 [ffff880c1a895ed0] kthread at ffffffff810848d0
> #12 [ffff880c1a895f50] ret_from_fork at ffffffff816f836c
>
> Is there any way to avoid this problem?
> Thanks,
>
> Jun
--
To unsubscribe from this list: send the line "unsubscribe target-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
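For context, the setup and workload described in the report can be sketched as shell commands. This is a minimal, hypothetical reproduction outline, not the reporter's actual commands: the backing device /dev/sdb, the backstore name disk0, the FC port WWN, and the initiator-side device /dev/sdX are all placeholders, and the exact targetcli syntax varies between targetcli versions (the iblock backstore name matches the report).

```shell
# --- On the target (hypothetical targetcli session; syntax varies by version) ---
# Create an iblock backstore on a local disk and export it over the
# tcm_fc (FCoE) fabric, then persist the configuration.
targetcli /backstores/iblock create name=disk0 dev=/dev/sdb
targetcli /tcm_fc create 20:00:00:11:22:33:44:55       # FC port WWN (placeholder)
targetcli /tcm_fc/20:00:00:11:22:33:44:55/luns create /backstores/iblock/disk0
targetcli saveconfig

# --- On each initiator (run concurrently from several hosts) ---
# Mixed random read/write load against the imported FCoE disk,
# roughly the kind of fio job that triggers the crash above.
fio --name=concurrent-rw --filename=/dev/sdX --ioengine=libaio \
    --direct=1 --rw=randrw --bs=4k --iodepth=32 --numjobs=4 \
    --time_based --runtime=60
```

Running the fio job from two or more initiators at once against the same exported LUNs is what produces the ft_queue_data_in crash on the target and the soft lockup on the initiator shown above.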