I took a look at this and it seemed like it might be ok, so I ran a kernel from target-pending/for-next. I ran into an ixgbe issue that I recognized, so I applied a patch to fix that. Even with that fix, when running maim with 32kb writes, I would get a small burst of these: [ 324.028609] ft_recv_write_data: xid 0x171, f_ctl 0x8, cmd->sg ffff88004eceb000, cmd->sg_cnt 0x80. DDP was setup hence not expected to receive frame with payload, Frame will be dropped if'Sequence Initiative' bit in f_ctl isnot set which indicates another ixgbe problem. Note that this message did not result in any I/O errors or anything, so I think it can safely be ignored except to note that it is a sign of other ixgbe issues present in this kernel. When I ran a kernel with lots of debugging turned on, I had one instance when maim had a failed I/O. I suspect that it may have been due to a packet loss, suggesting that somehow PFC may not have been set up right. There is such a pile of ixgbe changes in flight that almost anything is possible. Consistent with the ixgbe observation above. When I ran a non-debugging kernel, I did not see the I/O failure, probably because the kernel could keep up better, but when my maim test got up to the pass that does 1MB reads, I got the following: [ 625.841786] BUG: unable to handle kernel paging request at ffffffffffffffe8 [ 625.842717] IP: [<ffffffffa0131a14>] ft_get_task_tag+0x4/0x10 [tcm_fc] [ 625.842717] PGD 1541067 PUD 1542067 PMD 0 [ 625.842717] Oops: 0000 [#1] SMP [ 625.842717] CPU 0 [ 625.842717] Modules linked in: sd_mod crc_t10dif tcm_fc target_core_pscsi target_core_file target_core_iblock iscsi_target_mod target_core_mod fcoe libfcoe 8021q libfc scsi_transport_fc scsi_tgt garp stp llc ixgbe mdio autofs4 [last unloaded: scsi_wait_scan] [ 625.842717] [ 625.842717] Pid: 26, comm: kworker/u:1 Not tainted 3.5.0-rc2-mdrtarget-pending+ #5 Bochs Bochs [ 625.842717] RIP: 0010:[<ffffffffa0131a14>] [<ffffffffa0131a14>] ft_get_task_tag+0x4/0x10 [tcm_fc] [ 625.842717] RSP: 0018:ffff88007c77bd88 EFLAGS: 00010246 [ 625.842717] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff88007c77bdd8 [ 625.842717] RDX: ffff8800770061e8 RSI: dead000000100100 RDI: ffff880077006010 [ 625.842717] RBP: ffff880077006010 R08: ffff88007ac31100 R09: ffff88007ac31100 [ 625.842717] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88007c77bdd8 [ 625.842717] R13: ffff88007ac31000 R14: ffff880064acd010 R15: 0000000000000001 [ 625.842717] FS: 0000000000000000(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000 [ 625.842717] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 625.842717] CR2: ffffffffffffffe8 CR3: 000000007a071000 CR4: 00000000000006f0 [ 625.842717] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 625.842717] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 625.842717] Process kworker/u:1 (pid: 26, threadinfo ffff88007c77a000, task ffff88007c7532d0) [ 625.842717] Stack: [ 625.842717] ffffffffa00b35db 000001992f68fbd5 0000000000000000 ffff8800770061e8 [ 625.842717] ffff88007ac3108a 0000000000000282 0000000000000000 000000017fc11030 [ 625.842717] ffff88007ac31000 ffff88007fc10fc0 ffff88007c77bdd8 ffff88007c77bdd8 [ 625.842717] Call Trace: [ 625.842717] [<ffffffffa00b35db>] ? core_tmr_lun_reset+0x3fb/0x4d0 [target_core_mod] [ 625.842717] [<ffffffffa00b4e00>] ? transport_cmd_check_stop+0x180/0x180 [target_core_mod] [ 625.842717] [<ffffffffa00b4ea4>] ? target_tmr_work+0xa4/0xe0 [target_core_mod] [ 625.842717] [<ffffffff8104a32e>] ? process_one_work+0xfe/0x380 [ 625.842717] [<ffffffff8104afec>] ? worker_thread+0x14c/0x330 [ 625.842717] [<ffffffff8104aea0>] ? manage_workers+0x210/0x210 [ 625.842717] [<ffffffff8104eeb5>] ? kthread+0x85/0x90 [ 625.842717] [<ffffffff813f52a4>] ? kernel_thread_helper+0x4/0x10 [ 625.842717] [<ffffffff8104ee30>] ? kthread_worker_fn+0x130/0x130 [ 625.842717] [<ffffffff813f52a0>] ? gs_change+0x13/0x13 [ 625.842717] Code: f4 ff ff ff e9 4e fe ff ff be ad 00 00 00 48 c7 c7 e8 37 13 a0 e8 0d 1d f0 e0 e9 2b ff ff ff 0f 1f 84 00 00 00 00 00 48 8b 47 f8 <0f> b7 40 e8 c3 0f 1f 80 00 00 00 00 31 c0 c3 66 66 66 66 2e 0f [ 625.842717] RIP [<ffffffffa0131a14>] ft_get_task_tag+0x4/0x10 [tcm_fc] [ 625.842717] RSP <ffff88007c77bd88> [ 625.842717] CR2: ffffffffffffffe8 [ 625.842717] ---[ end trace e1b7f375916824f3 ]--- [ 625.884637] BUG: unable to handle kernel paging request at fffffffffffffff8 [ 625.885243] IP: [<ffffffff8104f0f7>] kthread_data+0x7/0x10 [ 625.885243] PGD 1541067 PUD 1542067 PMD 0 [ 625.885243] Oops: 0000 [#2] SMP [ 625.885243] CPU 0 [ 625.885243] Modules linked in: sd_mod crc_t10dif tcm_fc target_core_pscsi target_core_file target_core_iblock iscsi_target_mod target_core_mod fcoe libfcoe 8021q libfc scsi_transport_fc scsi_tgt garp stp llc ixgbe mdio autofs4 [last unloaded: scsi_wait_scan] [ 625.885243] [ 625.885243] Pid: 26, comm: kworker/u:1 Tainted: G D 3.5.0-rc2-mdrtarget-pending+ #5 Bochs Bochs [ 625.885243] RIP: 0010:[<ffffffff8104f0f7>] [<ffffffff8104f0f7>] kthread_data+0x7/0x10 [ 625.885243] RSP: 0018:ffff88007c77ba60 EFLAGS: 00010002 [ 625.885243] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffffffa47280 [ 625.885243] RDX: 0000003c7324d4cf RSI: 0000000000000000 RDI: ffff88007c7532d0 [ 625.885243] RBP: ffff88007c7532d0 R08: 0000000000000001 R09: 0000000000000001 [ 625.885243] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 [ 625.885243] R13: ffff88007c753550 R14: 0000000000000001 R15: ffff88007c7532c0 [ 625.885243] FS: 0000000000000000(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000 [ 625.885243] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 625.885243] CR2: fffffffffffffff8 CR3: 000000007a083000 CR4: 00000000000006f0 [ 625.885243] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 625.885243] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 625.885243] Process kworker/u:1 (pid: 26, threadinfo ffff88007c77a000, task ffff88007c7532d0) [ 625.885243] Stack: [ 625.885243] ffffffff8104b408 ffff88007fc10fc0 ffffffff813ecc0b ffff88007bcd16f8 [ 625.885243] ffff8800771f6610 ffff88007c77bfd8 ffff88007c77bfd8 ffff88007c77bfd8 [ 625.885243] ffff88007c7532d0 0000000000000046 ffff88007c7532d0 ffff88007c753750 [ 625.885243] Call Trace: [ 625.885243] [<ffffffff8104b408>] ? wq_worker_sleeping+0x8/0x90 [ 625.885243] [<ffffffff813ecc0b>] ? __schedule+0x39b/0x570 [ 625.885243] [<ffffffff81037ce0>] ? do_exit+0x570/0x890 [ 625.885243] [<ffffffff813eed64>] ? oops_end+0xa4/0xf0 [ 625.885243] [<ffffffff813e70d5>] ? no_context+0x25d/0x26c [ 625.885243] [<ffffffff813e6988>] ? pmd_offset+0x16/0x1b [ 625.885243] [<ffffffff813f1676>] ? do_page_fault+0x346/0x4c0 [ 625.885243] [<ffffffff8104bc84>] ? alloc_pid+0x2c4/0x440 [ 625.885243] [<ffffffff81027972>] ? pvclock_clocksource_read+0x52/0xf0 [ 625.885243] [<ffffffff8105e32e>] ? select_task_rq_fair+0x5ce/0x7d0 [ 625.885243] [<ffffffff8105d313>] ? check_preempt_wakeup+0xe3/0x240 [ 625.885243] [<ffffffff813ee1a5>] ? async_page_fault+0x25/0x30 [ 625.885243] [<ffffffffa0131a14>] ? ft_get_task_tag+0x4/0x10 [tcm_fc] [ 625.885243] [<ffffffffa00b35db>] ? core_tmr_lun_reset+0x3fb/0x4d0 [target_core_mod] [ 625.885243] [<ffffffffa00b4e00>] ? transport_cmd_check_stop+0x180/0x180 [target_core_mod] [ 625.885243] [<ffffffffa00b4ea4>] ? target_tmr_work+0xa4/0xe0 [target_core_mod] [ 625.885243] [<ffffffff8104a32e>] ? process_one_work+0xfe/0x380 [ 625.885243] [<ffffffff8104afec>] ? worker_thread+0x14c/0x330 [ 625.885243] [<ffffffff8104aea0>] ? manage_workers+0x210/0x210 [ 625.885243] [<ffffffff8104eeb5>] ? kthread+0x85/0x90 [ 625.885243] [<ffffffff813f52a4>] ? kernel_thread_helper+0x4/0x10 [ 625.885243] [<ffffffff8104ee30>] ? kthread_worker_fn+0x130/0x130 [ 625.885243] [<ffffffff813f52a0>] ? gs_change+0x13/0x13 [ 625.885243] Code: 00 00 00 00 48 8b 5c 24 08 48 c7 c7 e0 eb 4a 81 be e2 00 00 00 48 8b 6c 24 10 48 83 c4 18 e9 21 46 fe ff 90 48 8b 87 28 02 00 00 <48> 8b 40 f8 c3 0f 1f 40 00 48 3b 3d 61 4e 67 00 74 0f 65 8b 04 [ 625.885243] RIP [<ffffffff8104f0f7>] kthread_data+0x7/0x10 [ 625.885243] RSP <ffff88007c77ba60> [ 625.885243] CR2: fffffffffffffff8 [ 625.885243] ---[ end trace e1b7f375916824f4 ]--- Resulting in a dead target machine. I haven't analyzed this any further yet, but I thought I should share what I'm seeing. Of course I don't know right now whether it is related to this particular patch or something earlier. If I can find a point where this doesn't happen, I'll try to bisect it. At the moment I have 16 targets all on ramdisks. I was going to say that performance looked slow, but I just realized that I have a simple patch that improves performance that I've been waiting to get back from our validation team. So, not a fair comparison. I've been running with it for so long I can't remember what the target ran like without it. Hmm. I may also have had a debugging kernel on my initiator, will check that too. -- Mark Rustad, LAN Access Division, Intel Corporation -- To unsubscribe from this list: send the line "unsubscribe target-devel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html