Re: [PATCH 08/12] tcm_fc: use target_execute_cmd

"Rustad, Mark D" <mark.d.rustad@xxxxxxxxx> · Fri, 13 Jul 2012 20:50:13 +0000

I took a look at this and it seemed like it might be ok, so I ran a kernel from target-pending/for-next. I ran into an ixgbe issue that I recognized, so I applied a patch to fix that. Even with that fix, when running maim with 32kb writes, I would get a small burst of these:

[  324.028609] ft_recv_write_data: xid 0x171, f_ctl 0x8, cmd->sg ffff88004eceb000, cmd->sg_cnt 0x80. DDP was setup hence not expected to receive frame with payload, Frame  will be dropped if'Sequence Initiative' bit in f_ctl isnot set

which indicates another ixgbe problem. Note that this message did not result in any I/O errors or anything, so I think it can safely be ignored except to note that it is a sign of other ixgbe issues present in this kernel.

When I ran a kernel with lots of debugging turned on, I had one instance when maim had a failed I/O. I suspect that it may have been due to a packet loss, suggesting that somehow PFC may not have been set up right. There is such a pile of ixgbe changes in flight that almost anything is possible. Consistent with the ixgbe observation above.

When I ran a non-debugging kernel, I did not see the I/O failure, probably because the kernel could keep up better, but when my maim test got up to the pass that does 1MB reads, I got the following:

[  625.841786] BUG: unable to handle kernel paging request at ffffffffffffffe8
[  625.842717] IP: [<ffffffffa0131a14>] ft_get_task_tag+0x4/0x10 [tcm_fc]
[  625.842717] PGD 1541067 PUD 1542067 PMD 0 
[  625.842717] Oops: 0000 [#1] SMP 
[  625.842717] CPU 0 
[  625.842717] Modules linked in: sd_mod crc_t10dif tcm_fc target_core_pscsi target_core_file target_core_iblock iscsi_target_mod target_core_mod fcoe libfcoe 8021q libfc scsi_transport_fc scsi_tgt garp stp llc ixgbe mdio autofs4 [last unloaded: scsi_wait_scan]
[  625.842717] 
[  625.842717] Pid: 26, comm: kworker/u:1 Not tainted 3.5.0-rc2-mdrtarget-pending+ #5 Bochs Bochs
[  625.842717] RIP: 0010:[<ffffffffa0131a14>]  [<ffffffffa0131a14>] ft_get_task_tag+0x4/0x10 [tcm_fc]
[  625.842717] RSP: 0018:ffff88007c77bd88  EFLAGS: 00010246
[  625.842717] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff88007c77bdd8
[  625.842717] RDX: ffff8800770061e8 RSI: dead000000100100 RDI: ffff880077006010
[  625.842717] RBP: ffff880077006010 R08: ffff88007ac31100 R09: ffff88007ac31100
[  625.842717] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88007c77bdd8
[  625.842717] R13: ffff88007ac31000 R14: ffff880064acd010 R15: 0000000000000001
[  625.842717] FS:  0000000000000000(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
[  625.842717] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  625.842717] CR2: ffffffffffffffe8 CR3: 000000007a071000 CR4: 00000000000006f0
[  625.842717] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  625.842717] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  625.842717] Process kworker/u:1 (pid: 26, threadinfo ffff88007c77a000, task ffff88007c7532d0)
[  625.842717] Stack:
[  625.842717]  ffffffffa00b35db 000001992f68fbd5 0000000000000000 ffff8800770061e8
[  625.842717]  ffff88007ac3108a 0000000000000282 0000000000000000 000000017fc11030
[  625.842717]  ffff88007ac31000 ffff88007fc10fc0 ffff88007c77bdd8 ffff88007c77bdd8
[  625.842717] Call Trace:
[  625.842717]  [<ffffffffa00b35db>] ? core_tmr_lun_reset+0x3fb/0x4d0 [target_core_mod]
[  625.842717]  [<ffffffffa00b4e00>] ? transport_cmd_check_stop+0x180/0x180 [target_core_mod]
[  625.842717]  [<ffffffffa00b4ea4>] ? target_tmr_work+0xa4/0xe0 [target_core_mod]
[  625.842717]  [<ffffffff8104a32e>] ? process_one_work+0xfe/0x380
[  625.842717]  [<ffffffff8104afec>] ? worker_thread+0x14c/0x330
[  625.842717]  [<ffffffff8104aea0>] ? manage_workers+0x210/0x210
[  625.842717]  [<ffffffff8104eeb5>] ? kthread+0x85/0x90
[  625.842717]  [<ffffffff813f52a4>] ? kernel_thread_helper+0x4/0x10
[  625.842717]  [<ffffffff8104ee30>] ? kthread_worker_fn+0x130/0x130
[  625.842717]  [<ffffffff813f52a0>] ? gs_change+0x13/0x13
[  625.842717] Code: f4 ff ff ff e9 4e fe ff ff be ad 00 00 00 48 c7 c7 e8 37 13 a0 e8 0d 1d f0 e0 e9 2b ff ff ff 0f 1f 84 00 00 00 00 00 48 8b 47 f8 <0f> b7 40 e8 c3 0f 1f 80 00 00 00 00 31 c0 c3 66 66 66 66 2e 0f 
[  625.842717] RIP  [<ffffffffa0131a14>] ft_get_task_tag+0x4/0x10 [tcm_fc]
[  625.842717]  RSP <ffff88007c77bd88>
[  625.842717] CR2: ffffffffffffffe8
[  625.842717] ---[ end trace e1b7f375916824f3 ]---
[  625.884637] BUG: unable to handle kernel paging request at fffffffffffffff8
[  625.885243] IP: [<ffffffff8104f0f7>] kthread_data+0x7/0x10
[  625.885243] PGD 1541067 PUD 1542067 PMD 0 
[  625.885243] Oops: 0000 [#2] SMP 
[  625.885243] CPU 0 
[  625.885243] Modules linked in: sd_mod crc_t10dif tcm_fc target_core_pscsi target_core_file target_core_iblock iscsi_target_mod target_core_mod fcoe libfcoe 8021q libfc scsi_transport_fc scsi_tgt garp stp llc ixgbe mdio autofs4 [last unloaded: scsi_wait_scan]
[  625.885243] 
[  625.885243] Pid: 26, comm: kworker/u:1 Tainted: G      D      3.5.0-rc2-mdrtarget-pending+ #5 Bochs Bochs
[  625.885243] RIP: 0010:[<ffffffff8104f0f7>]  [<ffffffff8104f0f7>] kthread_data+0x7/0x10
[  625.885243] RSP: 0018:ffff88007c77ba60  EFLAGS: 00010002
[  625.885243] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffffffa47280
[  625.885243] RDX: 0000003c7324d4cf RSI: 0000000000000000 RDI: ffff88007c7532d0
[  625.885243] RBP: ffff88007c7532d0 R08: 0000000000000001 R09: 0000000000000001
[  625.885243] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[  625.885243] R13: ffff88007c753550 R14: 0000000000000001 R15: ffff88007c7532c0
[  625.885243] FS:  0000000000000000(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
[  625.885243] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  625.885243] CR2: fffffffffffffff8 CR3: 000000007a083000 CR4: 00000000000006f0
[  625.885243] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  625.885243] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  625.885243] Process kworker/u:1 (pid: 26, threadinfo ffff88007c77a000, task ffff88007c7532d0)
[  625.885243] Stack:
[  625.885243]  ffffffff8104b408 ffff88007fc10fc0 ffffffff813ecc0b ffff88007bcd16f8
[  625.885243]  ffff8800771f6610 ffff88007c77bfd8 ffff88007c77bfd8 ffff88007c77bfd8
[  625.885243]  ffff88007c7532d0 0000000000000046 ffff88007c7532d0 ffff88007c753750
[  625.885243] Call Trace:
[  625.885243]  [<ffffffff8104b408>] ? wq_worker_sleeping+0x8/0x90
[  625.885243]  [<ffffffff813ecc0b>] ? __schedule+0x39b/0x570
[  625.885243]  [<ffffffff81037ce0>] ? do_exit+0x570/0x890
[  625.885243]  [<ffffffff813eed64>] ? oops_end+0xa4/0xf0
[  625.885243]  [<ffffffff813e70d5>] ? no_context+0x25d/0x26c
[  625.885243]  [<ffffffff813e6988>] ? pmd_offset+0x16/0x1b
[  625.885243]  [<ffffffff813f1676>] ? do_page_fault+0x346/0x4c0
[  625.885243]  [<ffffffff8104bc84>] ? alloc_pid+0x2c4/0x440
[  625.885243]  [<ffffffff81027972>] ? pvclock_clocksource_read+0x52/0xf0
[  625.885243]  [<ffffffff8105e32e>] ? select_task_rq_fair+0x5ce/0x7d0
[  625.885243]  [<ffffffff8105d313>] ? check_preempt_wakeup+0xe3/0x240
[  625.885243]  [<ffffffff813ee1a5>] ? async_page_fault+0x25/0x30
[  625.885243]  [<ffffffffa0131a14>] ? ft_get_task_tag+0x4/0x10 [tcm_fc]
[  625.885243]  [<ffffffffa00b35db>] ? core_tmr_lun_reset+0x3fb/0x4d0 [target_core_mod]
[  625.885243]  [<ffffffffa00b4e00>] ? transport_cmd_check_stop+0x180/0x180 [target_core_mod]
[  625.885243]  [<ffffffffa00b4ea4>] ? target_tmr_work+0xa4/0xe0 [target_core_mod]
[  625.885243]  [<ffffffff8104a32e>] ? process_one_work+0xfe/0x380
[  625.885243]  [<ffffffff8104afec>] ? worker_thread+0x14c/0x330
[  625.885243]  [<ffffffff8104aea0>] ? manage_workers+0x210/0x210
[  625.885243]  [<ffffffff8104eeb5>] ? kthread+0x85/0x90
[  625.885243]  [<ffffffff813f52a4>] ? kernel_thread_helper+0x4/0x10
[  625.885243]  [<ffffffff8104ee30>] ? kthread_worker_fn+0x130/0x130
[  625.885243]  [<ffffffff813f52a0>] ? gs_change+0x13/0x13
[  625.885243] Code: 00 00 00 00 48 8b 5c 24 08 48 c7 c7 e0 eb 4a 81 be e2 00 00 00 48 8b 6c 24 10 48 83 c4 18 e9 21 46 fe ff 90 48 8b 87 28 02 00 00 <48> 8b 40 f8 c3 0f 1f 40 00 48 3b 3d 61 4e 67 00 74 0f 65 8b 04 
[  625.885243] RIP  [<ffffffff8104f0f7>] kthread_data+0x7/0x10
[  625.885243]  RSP <ffff88007c77ba60>
[  625.885243] CR2: fffffffffffffff8
[  625.885243] ---[ end trace e1b7f375916824f4 ]---

Resulting in a dead target machine. I haven't analyzed this any further yet, but I thought I should share what I'm seeing. Of course I don't know right now whether it is related to this particular patch or something earlier. If I can find a point where this doesn't happen, I'll try to bisect it.

At the moment I have 16 targets all on ramdisks. I was going to say that performance looked slow, but I just realized that I have a simple patch that improves performance that I've been waiting to get back from our validation team. So, not a fair comparison. I've been running with it for so long I can't remember what the target ran like without it. Hmm. I may also have had a debugging kernel on my initiator, will check that too.

-- 
Mark Rustad, LAN Access Division, Intel Corporation

--
To unsubscribe from this list: send the line "unsubscribe target-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html