> Hi MDR, Robert & Co,
>
> During the process of updating target-pending.git/master to v3.7-rc2
> this afternoon, I noticed the following warnings below when using
> tcm_fc.
>
> The Poison overwritten appears during each I/O, but the LUN SCAN + I/O
> seem to be still working as expected..
>
> AFAICT there has not been anything affecting tcm_fc that has gone in
> recently, so it looks like some type of libfcoe or libfc regression.
>
> Any ideas where to start looking to track this down..?

Nick,

I am seeing something similar, though not identical, starting from the
merge window before the rc1 tag. So far I have not been able to pin down
where it is, and I can no longer reproduce the problem. It showed up when
the initiator somehow got zoned with a SW target, even though the
initiator itself was never intended to involve the SW target. So I would
like to know whether I can reproduce this with your setup to track it
down.

The bug was found during an lldp enable/disable test with I/O running.
From what I can tell, it is related to the exchange release path, where
the reference count on the exchange somehow gets messed up. Originally I
suspected that cancel_delayed_work() was always returning true even when
no work was pending, which could have caused us to underflow the refcnt
on the exchange, but that was not the case. While investigating that, I
found one minor issue: fc_exch_find() may return a valid exchange even
though the xid does not match. I have a patch to fix that; however, the
exchange pool must already have been messed up by the time that happens.

Anyway, I would like to mimic your setup to see if I can reproduce it.
The trace I had is pasted here FYI:

...
kernel: Pid: 5072, comm: kworker/u:7 Tainted: G W 3.6.0-upstream-net-next-ixgbe-queue-x86_64-g0b
kernel: Call Trace:
kernel: [<ffffffff810541ff>] warn_slowpath_common+0x7f/0xc0
kernel: [<ffffffff810542f6>] warn_slowpath_fmt+0x46/0x50
kernel: [<ffffffff8126bb01>] __list_del_entry+0xa1/0xd0
kernel: [<ffffffff8126bb41>] list_del+0x11/0x40
kernel: [<ffffffffa03adfaf>] fc_exch_delete+0x6f/0xb0 [libfc]
kernel: [<ffffffffa03b1074>] fc_exch_timeout+0x124/0x150 [libfc]
kernel: [<ffffffff81070c27>] process_one_work+0x177/0x430
kernel: [<ffffffffa03b0f50>] ? fc_exch_rrq+0x220/0x220 [libfc]
kernel: [<ffffffff8107303e>] worker_thread+0x12e/0x380
kernel: [<ffffffff81072f10>] ? manage_workers+0x180/0x180
kernel: [<ffffffff810781ae>] kthread+0xce/0xe0
kernel: [<ffffffff815311c4>] kernel_thread_helper+0x4/0x10
kernel: [<ffffffff810780e0>] ? kthread_freezable_should_stop+0x70/0x70
kernel: [<ffffffff815311c0>] ? gs_change+0x13/0x13
kernel: ---[ end trace f4c13caf2990c079 ]---
kernel: ------------[ cut here ]------------
kernel: WARNING: at lib/list_debug.c:59 __list_del_entry+0xa1/0xd0()
kernel: Hardware name: PowerEdge T610
kernel: list_del corruption. prev->next should be ffff88031cfeb2e0, but was ffffe8ffffc80348
kernel: Pid: 5072, comm: kworker/u:7 Tainted: G W 3.6.0-upstream-net-next-ixgbe-queue-x86_64-g0b
kernel: Call Trace:
kernel: [<ffffffff810541ff>] warn_slowpath_common+0x7f/0xc0
kernel: [<ffffffff810542f6>] warn_slowpath_fmt+0x46/0x50
kernel: [<ffffffff8126bb01>] __list_del_entry+0xa1/0xd0
kernel: [<ffffffff8126bb41>] list_del+0x11/0x40
kernel: [<ffffffffa03adfaf>] fc_exch_delete+0x6f/0xb0 [libfc]
kernel: [<ffffffffa03b1074>] fc_exch_timeout+0x124/0x150 [libfc]
kernel: [<ffffffff81070c27>] process_one_work+0x177/0x430
kernel: [<ffffffffa03b0f50>] ? fc_exch_rrq+0x220/0x220 [libfc]
kernel: [<ffffffff8107303e>] worker_thread+0x12e/0x380
kernel: [<ffffffff81072f10>] ? manage_workers+0x180/0x180
kernel: [<ffffffff810781ae>] kthread+0xce/0xe0
kernel: [<ffffffff815311c4>] kernel_thread_helper+0x4/0x10
kernel: [<ffffffff810780e0>] ? kthread_freezable_should_stop+0x70/0x70
kernel: [<ffffffff815311c0>] ? gs_change+0x13/0x13
kernel: ---[ end trace f4c13caf2990c07a ]---
kernel: ixgbe 0000:05:00.0: Multiqueue Enabled: Rx Queue count = 24, Tx Queue count = 24
kernel: ixgbe 0000:05:00.0 p3p1: detected SFP+: 5
kernel: ixgbe 0000:05:00.0 p3p1: NIC Link is Up 10 Gbps, Flow Control: RX/TX
kernel: BUG: soft lockup - CPU#6 stuck for 23s! [kworker/u:2:390]
kernel: CPU 6
kernel: Pid: 390, comm: kworker/u:2 Tainted: G W 3.6.0-upstream-net-next-ixgbe-queue-x86_64-g0bf
kernel: RIP: 0010:[<ffffffffa03afe6b>] [<ffffffffa03afe6b>] fc_exch_reset+0x1b/0xf0 [libfc]
kernel: RSP: 0018:ffff880326293c90 EFLAGS: 00000286
kernel: RAX: ffff880326293fd8 RBX: ffff880326293c50 RCX: 0000000000b60300
kernel: RDX: ffff8803263e00a0 RSI: 0000000000000001 RDI: ffff8803263e0080
kernel: RBP: ffff880326293cb0 R08: 0000000000000004 R09: 0000000000000000
kernel: R10: 0000000000000014 R11: 0000000000000001 R12: ffff880326293c68
kernel: R13: ffff8803263e0100 R14: 0000000000000014 R15: ffff880326293c70
kernel: FS: 0000000000000000(0000) GS:ffff88032fc60000(0000) knlGS:0000000000000000
kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
kernel: CR2: 00007f91b6920000 CR3: 0000000001a0b000 CR4: 00000000000007e0
kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
kernel: Process kworker/u:2 (pid: 390, threadinfo ffff880326292000, task ffff880326219540)
kernel: Stack:
kernel: ffff8801a53a06c0 ffffe8ffffc80340 0000000000000000 ffff8803263e0080
kernel: ffff880326293d00 ffffffffa03affd7 ffff880326293d00 ffffffff00b60300
kernel: 000000000000002c ffff8801a85b0840 ffffffff81ae6620 ffff8801a53a06c0
kernel: Call Trace:
kernel: [<ffffffffa03affd7>] fc_exch_pool_reset+0x97/0xe0 [libfc]
kernel: [<ffffffffa03b0092>] fc_exch_mgr_reset+0x72/0xb0 [libfc]
kernel: [<ffffffffa03b8ce0>] fc_rport_work+0x120/0x630 [libfc]
kernel: [<ffffffff8106f8a2>] ? ftrace_raw_event_workqueue_execute_start+0xb2/0xc0
kernel: [<ffffffff81070c27>] process_one_work+0x177/0x430
kernel: [<ffffffffa03b8bc0>] ? fc_rport_recv_els_req+0x1d0/0x1d0 [libfc]
kernel: [<ffffffff8107303e>] worker_thread+0x12e/0x380
kernel: [<ffffffff81072f10>] ? manage_workers+0x180/0x180
kernel: [<ffffffff810781ae>] kthread+0xce/0xe0
kernel: [<ffffffff815311c4>] kernel_thread_helper+0x4/0x10
kernel: [<ffffffff810780e0>] ? kthread_freezable_should_stop+0x70/0x70
kernel: [<ffffffff815311c0>] ? gs_change+0x13/0x13
kernel: Code: c0 e8 0a 4c 17 e1 e9 2a ff ff ff 0f 1f 44 00 00 55 48 89 e5 48 83 ec 20 48 89 1c 24 4c 89 64 4 24 18 <66> 66 66 66 90 48 89 fb e8 88 77 17 e1 31 f6 48 89 df e8 0e ea
kernel: libfcoe: host3: Missing Discovery Advertisement for fab 20ac000dec96e941 count 1

>
> Thanks,
>
> --nab
>
> [ 3930.920161]
> ===================================================================
> ==========
> [ 3930.929274] BUG libfc_em (Tainted: G B ): Poison overwritten
> [ 3930.936447] -----------------------------------------------------------------------------
> [ 3930.936447]
> [ 3930.947203] INFO: 0xffff8806e04f0004-0xffff8806e04f0004. First byte 0x6a
> instead of 0x6b
> [ 3930.956219] INFO: Allocated in mempool_alloc_slab+0x10/0x12 age=9 cpu=3
> pid=25401
> [ 3930.964556] INFO: Freed in mempool_free_slab+0x12/0x14 age=11 cpu=3
> pid=18291
> [ 3930.972506] INFO: Slab 0xffffea001b813c00 objects=42 used=42 fp=0x
> (null) flags=0x8000000000004080
> [ 3930.983360] INFO: Object 0xffff8806e04f0000 @offset=0
> fp=0xffff8806e04f3d80
> [ 3930.983360]
> [ 3930.992762] Object ffff8806e04f0000: 6b 6b 6b 6b 6a 6b 6b 6b 6b 6b 6b 6b 6b
> 6b 6b 6b kkkkjkkkkkkkkkkk
> [ 3931.003132] Object ffff8806e04f0010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
> 6b 6b 6b kkkkkkkkkkkkkkkk
> [ 3931.013500] Object ffff8806e04f0020: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
> 6b 6b 6b kkkkkkkkkkkkkkkk
> [ 3931.023868] Object ffff8806e04f0030: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
> 6b 6b 6b kkkkkkkkkkkkkkkk
> [ 3931.034235] Object ffff8806e04f0040: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
> 6b 6b 6b kkkkkkkkkkkkkkkk
> [ 3931.044605] Object ffff8806e04f0050: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
> 6b 6b 6b kkkkkkkkkkkkkkkk
> [ 3931.054973] Object ffff8806e04f0060: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
> 6b 6b 6b kkkkkkkkkkkkkkkk
> [ 3931.065341] Object ffff8806e04f0070: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
> 6b 6b 6b kkkkkkkkkkkkkkkk
> [ 3931.075709] Object ffff8806e04f0080: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
> 6b 6b 6b kkkkkkkkkkkkkkkk
> [ 3931.086080] Object ffff8806e04f0090: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
> 6b 6b 6b kkkkkkkkkkkkkkkk
> [ 3931.096448] Object ffff8806e04f00a0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
> 6b 6b 6b kkkkkkkkkkkkkkkk
> [ 3931.106816] Object ffff8806e04f00b0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
> 6b 6b 6b kkkkkkkkkkkkkkkk
> [ 3931.117184] Object ffff8806e04f00c0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
> 6b 6b 6b kkkkkkkkkkkkkkkk
> [ 3931.127552] Object ffff8806e04f00d0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
> 6b 6b 6b kkkkkkkkkkkkkkkk
> [ 3931.137921] Object ffff8806e04f00e0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
> 6b 6b 6b kkkkkkkkkkkkkkkk
> [ 3931.148290] Object ffff8806e04f00f0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
> 6b 6b a5 kkkkkkkkkkkkkkk.
> [ 3931.158659] Redzone ffff8806e04f0100: bb bb bb bb bb bb bb bb
> ........
> [ 3931.168351] Padding ffff8806e04f0140: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
> 5a 5a 5a ZZZZZZZZZZZZZZZZ
> [ 3931.178816] Padding ffff8806e04f0150: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
> 5a 5a 5a ZZZZZZZZZZZZZZZZ
> [ 3931.189282] Padding ffff8806e04f0160: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
> 5a 5a 5a ZZZZZZZZZZZZZZZZ
> [ 3931.199747] Padding ffff8806e04f0170: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
> 5a 5a 5a ZZZZZZZZZZZZZZZZ
> [ 3931.210213] Pid: 25402, comm: fcoethread/4 Tainted: G B 3.7.0-rc2+ #51
> [ 3931.218452] Call Trace:
> [ 3931.221175] [<ffffffff810b08cc>] print_trailer+0x126/0x12f
> [ 3931.227381] [<ffffffff810b0dfe>] check_bytes_and_report+0xb2/0xeb
> [ 3931.234267] [<ffffffff81088dbb>] ? mempool_alloc_slab+0xf/0x12
> [ 3931.240860] [<ffffffff810b0eec>] check_object+0xb5/0x1df
> [ 3931.246873] [<ffffffff81088dbc>] ? mempool_alloc_slab+0x10/0x12
> [ 3931.253565] [<ffffffff810b1a69>] alloc_debug_processing+0xa4/0x138
> [ 3931.260547] [<ffffffff810b3150>] T.1679+0x28d/0x2d9
> [ 3931.266075] [<ffffffff81088dbc>] ? mempool_alloc_slab+0x10/0x12
> [ 3931.272767] [<ffffffff81088dbc>] ? mempool_alloc_slab+0x10/0x12
> [ 3931.279458] [<ffffffff810b330c>] kmem_cache_alloc+0x4d/0xa0
> [ 3931.285763] [<ffffffff81088dbc>] mempool_alloc_slab+0x10/0x12
> [ 3931.292258] [<ffffffff81088ee6>] mempool_alloc+0x5a/0x139
> [ 3931.298369] [<ffffffff810b0f72>] ? check_object+0x13b/0x1df
> [ 3931.304674] [<ffffffffa03d673a>] fc_exch_em_alloc+0x25/0x1f6 [libfc]
> [ 3931.311850] [<ffffffffa03d6c62>] fc_seq_lookup_recip+0x17c/0x347 [libfc]
> [ 3931.319413] [<ffffffffa03d6eac>] fc_seq_assign+0x7f/0x9c [libfc]
> [ 3931.326202] [<ffffffffa067829e>] ft_recv_req+0x76/0x14c [tcm_fc]
> [ 3931.332991] [<ffffffffa0679f1c>] ft_recv+0x129/0x132 [tcm_fc]
> [ 3931.339490] [<ffffffffa03db815>] fc_lport_recv_req+0x68/0xc5 [libfc]
> [ 3931.346667] [<ffffffffa03d8ecd>] fc_exch_recv+0xaed/0xbb0 [libfc]
> [ 3931.353552] [<ffffffffa06b9cc4>] ? fcoe_percpu_thread_create+0x7b/0x7b
> [fcoe]
> [ 3931.361600] [<ffffffffa06ba05c>] fcoe_percpu_receive_thread+0x398/0x46f
> [fcoe]
> [ 3931.369744] [<ffffffffa06b9cc4>] ? fcoe_percpu_thread_create+0x7b/0x7b
> [fcoe]
> [ 3931.377790] [<ffffffff81043f56>] kthread+0xb0/0xb8
> [ 3931.383224] [<ffffffff81043ea6>] ? kthread_freezable_should_stop+0x60/0x60
> [ 3931.390979] [<ffffffff81362c7c>] ret_from_fork+0x7c/0xb0
> [ 3931.396993] [<ffffffff81043ea6>] ? kthread_freezable_should_stop+0x60/0x60
> [ 3931.404749] FIX libfc_em: Restoring 0xffff8806e04f0004-
> 0xffff8806e04f0004=0x6b
> [ 3931.404749]
> [ 3931.414441] FIX libfc_em: Marking all objects used
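
A note on reading the quoted "Poison overwritten" report: the only damaged
byte is at offset 4 of the freed libfc_em object, and it reads 0x6a where
the free poison 0x6b is expected. That is the pattern a 32-bit decrement
landing on already-freed memory would leave on a little-endian machine,
which would be consistent with a late reference drop on a freed exchange.
The Python sketch below only demonstrates that arithmetic. The helper names
are made up for illustration, and the idea that the word at offset 4 is the
exchange refcount is an assumption that has not been checked against the
libfc structure layout in this tree.

```python
import struct

POISON_FREE = 0x6B  # byte SLUB writes over a freed object when poisoning is on
POISON_END = 0xA5   # marker SLUB puts in the last byte of a poisoned object


def poisoned_object(size):
    """Return a bytearray that looks like a freed, poisoned slab object."""
    return bytearray([POISON_FREE] * (size - 1) + [POISON_END])


def stray_decrement(obj, offset):
    """Model a 32-bit little-endian decrement that lands on freed memory."""
    (value,) = struct.unpack_from("<I", obj, offset)
    struct.pack_into("<I", obj, offset, (value - 1) & 0xFFFFFFFF)


def poison_diffs(obj):
    """List (offset, byte) pairs that no longer match the poison pattern."""
    expected = poisoned_object(len(obj))
    return [(i, b) for i, (b, e) in enumerate(zip(obj, expected)) if b != e]


# 0x100 bytes matches the object dump (0x...0000 through 0x...00ff).
obj = poisoned_object(0x100)
# Hypothetical: treat the word at offset 4 as a counter and drop it once.
stray_decrement(obj, offset=4)
print(poison_diffs(obj))  # prints [(4, 106)], i.e. offset 4 now holds 0x6a
```

A stray write of some other kind (a cleared flag, a list pointer update)
would normally disturb more than one byte or leave a value other than
poison minus one, so the single 0x6a is a useful hint when deciding which
put or release path to instrument first.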