On Thursday, August 31, 2017 at 20:54:31 +0530, Potnuri Bharat Teja wrote:
> Hi Doug,
> Could you please share the config you have on the Fedora box.
> I tried enabling lock debug on 4.13-rc7 but I don't see the warning.

Never mind, I now see the issue on my machines.

Thanks,
Bharat.

> Thanks,
> Bharat.
> On Tuesday, August 29, 2017 at 00:42:09 +0530, Doug Ledford wrote:
> > On Mon, 2017-08-28 at 12:38 -0400, Doug Ledford wrote:
> > > Resend from my work email address:
> > >
> > >
> > > I ran across this while testing a 4.13-rc7 kernel + the rdma next
> > > code.
> >
> > This reproduces on a stock 4.13-rc7 kernel. But, across all the stuff
> > I've booted it on so far, it only shows up on cxgb4 devices, so I think
> > this is a cxgb4 specific issue. Steve, can you look into this?
> >
> > My basic config is a stock Fedora rawhide box and I took the Fedora
> > kernel config and copied it into my git repo checkout of v4.13-rc7 and
> > compiled using that config. If you need any more info, I can try to
> > get it to you.
> >
> > The machine environment that produces this includes:
> >
> > base Ethernet device + 2 vlan devices
> > srp target mode is in use (kernel LIO support), the iwarp device isn't
> > specifically configured for use, but srpt tries to set it up anyway
> > iser target mode is in use (kernel LIO support again, single tpg with
> > wildcard address so the iwarp devices are in use)
> > nfsordma in use and exporting several mount points, again with wildcard
> > address so all RDMA devices are candidates
> >
> > With this environment, I get the traceback on bootup every time. It
> > then proceeds to run. I haven't tested it under load to see how it
> > does, but it's up anyway.
> >
> > > I don't have the time to track this down before going on PTO, so I'm
> > > putting it out here for others to look at.
> > >
> > > This machine holds multiple connections in it:
> > >
> > > ib0/ib1 -> dual port qib
> > > roce -> ocrdma
> > > iwarp -> cxgb4
> > >
> > > During bootup I got this:
> > >
> > > [ 37.244753] iw_cxgb4: 0000:83:00.4: Up
> > > [ 37.250168] iw_cxgb4: 0000:83:00.4: On-Chip Queues not supported on this deve
> > >
> > > [ 37.263207] ======================================================
> > > [ 37.270656] WARNING: possible circular locking dependency detected
> > > [ 37.278101] 4.13.0-rc7+ #130 Not tainted
> > > [ 37.283019] ------------------------------------------------------
> > > [ 37.290470] NetworkManager/2196 is trying to acquire lock:
> > > [ 37.297143] (device_mutex){+.+.+.}, at: [<ffffffffc08d2465>] ib_register_de]
> > > [ 37.308026]
> > > but task is already holding lock:
> > > [ 37.315694] (uld_mutex){+.+.+.}, at: [<ffffffffc0574fd4>] notify_ulds.isra.]
> > > [ 37.326108]
> > > which lock already depends on the new lock.
> > >
> > > [ 37.337689]
> > > the existing dependency chain (in reverse order) is:
> > > [ 37.347301]
> > > -> #2 (uld_mutex){+.+.+.}:
> > > [ 37.354048] lock_acquire+0xbd/0x200
> > > [ 37.359083] __mutex_lock+0x88/0x950
> > > [ 37.364122] mutex_lock_nested+0x1b/0x20
> > > [ 37.369690] cxgb_up+0x27/0x840 [cxgb4]
> > > [ 37.375623] cxgb_open+0x34/0x90 [cxgb4]
> > > [ 37.381168] __dev_open+0xc9/0x140
> > > [ 37.386039] __dev_change_flags+0x9d/0x160
> > > [ 37.391686] dev_change_flags+0x29/0x60
> > > [ 37.397069] do_setlink+0x4bf/0xc80
> > > [ 37.402024] rtnl_newlink+0x512/0x8a0
> > > [ 37.407177] rtnetlink_rcv_msg+0xac/0x240
> > > [ 37.412702] netlink_rcv_skb+0xed/0x120
> > > [ 37.418023] rtnetlink_rcv+0x2a/0x40
> > > [ 37.423060] netlink_unicast+0x182/0x220
> > > [ 37.428482] netlink_sendmsg+0x2e9/0x3e0
> > > [ 37.433868] sock_sendmsg+0x38/0x50
> > > [ 37.438766] ___sys_sendmsg+0x2b2/0x2d0
> > > [ 37.444052] __sys_sendmsg+0x54/0x90
> > > [ 37.449047] SyS_sendmsg+0x12/0x20
> > > [ 37.453848] entry_SYSCALL_64_fastpath+0x1f/0xbe
> > > [ 37.460007]
> > > -> #1 (rtnl_mutex){+.+.+.}:
> > > [ 37.466764] lock_acquire+0xbd/0x200
> > > [ 37.471745] __mutex_lock+0x88/0x950
> > > [ 37.476853] mutex_lock_nested+0x1b/0x20
> > > [ 37.482336] rtnl_lock+0x17/0x20
> > > [ 37.487038] enum_all_gids_of_dev_cb+0x25/0xd0 [ib_core]
> > > [ 37.494509] ib_enum_roce_netdev+0xe7/0x100 [ib_core]
> > > [ 37.501256] roce_rescan_device+0x21/0x30 [ib_core]
> > > [ 37.507680] ib_cache_setup_one+0x1f1/0x350 [ib_core]
> > > [ 37.514297] ib_register_device+0x444/0x720 [ib_core]
> > > [ 37.520900] ocrdma_add+0x46f/0x820 [ocrdma]
> > > [ 37.526622] _be_roce_dev_add+0x17d/0x1e0 [be2net]
> > > [ 37.532929] be_roce_register_driver+0x4a/0x90 [be2net]
> > > [ 37.539716] ib_umad_poll+0x15/0x50 [ib_umad]
> > > [ 37.545527] do_one_initcall+0x51/0x1a9
> > > [ 37.550881] do_init_module+0x60/0x1ff
> > > [ 37.556129] load_module+0x257e/0x2b10
> > > [ 37.561375] SYSC_finit_module+0xa9/0x100
> > > [ 37.566880] SyS_finit_module+0xe/0x10
> > > [ 37.572099] do_syscall_64+0x6c/0x1d0
> > > [ 37.577178] return_from_SYSCALL_64+0x0/0x7a
> > > [ 37.583232]
> > > -> #0 (device_mutex){+.+.+.}:
> > > [ 37.590704] __lock_acquire+0x153c/0x1550
> > > [ 37.596442] lock_acquire+0xbd/0x200
> > > [ 37.601399] __mutex_lock+0x88/0x950
> > > [ 37.606346] mutex_lock_nested+0x1b/0x20
> > > [ 37.611669] ib_register_device+0xb5/0x720 [ib_core]
> > > [ 37.618170] c4iw_register_device+0x3a0/0x460 [iw_cxgb4]
> > > [ 37.625061] c4iw_uld_state_change+0x7a4/0xcd0 [iw_cxgb4]
> > > [ 37.632108] notify_ulds.isra.28+0x3f/0x60 [cxgb4]
> > > [ 37.638410] cxgb_up+0x70b/0x840 [cxgb4]
> > > [ 37.643946] cxgb_open+0x34/0x90 [cxgb4]
> > > [ 37.649265] __dev_open+0xc9/0x140
> > > [ 37.653977] __dev_change_flags+0x9d/0x160
> > > [ 37.659613] dev_change_flags+0x29/0x60
> > > [ 37.665046] do_setlink+0x4bf/0xc80
> > > [ 37.669851] rtnl_newlink+0x512/0x8a0
> > > [ 37.675090] rtnetlink_rcv_msg+0xac/0x240
> > > [ 37.680717] netlink_rcv_skb+0xed/0x120
> > > [ 37.685937] rtnetlink_rcv+0x2a/0x40
> > > [ 37.691081] netlink_unicast+0x182/0x220
> > > [ 37.696607] netlink_sendmsg+0x2e9/0x3e0
> > > [ 37.702136] sock_sendmsg+0x38/0x50
> > > [ 37.707180] ___sys_sendmsg+0x2b2/0x2d0
> > > [ 37.712639] __sys_sendmsg+0x54/0x90
> > > [ 37.717542] SyS_sendmsg+0x12/0x20
> > > [ 37.722249] entry_SYSCALL_64_fastpath+0x1f/0xbe
> > > [ 37.728326]
> > > other info that might help us debug this:
> > >
> > > [ 37.738479] Chain exists of:
> > > device_mutex --> rtnl_mutex --> uld_mutex
> > >
> > > [ 37.750153] Possible unsafe locking scenario:
> > >
> > > [ 37.757412]        CPU0                    CPU1
> > > [ 37.762894]        ----                    ----
> > > [ 37.768381]   lock(uld_mutex);
> > > [ 37.772149]                                lock(rtnl_mutex);
> > > [ 37.778830]                                lock(uld_mutex);
> > > [ 37.785413]   lock(device_mutex);
> > > [ 37.789462]
> > > *** DEADLOCK ***
> > >
> > > [ 37.797070] 2 locks held by NetworkManager/2196:
> > > [ 37.802557] #0: (rtnl_mutex){+.+.+.}, at: [<ffffffff9e83457b>] rtnetlink_r0
> > > [ 37.812213] #1: (uld_mutex){+.+.+.}, at: [<ffffffffc0574fd4>] notify_ulds.]
> > > [ 37.822846]
> > > stack backtrace:
> > > [ 37.828894] CPU: 17 PID: 2196 Comm: NetworkManager Not tainted 4.13.0-rc7+ #0
> > > [ 37.837655] Hardware name: Dell Inc. PowerEdge R730xd/0599V5, BIOS 2.0.2 03/6
> > > [ 37.846551] Call Trace:
> > > [ 37.849630] dump_stack+0x85/0xcc
> > > [ 37.853679] print_circular_bug+0x200/0x20e
> > > [ 37.858806] __lock_acquire+0x153c/0x1550
> > > [ 37.863738] lock_acquire+0xbd/0x200
> > > [ 37.868138] ? ib_register_device+0xb5/0x720 [ib_core]
> > > [ 37.874275] ? ib_register_device+0xb5/0x720 [ib_core]
> > > [ 37.880403] __mutex_lock+0x88/0x950
> > > [ 37.884782] ? ib_register_device+0xb5/0x720 [ib_core]
> > > [ 37.890914] ? ib_register_device+0xb5/0x720 [ib_core]
> > > [ 37.897108] ? find_held_lock+0x40/0xb0
> > > [ 37.901838] mutex_lock_nested+0x1b/0x20
> > > [ 37.906669] ib_register_device+0xb5/0x720 [ib_core]
> > > [ 37.912669] ? c4iw_register_device+0x2f6/0x460 [iw_cxgb4]
> > > [ 37.919261] ? rcu_read_lock_sched_held+0x98/0xa0
> > > [ 37.924973] ? kmem_cache_alloc_trace+0x278/0x2e0
> > > [ 37.930691] ? c4iw_register_device+0x2f6/0x460 [iw_cxgb4]
> > > [ 37.937293] c4iw_register_device+0x3a0/0x460 [iw_cxgb4]
> > > [ 37.943702] c4iw_uld_state_change+0x7a4/0xcd0 [iw_cxgb4]
> > > [ 37.950213] ? notify_ulds.isra.28+0x24/0x60 [cxgb4]
> > > [ 37.956244] notify_ulds.isra.28+0x3f/0x60 [cxgb4]
> > > [ 37.962083] cxgb_up+0x70b/0x840 [cxgb4]
> > > [ 37.966951] ? cxgb4_ofld_send+0x20/0x20 [cxgb4]
> > > [ 37.972594] cxgb_open+0x34/0x90 [cxgb4]
> > > [ 37.977462] __dev_open+0xc9/0x140
> > > [ 37.981741] __dev_change_flags+0x9d/0x160
> > > [ 37.986794] dev_change_flags+0x29/0x60
> > > [ 37.991557] do_setlink+0x4bf/0xc80
> > > [ 37.995931] rtnl_newlink+0x512/0x8a0
> > > [ 38.000500] ? rtnl_newlink+0x104/0x8a0
> > > [ 38.005263] ? check_usage+0xb5/0x490
> > > [ 38.009826] ? ns_capable_common+0x7a/0x90
> > > [ 38.014876] ? ns_capable+0x13/0x20
> > > [ 38.019253] rtnetlink_rcv_msg+0xac/0x240
> > > [ 38.024215] ? rtnetlink_rcv+0x1b/0x40
> > > [ 38.028879] ? netlink_deliver_tap+0x7a/0x2c0
> > > [ 38.034232] ? rtnl_newlink+0x8a0/0x8a0
> > > [ 38.038995] netlink_rcv_skb+0xed/0x120
> > > [ 38.043760] rtnetlink_rcv+0x2a/0x40
> > > [ 38.048244] netlink_unicast+0x182/0x220
> > > [ 38.053119] netlink_sendmsg+0x2e9/0x3e0
> > > [ 38.057985] sock_sendmsg+0x38/0x50
> > > [ 38.062243] ___sys_sendmsg+0x2b2/0x2d0
> > > [ 38.066877] ? find_held_lock+0x40/0xb0
> > > [ 38.071499] ? __fget+0x102/0x210
> > > [ 38.075647] ? __fget+0x121/0x210
> > > [ 38.079780] ? __fget+0x5/0x210
> > > [ 38.083706] ? __fget_light+0x25/0x70
> > > [ 38.088208] __sys_sendmsg+0x54/0x90
> > > [ 38.092606] SyS_sendmsg+0x12/0x20
> > > [ 38.096810] entry_SYSCALL_64_fastpath+0x1f/0xbe
> > > [ 38.102379] RIP: 0033:0x7f146e486974
> > > [ 38.106778] RSP: 002b:00007ffd0cd3ee00 EFLAGS: 00000293 ORIG_RAX: 0000000000e
> > > [ 38.115654] RAX: ffffffffffffffda RBX: 000055698f9641f9 RCX: 00007f146e486974
> > > [ 38.124058] RDX: 0000000000000000 RSI: 00007ffd0cd3ee50 RDI: 0000000000000007
> > > [ 38.132474] RBP: 00007ffd0cd3f2e0 R08: 0000000000000000 R09: 000055699118c300
> > > [ 38.140884] R10: 0000000000000001 R11: 0000000000000293 R12: 0000000000000001
> > > [ 38.149306] R13: 0000000000000001 R14: 00007ffd0cd3f010 R15: 000055698fbda5c0
> > > [ 38.160359] ib_srpt srpt_add_one(cxgb4_0) failed.
> > >
> > --
> > Doug Ledford <dledford@xxxxxxxxxx>
> > GPG KeyID: B826A3330E572FDD
> > Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > More majordomo info at http://vger.kernel.org/majordomo-info.html