Resend from my work email address:

I ran across this while testing a 4.13-rc7 kernel + the rdma next code. I don't have the time to track this down before going on PTO, so I'm putting it out here for others to look at.

This machine holds multiple connections in it:

ib0/ib1 -> dual port qib
roce -> ocrdma
iwarp -> cxgb4

During bootup I got this:

[ 37.244753] iw_cxgb4: 0000:83:00.4: Up
[ 37.250168] iw_cxgb4: 0000:83:00.4: On-Chip Queues not supported on this deve
[ 37.263207] ======================================================
[ 37.270656] WARNING: possible circular locking dependency detected
[ 37.278101] 4.13.0-rc7+ #130 Not tainted
[ 37.283019] ------------------------------------------------------
[ 37.290470] NetworkManager/2196 is trying to acquire lock:
[ 37.297143]  (device_mutex){+.+.+.}, at: [<ffffffffc08d2465>] ib_register_de]
[ 37.308026] but task is already holding lock:
[ 37.315694]  (uld_mutex){+.+.+.}, at: [<ffffffffc0574fd4>] notify_ulds.isra.]
[ 37.326108] which lock already depends on the new lock.
[ 37.337689] the existing dependency chain (in reverse order) is:
[ 37.347301] -> #2 (uld_mutex){+.+.+.}:
[ 37.354048]        lock_acquire+0xbd/0x200
[ 37.359083]        __mutex_lock+0x88/0x950
[ 37.364122]        mutex_lock_nested+0x1b/0x20
[ 37.369690]        cxgb_up+0x27/0x840 [cxgb4]
[ 37.375623]        cxgb_open+0x34/0x90 [cxgb4]
[ 37.381168]        __dev_open+0xc9/0x140
[ 37.386039]        __dev_change_flags+0x9d/0x160
[ 37.391686]        dev_change_flags+0x29/0x60
[ 37.397069]        do_setlink+0x4bf/0xc80
[ 37.402024]        rtnl_newlink+0x512/0x8a0
[ 37.407177]        rtnetlink_rcv_msg+0xac/0x240
[ 37.412702]        netlink_rcv_skb+0xed/0x120
[ 37.418023]        rtnetlink_rcv+0x2a/0x40
[ 37.423060]        netlink_unicast+0x182/0x220
[ 37.428482]        netlink_sendmsg+0x2e9/0x3e0
[ 37.433868]        sock_sendmsg+0x38/0x50
[ 37.438766]        ___sys_sendmsg+0x2b2/0x2d0
[ 37.444052]        __sys_sendmsg+0x54/0x90
[ 37.449047]        SyS_sendmsg+0x12/0x20
[ 37.453848]        entry_SYSCALL_64_fastpath+0x1f/0xbe
[ 37.460007] -> #1 (rtnl_mutex){+.+.+.}:
[ 37.466764]        lock_acquire+0xbd/0x200
[ 37.471745]        __mutex_lock+0x88/0x950
[ 37.476853]        mutex_lock_nested+0x1b/0x20
[ 37.482336]        rtnl_lock+0x17/0x20
[ 37.487038]        enum_all_gids_of_dev_cb+0x25/0xd0 [ib_core]
[ 37.494509]        ib_enum_roce_netdev+0xe7/0x100 [ib_core]
[ 37.501256]        roce_rescan_device+0x21/0x30 [ib_core]
[ 37.507680]        ib_cache_setup_one+0x1f1/0x350 [ib_core]
[ 37.514297]        ib_register_device+0x444/0x720 [ib_core]
[ 37.520900]        ocrdma_add+0x46f/0x820 [ocrdma]
[ 37.526622]        _be_roce_dev_add+0x17d/0x1e0 [be2net]
[ 37.532929]        be_roce_register_driver+0x4a/0x90 [be2net]
[ 37.539716]        ib_umad_poll+0x15/0x50 [ib_umad]
[ 37.545527]        do_one_initcall+0x51/0x1a9
[ 37.550881]        do_init_module+0x60/0x1ff
[ 37.556129]        load_module+0x257e/0x2b10
[ 37.561375]        SYSC_finit_module+0xa9/0x100
[ 37.566880]        SyS_finit_module+0xe/0x10
[ 37.572099]        do_syscall_64+0x6c/0x1d0
[ 37.577178]        return_from_SYSCALL_64+0x0/0x7a
[ 37.583232] -> #0 (device_mutex){+.+.+.}:
[ 37.590704]        __lock_acquire+0x153c/0x1550
[ 37.596442]        lock_acquire+0xbd/0x200
[ 37.601399]        __mutex_lock+0x88/0x950
[ 37.606346]        mutex_lock_nested+0x1b/0x20
[ 37.611669]        ib_register_device+0xb5/0x720 [ib_core]
[ 37.618170]        c4iw_register_device+0x3a0/0x460 [iw_cxgb4]
[ 37.625061]        c4iw_uld_state_change+0x7a4/0xcd0 [iw_cxgb4]
[ 37.632108]        notify_ulds.isra.28+0x3f/0x60 [cxgb4]
[ 37.638410]        cxgb_up+0x70b/0x840 [cxgb4]
[ 37.643946]        cxgb_open+0x34/0x90 [cxgb4]
[ 37.649265]        __dev_open+0xc9/0x140
[ 37.653977]        __dev_change_flags+0x9d/0x160
[ 37.659613]        dev_change_flags+0x29/0x60
[ 37.665046]        do_setlink+0x4bf/0xc80
[ 37.669851]        rtnl_newlink+0x512/0x8a0
[ 37.675090]        rtnetlink_rcv_msg+0xac/0x240
[ 37.680717]        netlink_rcv_skb+0xed/0x120
[ 37.685937]        rtnetlink_rcv+0x2a/0x40
[ 37.691081]        netlink_unicast+0x182/0x220
[ 37.696607]        netlink_sendmsg+0x2e9/0x3e0
[ 37.702136]        sock_sendmsg+0x38/0x50
[ 37.707180]        ___sys_sendmsg+0x2b2/0x2d0
[ 37.712639]        __sys_sendmsg+0x54/0x90
[ 37.717542]        SyS_sendmsg+0x12/0x20
[ 37.722249]        entry_SYSCALL_64_fastpath+0x1f/0xbe
[ 37.728326] other info that might help us debug this:
[ 37.738479] Chain exists of: device_mutex --> rtnl_mutex --> uld_mutex
[ 37.750153]  Possible unsafe locking scenario:
[ 37.757412]        CPU0                    CPU1
[ 37.762894]        ----                    ----
[ 37.768381]   lock(uld_mutex);
[ 37.772149]                                lock(rtnl_mutex);
[ 37.778830]                                lock(uld_mutex);
[ 37.785413]   lock(device_mutex);
[ 37.789462]  *** DEADLOCK ***
[ 37.797070] 2 locks held by NetworkManager/2196:
[ 37.802557]  #0: (rtnl_mutex){+.+.+.}, at: [<ffffffff9e83457b>] rtnetlink_r0
[ 37.812213]  #1: (uld_mutex){+.+.+.}, at: [<ffffffffc0574fd4>] notify_ulds.]
[ 37.822846] stack backtrace:
[ 37.828894] CPU: 17 PID: 2196 Comm: NetworkManager Not tainted 4.13.0-rc7+ #0
[ 37.837655] Hardware name: Dell Inc. PowerEdge R730xd/0599V5, BIOS 2.0.2 03/6
[ 37.846551] Call Trace:
[ 37.849630]  dump_stack+0x85/0xcc
[ 37.853679]  print_circular_bug+0x200/0x20e
[ 37.858806]  __lock_acquire+0x153c/0x1550
[ 37.863738]  lock_acquire+0xbd/0x200
[ 37.868138]  ? ib_register_device+0xb5/0x720 [ib_core]
[ 37.874275]  ? ib_register_device+0xb5/0x720 [ib_core]
[ 37.880403]  __mutex_lock+0x88/0x950
[ 37.884782]  ? ib_register_device+0xb5/0x720 [ib_core]
[ 37.890914]  ? ib_register_device+0xb5/0x720 [ib_core]
[ 37.897108]  ? find_held_lock+0x40/0xb0
[ 37.901838]  mutex_lock_nested+0x1b/0x20
[ 37.906669]  ib_register_device+0xb5/0x720 [ib_core]
[ 37.912669]  ? c4iw_register_device+0x2f6/0x460 [iw_cxgb4]
[ 37.919261]  ? rcu_read_lock_sched_held+0x98/0xa0
[ 37.924973]  ? kmem_cache_alloc_trace+0x278/0x2e0
[ 37.930691]  ? c4iw_register_device+0x2f6/0x460 [iw_cxgb4]
[ 37.937293]  c4iw_register_device+0x3a0/0x460 [iw_cxgb4]
[ 37.943702]  c4iw_uld_state_change+0x7a4/0xcd0 [iw_cxgb4]
[ 37.950213]  ? notify_ulds.isra.28+0x24/0x60 [cxgb4]
[ 37.956244]  notify_ulds.isra.28+0x3f/0x60 [cxgb4]
[ 37.962083]  cxgb_up+0x70b/0x840 [cxgb4]
[ 37.966951]  ? cxgb4_ofld_send+0x20/0x20 [cxgb4]
[ 37.972594]  cxgb_open+0x34/0x90 [cxgb4]
[ 37.977462]  __dev_open+0xc9/0x140
[ 37.981741]  __dev_change_flags+0x9d/0x160
[ 37.986794]  dev_change_flags+0x29/0x60
[ 37.991557]  do_setlink+0x4bf/0xc80
[ 37.995931]  rtnl_newlink+0x512/0x8a0
[ 38.000500]  ? rtnl_newlink+0x104/0x8a0
[ 38.005263]  ? check_usage+0xb5/0x490
[ 38.009826]  ? ns_capable_common+0x7a/0x90
[ 38.014876]  ? ns_capable+0x13/0x20
[ 38.019253]  rtnetlink_rcv_msg+0xac/0x240
[ 38.024215]  ? rtnetlink_rcv+0x1b/0x40
[ 38.028879]  ? netlink_deliver_tap+0x7a/0x2c0
[ 38.034232]  ? rtnl_newlink+0x8a0/0x8a0
[ 38.038995]  netlink_rcv_skb+0xed/0x120
[ 38.043760]  rtnetlink_rcv+0x2a/0x40
[ 38.048244]  netlink_unicast+0x182/0x220
[ 38.053119]  netlink_sendmsg+0x2e9/0x3e0
[ 38.057985]  sock_sendmsg+0x38/0x50
[ 38.062243]  ___sys_sendmsg+0x2b2/0x2d0
[ 38.066877]  ? find_held_lock+0x40/0xb0
[ 38.071499]  ? __fget+0x102/0x210
[ 38.075647]  ? __fget+0x121/0x210
[ 38.079780]  ? __fget+0x5/0x210
[ 38.083706]  ? __fget_light+0x25/0x70
[ 38.088208]  __sys_sendmsg+0x54/0x90
[ 38.092606]  SyS_sendmsg+0x12/0x20
[ 38.096810]  entry_SYSCALL_64_fastpath+0x1f/0xbe
[ 38.102379] RIP: 0033:0x7f146e486974
[ 38.106778] RSP: 002b:00007ffd0cd3ee00 EFLAGS: 00000293 ORIG_RAX: 0000000000e
[ 38.115654] RAX: ffffffffffffffda RBX: 000055698f9641f9 RCX: 00007f146e486974
[ 38.124058] RDX: 0000000000000000 RSI: 00007ffd0cd3ee50 RDI: 0000000000000007
[ 38.132474] RBP: 00007ffd0cd3f2e0 R08: 0000000000000000 R09: 000055699118c300
[ 38.140884] R10: 0000000000000001 R11: 0000000000000293 R12: 0000000000000001
[ 38.149306] R13: 0000000000000001 R14: 00007ffd0cd3f010 R15: 000055698fbda5c0
[ 38.160359] ib_srpt srpt_add_one(cxgb4_0) failed.

-- 
Doug Ledford <dledford@xxxxxxxxxx>
    GPG KeyID: B826A3330E572FDD
    Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD