[PATCH v1] RDMA/core: Fix check_flush_dependency splat on addr_wq

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



While setting up a new lab, I accidentally misconfigured the
Ethernet port for a system that tried an NFS mount using RoCE.
This made the NFS server unreachable. The following WARNING
popped on the NFS client while waiting for the mount attempt to
time out:

Aug 20 17:12:05 bazille kernel: workqueue: WQ_MEM_RECLAIM xprtiod:xprt_rdma_connect_worker [rpcrdma] is flushing !WQ_MEM_RECLAI>
Aug 20 17:12:05 bazille kernel: WARNING: CPU: 0 PID: 100 at kernel/workqueue.c:2628 check_flush_dependency+0xbf/0xca
Aug 20 17:12:05 bazille kernel: Modules linked in: rpcsec_gss_krb5 nfsv4 dns_resolver nfs 8021q garp stp mrp llc rfkill rpcrdma>
Aug 20 17:12:05 bazille kernel: CPU: 0 PID: 100 Comm: kworker/u8:8 Not tainted 6.0.0-rc1-00002-g6229f8c054e5 #13
Aug 20 17:12:05 bazille kernel: Hardware name: Supermicro X10SRA-F/X10SRA-F, BIOS 2.0b 06/12/2017
Aug 20 17:12:05 bazille kernel: Workqueue: xprtiod xprt_rdma_connect_worker [rpcrdma]
Aug 20 17:12:05 bazille kernel: RIP: 0010:check_flush_dependency+0xbf/0xca
Aug 20 17:12:05 bazille kernel: Code: 75 2a 48 8b 55 18 48 8d 8b b0 00 00 00 4d 89 e0 48 81 c6 b0 00 00 00 48 c7 c7 65 33 2e be>
Aug 20 17:12:05 bazille kernel: RSP: 0018:ffffb562806cfcf8 EFLAGS: 00010092
Aug 20 17:12:05 bazille kernel: RAX: 0000000000000082 RBX: ffff97894f8c3c00 RCX: 0000000000000027
Aug 20 17:12:05 bazille kernel: RDX: 0000000000000002 RSI: ffffffffbe3447d1 RDI: 00000000ffffffff
Aug 20 17:12:05 bazille kernel: RBP: ffff978941315840 R08: 0000000000000000 R09: 0000000000000000
Aug 20 17:12:05 bazille kernel: R10: 00000000000008b0 R11: 0000000000000001 R12: ffffffffc0ce3731
Aug 20 17:12:05 bazille kernel: R13: ffff978950c00500 R14: ffff97894341f0c0 R15: ffff978951112eb0
Aug 20 17:12:05 bazille kernel: FS:  0000000000000000(0000) GS:ffff97987fc00000(0000) knlGS:0000000000000000
Aug 20 17:12:05 bazille kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 20 17:12:05 bazille kernel: CR2: 00007f807535eae8 CR3: 000000010b8e4002 CR4: 00000000003706f0
Aug 20 17:12:05 bazille kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Aug 20 17:12:05 bazille kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Aug 20 17:12:05 bazille kernel: Call Trace:
Aug 20 17:12:05 bazille kernel:  <TASK>
Aug 20 17:12:05 bazille kernel:  __flush_work.isra.0+0xaf/0x188
Aug 20 17:12:05 bazille kernel:  ? _raw_spin_lock_irqsave+0x2c/0x37
Aug 20 17:12:05 bazille kernel:  ? lock_timer_base+0x38/0x5f
Aug 20 17:12:05 bazille kernel:  __cancel_work_timer+0xea/0x13d
Aug 20 17:12:05 bazille kernel:  ? preempt_latency_start+0x2b/0x46
Aug 20 17:12:05 bazille kernel:  rdma_addr_cancel+0x70/0x81 [ib_core]
Aug 20 17:12:05 bazille kernel:  _destroy_id+0x1a/0x246 [rdma_cm]
Aug 20 17:12:05 bazille kernel:  rpcrdma_xprt_connect+0x115/0x5ae [rpcrdma]
Aug 20 17:12:05 bazille kernel:  ? _raw_spin_unlock+0x14/0x29
Aug 20 17:12:05 bazille kernel:  ? raw_spin_rq_unlock_irq+0x5/0x10
Aug 20 17:12:05 bazille kernel:  ? finish_task_switch.isra.0+0x171/0x249
Aug 20 17:12:05 bazille kernel:  xprt_rdma_connect_worker+0x3b/0xc7 [rpcrdma]
Aug 20 17:12:05 bazille kernel:  process_one_work+0x1d8/0x2d4
Aug 20 17:12:05 bazille kernel:  worker_thread+0x18b/0x24f
Aug 20 17:12:05 bazille kernel:  ? rescuer_thread+0x280/0x280
Aug 20 17:12:05 bazille kernel:  kthread+0xf4/0xfc
Aug 20 17:12:05 bazille kernel:  ? kthread_complete_and_exit+0x1b/0x1b
Aug 20 17:12:05 bazille kernel:  ret_from_fork+0x22/0x30
Aug 20 17:12:05 bazille kernel:  </TASK>

The xprtiod work queue is WQ_MEM_RECLAIM, so any work queue that
one of its work items tries to cancel has to be WQ_MEM_RECLAIM to
prevent a priority inversion.

Suggested-by: Trond Myklebust <trondmy@xxxxxxxxxxxxxxx>
Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx>
---
 drivers/infiniband/core/addr.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c
index f253295795f0..5c36d01ebf0b 100644
--- a/drivers/infiniband/core/addr.c
+++ b/drivers/infiniband/core/addr.c
@@ -872,7 +872,7 @@ static struct notifier_block nb = {
 
 int addr_init(void)
 {
-	addr_wq = alloc_ordered_workqueue("ib_addr", 0);
+	addr_wq = alloc_ordered_workqueue("ib_addr", WQ_MEM_RECLAIM);
 	if (!addr_wq)
 		return -ENOMEM;
 





[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux