Hi Mike- > On Jun 29, 2021, at 2:28 PM, Marciniszyn, Mike <mike.marciniszyn@xxxxxxxxxxxxxxxxxxxx> wrote: > > During our continuous integration testing on 5.13.0 kernel our testing trips on NFS testing with the following trace on the client: > > [32936.156848] INFO: task kworker/9:1:519 blocked for more than 122 seconds. > [32936.165201] Tainted: G S 5.13.0 #1 > [32936.171562] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > [32936.180773] task:kworker/9:1 state:D stack: 0 pid: 519 ppid: 2 flags:0x00004000 > [32936.190565] Workqueue: events xprt_destroy_cb [sunrpc] > [32936.196854] Call Trace: > [32936.200107] __schedule+0x38e/0x8b0 > [32936.204482] schedule+0x3c/0xa0 > [32936.208464] schedule_timeout+0x215/0x2b0 > [32936.213401] ? check_preempt_curr+0x3f/0x70 > [32936.218518] ? ttwu_do_wakeup+0x17/0x140 > [32936.223336] wait_for_completion+0x98/0xf0 > [32936.228396] __flush_work+0x128/0x1e0 > [32936.232942] ? worker_attach_to_pool+0xb0/0xb0 > [32936.238351] ? work_busy+0x80/0x80 > [32936.242555] __cancel_work_timer+0x110/0x1a0 > [32936.247726] ? xprt_rdma_bc_destroy+0xc6/0xe0 [rpcrdma] > [32936.254034] xprt_rdma_destroy+0x15/0x50 [rpcrdma] > [32936.259873] process_one_work+0x1cb/0x360 > [32936.264788] ? process_one_work+0x360/0x360 > [32936.269915] worker_thread+0x30/0x370 > [32936.274436] ? process_one_work+0x360/0x360 > [32936.279526] kthread+0x116/0x130 > [32936.283534] ? __kthread_cancel_work+0x40/0x40 > [32936.288924] ret_from_fork+0x22/0x30 > > The same tests and same servers see no such issue from rc4 to rc7, so the failure seems new. > > Any thoughts? > > I'm currently rerunning rc7 just to be sure. The NFS server in v5.13 is afflicted by a late-breaking bug-fix to the alloc_pages_bulk_array() API. It's been fixed in Linus' tree, but that tree is otherwise unstable for me. Have a look at commit 66d9282523b for a one-liner fix, it should apply cleanly to v5.13. -- Chuck Lever