Re: NFS trace new to 5.13.0 (GA)

Chuck Lever III <chuck.lever@xxxxxxxxxx> · Tue, 29 Jun 2021 18:38:22 +0000

Hi Mike-

> On Jun 29, 2021, at 2:28 PM, Marciniszyn, Mike <mike.marciniszyn@xxxxxxxxxxxxxxxxxxxx> wrote:
> 
> During our continuous integration testing on 5.13.0 kernel our testing trips on NFS testing with the following trace on the client:
> 
> [32936.156848] INFO: task kworker/9:1:519 blocked for more than 122 seconds.
> [32936.165201]       Tainted: G S                5.13.0 #1
> [32936.171562] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [32936.180773] task:kworker/9:1     state:D stack:    0 pid:  519 ppid:     2 flags:0x00004000
> [32936.190565] Workqueue: events xprt_destroy_cb [sunrpc]
> [32936.196854] Call Trace:
> [32936.200107]  __schedule+0x38e/0x8b0
> [32936.204482]  schedule+0x3c/0xa0
> [32936.208464]  schedule_timeout+0x215/0x2b0
> [32936.213401]  ? check_preempt_curr+0x3f/0x70
> [32936.218518]  ? ttwu_do_wakeup+0x17/0x140
> [32936.223336]  wait_for_completion+0x98/0xf0
> [32936.228396]  __flush_work+0x128/0x1e0
> [32936.232942]  ? worker_attach_to_pool+0xb0/0xb0
> [32936.238351]  ? work_busy+0x80/0x80
> [32936.242555]  __cancel_work_timer+0x110/0x1a0
> [32936.247726]  ? xprt_rdma_bc_destroy+0xc6/0xe0 [rpcrdma]
> [32936.254034]  xprt_rdma_destroy+0x15/0x50 [rpcrdma]
> [32936.259873]  process_one_work+0x1cb/0x360
> [32936.264788]  ? process_one_work+0x360/0x360
> [32936.269915]  worker_thread+0x30/0x370
> [32936.274436]  ? process_one_work+0x360/0x360
> [32936.279526]  kthread+0x116/0x130
> [32936.283534]  ? __kthread_cancel_work+0x40/0x40
> [32936.288924]  ret_from_fork+0x22/0x30
> 
> The same tests and same servers see no such issue from rc4 to rc7, so the failure seems new.
> 
> Any thoughts?
> 
> I'm currently rerunning rc7 just to be sure.

The NFS server in v5.13 is afflicted by a late-breaking bug-fix
to the alloc_pages_bulk_array() API. It's been fixed in Linus'
tree, but that tree is otherwise unstable for me.

Have a look at commit 66d9282523b for a one-liner fix, it should
apply cleanly to v5.13.

--
Chuck Lever