On Fri, 2022-05-13 at 07:58 -0400, Dennis Dalessandro wrote:
> On 5/6/22 9:24 AM, Dennis Dalessandro wrote:
> > On 4/29/22 9:37 AM, Chuck Lever III wrote:
> > > 
> > > > On Apr 29, 2022, at 8:54 AM, Dennis Dalessandro
> > > > <dennis.dalessandro@xxxxxxxxxxxxxxxxxxxx> wrote:
> > > > 
> > > > On 4/28/22 3:56 PM, Trond Myklebust wrote:
> > > > > On Thu, 2022-04-28 at 15:47 -0400, Dennis Dalessandro wrote:
> > > > > > On 4/28/22 11:42 AM, Dennis Dalessandro wrote:
> > > > > > > On 4/28/22 10:57 AM, Chuck Lever III wrote:
> > > > > > > > > On Apr 28, 2022, at 9:05 AM, Dennis Dalessandro
> > > > > > > > > <dennis.dalessandro@xxxxxxxxxxxxxxxxxxxx> wrote:
> > > > > > > > > 
> > > > > > > > > Hi NFS folks,
> > > > > > > > > 
> > > > > > > > > I've noticed a pretty nasty regression in our NFS
> > > > > > > > > capability between 5.17 and 5.18-rc1. I've tried to
> > > > > > > > > bisect but am not having any luck. The problem I'm
> > > > > > > > > seeing is that it takes 3 minutes to copy a file from
> > > > > > > > > NFS to the local disk, when it should take less than
> > > > > > > > > half a second, which it did up through 5.17.
> > > > > > > > > 
> > > > > > > > > It doesn't seem to be network related, but I can't
> > > > > > > > > rule that out completely.
> > > > > > > > > 
> > > > > > > > > I tried to bisect, but the problem can be
> > > > > > > > > intermittent. Some runs I'll see a problem in 3 out
> > > > > > > > > of 100 cycles, sometimes 0 out of 100. Sometimes I'll
> > > > > > > > > see it 100 out of 100.
> > > > > > > > 
> > > > > > > > It's not clear from your problem report whether the
> > > > > > > > problem appears when it's the server running v5.18-rc
> > > > > > > > or the client.
> > > > > > > 
> > > > > > > That's because I don't know which it is. I'll do a quick
> > > > > > > test and find out. I was testing the same kernel across
> > > > > > > both nodes.
> > > > > > 
> > > > > > Looks like it is the client.
> > > > > > 
> > > > > > server    client    result
> > > > > > ------    ------    ------
> > > > > > 5.17      5.17      Pass
> > > > > > 5.17      5.18      Fail
> > > > > > 5.18      5.18      Fail
> > > > > > 5.18      5.17      Pass
> > > > > > 
> > > > > > Is there a patch for the client issue you mentioned that I
> > > > > > could try?
> > > > > > 
> > > > > > -Denny
> > > > > 
> > > > > Try this one
> > > > 
> > > > Thanks for the patch. Unfortunately it doesn't seem to solve
> > > > the issue; I still see intermittent hangs. I applied it on top
> > > > of -rc4:
> > > > 
> > > > copy /mnt/nfs_test/run_nfs_test.junk to /dev/shm/run_nfs_test.tmp...
> > > > 
> > > > real    2m6.072s
> > > > user    0m0.002s
> > > > sys     0m0.263s
> > > > Done
> > > > 
> > > > While it was hung I checked the mem usage on the machine:
> > > > 
> > > > # free -h
> > > >               total        used        free      shared  buff/cache   available
> > > > Mem:           62Gi       871Mi        61Gi       342Mi       889Mi        61Gi
> > > > Swap:         4.0Gi          0B       4.0Gi
> > > > 
> > > > Doesn't appear to be under memory pressure.
> > > 
> > > Hi, since you know now that it is the client, perhaps a bisect
> > > would be more successful?
> > 
> > I've been testing all week. I pulled the nfs-rdma tree that was
> > sent to Linus for 5.18 and tested. I see the problem on pretty much
> > all the patches; however, it's the frequency with which it hits
> > that changes.
> > 
> > I'll see 1-5 cycles out of 2500 where the copy takes minutes, up to:
> > "NFS: Convert readdir page cache to use a cookie based index"
> > 
> > After this I start seeing it around 10 times in 500, and by the
> > last patch 10 times in less than 100.
> > 
> > Is there any kind of tracing/debugging I could turn on to get more
> > insight into what is taking so long when it does go bad?
> 
> Ran a test with -rc6 and this time see a hung task trace on the
> console as well as an NFS RPC error.
> 
> [32719.991175] nfs: RPC call returned error 512
> .
> .
> .
> [32933.285126] INFO: task kworker/u145:23:886141 blocked for more than 122 seconds.
> [32933.293543]       Tainted: G S                5.18.0-rc6 #1
> [32933.299869] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [32933.308740] task:kworker/u145:23 state:D stack:    0 pid:886141 ppid:     2 flags:0x00004000
> [32933.318321] Workqueue: rpciod rpc_async_schedule [sunrpc]
> [32933.324524] Call Trace:
> [32933.327347]  <TASK>
> [32933.329785]  __schedule+0x3dd/0x970
> [32933.333783]  schedule+0x41/0xa0
> [32933.337388]  xprt_request_dequeue_xprt+0xd1/0x140 [sunrpc]
> [32933.343639]  ? prepare_to_wait+0xd0/0xd0
> [32933.348123]  ? rpc_destroy_wait_queue+0x10/0x10 [sunrpc]
> [32933.354183]  xprt_release+0x26/0x140 [sunrpc]
> [32933.359168]  ? rpc_destroy_wait_queue+0x10/0x10 [sunrpc]
> [32933.365225]  rpc_release_resources_task+0xe/0x50 [sunrpc]
> [32933.371381]  __rpc_execute+0x2c5/0x4e0 [sunrpc]
> [32933.376564]  ? __switch_to_asm+0x42/0x70
> [32933.381046]  ? finish_task_switch+0xb2/0x2c0
> [32933.385918]  rpc_async_schedule+0x29/0x40 [sunrpc]
> [32933.391391]  process_one_work+0x1c8/0x390
> [32933.395975]  worker_thread+0x30/0x360
> [32933.400162]  ? process_one_work+0x390/0x390
> [32933.404931]  kthread+0xd9/0x100
> [32933.408536]  ? kthread_complete_and_exit+0x20/0x20
> [32933.413984]  ret_from_fork+0x22/0x30
> [32933.418074]  </TASK>
> 
> The call trace shows up again at 245, 368, and 491 seconds. Same
> task, same trace.

That's very helpful. The above trace suggests that the RDMA code is
leaking a call to xprt_unpin_rqst().

From what I can see, rpcrdma_reply_handler() will always call
xprt_pin_rqst(), and that call is required to be matched by a call to
xprt_unpin_rqst() before the handler exits. However, those calls to
xprt_unpin_rqst() appear to be scattered around the code, and it is
not at all obvious that the requirement is being fulfilled.

Chuck, why does that code need to be so complex?

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx
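For illustration, here is a minimal sketch of the pin/unpin discipline
described above. It is not the actual rpcrdma_reply_handler() code;
example_reply_handler() is a made-up name, and the locking simply
mirrors the pattern the generic sunrpc transports use around
xprt_lookup_rqst()/xprt_pin_rqst():

#include <linux/sunrpc/xprt.h>

/*
 * Sketch only: once xprt_pin_rqst() has been called, every path out
 * of the handler must reach a matching xprt_unpin_rqst().  A missed
 * unpin leaves xprt_request_dequeue_xprt() sleeping indefinitely,
 * which is what the blocked rpciod worker in the trace above is doing.
 */
static void example_reply_handler(struct rpc_xprt *xprt, __be32 xid,
				  int copied)
{
	struct rpc_rqst *rqst;

	spin_lock(&xprt->queue_lock);
	rqst = xprt_lookup_rqst(xprt, xid);
	if (!rqst) {
		/* nothing was pinned, so nothing to unpin */
		spin_unlock(&xprt->queue_lock);
		return;
	}
	xprt_pin_rqst(rqst);
	spin_unlock(&xprt->queue_lock);

	/*
	 * ... parse the reply without holding queue_lock; any error
	 * detected here must still fall through to the unpin below ...
	 */

	spin_lock(&xprt->queue_lock);
	xprt_complete_rqst(rqst->rq_task, copied);
	xprt_unpin_rqst(rqst);
	spin_unlock(&xprt->queue_lock);
}

The pin keeps the rqst from being released while the handler is still
using it, and xprt_request_dequeue_xprt() waits for the pin count to
drop to zero, so a single early return or error path that skips
xprt_unpin_rqst() is enough to produce the stuck kworker shown in the
hung-task trace.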