On 5/6/22 9:24 AM, Dennis Dalessandro wrote:
> On 4/29/22 9:37 AM, Chuck Lever III wrote:
>>
>>
>>> On Apr 29, 2022, at 8:54 AM, Dennis Dalessandro
>>> <dennis.dalessandro@xxxxxxxxxxxxxxxxxxxx> wrote:
>>>
>>> On 4/28/22 3:56 PM, Trond Myklebust wrote:
>>>> On Thu, 2022-04-28 at 15:47 -0400, Dennis Dalessandro wrote:
>>>>> On 4/28/22 11:42 AM, Dennis Dalessandro wrote:
>>>>>> On 4/28/22 10:57 AM, Chuck Lever III wrote:
>>>>>>>> On Apr 28, 2022, at 9:05 AM, Dennis Dalessandro
>>>>>>>> <dennis.dalessandro@xxxxxxxxxxxxxxxxxxxx> wrote:
>>>>>>>>
>>>>>>>> Hi NFS folks,
>>>>>>>>
>>>>>>>> I've noticed a pretty nasty regression in our NFS capability between
>>>>>>>> 5.17 and 5.18-rc1. I've tried to bisect but am not having any luck.
>>>>>>>> The problem I'm seeing is that it takes 3 minutes to copy a file from
>>>>>>>> NFS to the local disk, when it should take less than half a second,
>>>>>>>> as it did up through 5.17.
>>>>>>>>
>>>>>>>> It doesn't seem to be network related, but I can't rule that out
>>>>>>>> completely.
>>>>>>>>
>>>>>>>> I tried to bisect, but the problem can be intermittent. Some runs
>>>>>>>> I'll see the problem in 3 out of 100 cycles, sometimes in 0 out of
>>>>>>>> 100. Sometimes I'll see it in 100 out of 100.
>>>>>>>
>>>>>>> It's not clear from your problem report whether the problem appears
>>>>>>> when it's the server running v5.18-rc or the client.
>>>>>>
>>>>>> That's because I don't know which it is. I'll do a quick test and find
>>>>>> out. I was testing the same kernel across both nodes.
>>>>>
>>>>> Looks like it is the client.
>>>>>
>>>>> server client result
>>>>> ------ ------ ------
>>>>> 5.17   5.17   Pass
>>>>> 5.17   5.18   Fail
>>>>> 5.18   5.18   Fail
>>>>> 5.18   5.17   Pass
>>>>>
>>>>> Is there a patch for the client issue you mentioned that I could try?
>>>>>
>>>>> -Denny
>>>>
>>>> Try this one
>>>
>>> Thanks for the patch. Unfortunately it doesn't seem to solve the issue; I
>>> still see intermittent hangs. I applied it on top of -rc4:
>>>
>>> copy /mnt/nfs_test/run_nfs_test.junk to /dev/shm/run_nfs_test.tmp...
>>>
>>> real    2m6.072s
>>> user    0m0.002s
>>> sys     0m0.263s
>>> Done
>>>
>>> While it was hung I checked the mem usage on the machine:
>>>
>>> # free -h
>>>               total        used        free      shared  buff/cache   available
>>> Mem:           62Gi       871Mi        61Gi       342Mi       889Mi        61Gi
>>> Swap:         4.0Gi          0B       4.0Gi
>>>
>>> Doesn't appear to be under memory pressure.
>>
>> Hi, since you know now that it is the client, perhaps a bisect
>> would be more successful?
>
> I've been testing all week. I pulled the nfs-rdma tree that was sent to
> Linus for 5.18 and tested. I see the problem on pretty much all the patches;
> however, it's the frequency at which it hits that changes.
>
> I'll see 1-5 cycles out of 2500 where the copy takes minutes, up to:
> "NFS: Convert readdir page cache to use a cookie based index"
>
> After this I start seeing it around 10 times in 500, and by the last patch
> 10 times in fewer than 100.
>
> Is there any kind of tracing/debugging I could turn on to get more insight
> into what is taking so long when it does go bad?
>
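To partially answer my own question above, here is roughly what I plan to
capture on the client around a bad cycle. This is only a sketch: it assumes
trace-cmd and nfs-utils' rpcdebug are available on the client, the output
file name is arbitrary, and the choice of trace subsystems (sunrpc, nfs,
rpcrdma) is a guess at what's relevant for this RDMA mount.

  # Turn on verbose sunrpc/nfs dprintk debugging (output goes to the kernel log).
  rpcdebug -m rpc -s all
  rpcdebug -m nfs -s all

  # Record sunrpc/nfs/rpcrdma trace events for the duration of one copy;
  # this is the same copy the test harness does.
  trace-cmd record -e sunrpc -e nfs -e rpcrdma -o nfs-slow-copy.dat \
      cp /mnt/nfs_test/run_nfs_test.junk /dev/shm/run_nfs_test.tmp

  # Turn the debugging back off and look at what was captured.
  rpcdebug -m rpc -c all
  rpcdebug -m nfs -c all
  trace-cmd report nfs-slow-copy.dat | less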

Ran a test with -rc6 and this time I see a hung task trace on the console as
well as an NFS RPC error.

[32719.991175] nfs: RPC call returned error 512
.
.
.
[32933.285126] INFO: task kworker/u145:23:886141 blocked for more than 122 seconds.
[32933.293543]       Tainted: G S                5.18.0-rc6 #1
[32933.299869] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[32933.308740] task:kworker/u145:23 state:D stack:    0 pid:886141 ppid:     2 flags:0x00004000
[32933.318321] Workqueue: rpciod rpc_async_schedule [sunrpc]
[32933.324524] Call Trace:
[32933.327347]  <TASK>
[32933.329785]  __schedule+0x3dd/0x970
[32933.333783]  schedule+0x41/0xa0
[32933.337388]  xprt_request_dequeue_xprt+0xd1/0x140 [sunrpc]
[32933.343639]  ? prepare_to_wait+0xd0/0xd0
[32933.348123]  ? rpc_destroy_wait_queue+0x10/0x10 [sunrpc]
[32933.354183]  xprt_release+0x26/0x140 [sunrpc]
[32933.359168]  ? rpc_destroy_wait_queue+0x10/0x10 [sunrpc]
[32933.365225]  rpc_release_resources_task+0xe/0x50 [sunrpc]
[32933.371381]  __rpc_execute+0x2c5/0x4e0 [sunrpc]
[32933.376564]  ? __switch_to_asm+0x42/0x70
[32933.381046]  ? finish_task_switch+0xb2/0x2c0
[32933.385918]  rpc_async_schedule+0x29/0x40 [sunrpc]
[32933.391391]  process_one_work+0x1c8/0x390
[32933.395975]  worker_thread+0x30/0x360
[32933.400162]  ? process_one_work+0x390/0x390
[32933.404931]  kthread+0xd9/0x100
[32933.408536]  ? kthread_complete_and_exit+0x20/0x20
[32933.413984]  ret_from_fork+0x22/0x30
[32933.418074]  </TASK>

The call trace shows up again at 245, 368, and 491 seconds. Same task, same trace.
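Next time it wedges I'll also try to capture the state of the blocked tasks
directly. A minimal sketch (it assumes sysrq is enabled via kernel.sysrq; the
pid is just the rpciod kworker from the trace above):

  # Dump all tasks in uninterruptible (blocked) state to the kernel log.
  echo w > /proc/sysrq-trigger

  # Grab the current kernel stack of the stuck rpciod kworker.
  cat /proc/886141/stack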