> On Apr 19, 2024, at 11:15 AM, Stephen Hemminger <stephen@xxxxxxxxxxxxxxxxxx> wrote:
>
> I forward networking bugs to the maintainers.
> Netdev does not use bugzilla, not sure if NFS does.
>
> Begin forwarded message:
>
> Date: Thu, 18 Apr 2024 00:00:22 +0000
> From: bugzilla-daemon@xxxxxxxxxx
> To: stephen@xxxxxxxxxxxxxxxxxx
> Subject: [Bug 218743] New: NFS-RDMA-Connected Regression Found on Upstream Linux 6.9-rc1
>
> https://bugzilla.kernel.org/show_bug.cgi?id=218743
>
>             Bug ID: 218743
>            Summary: NFS-RDMA-Connected Regression Found on Upstream Linux
>                     6.9-rc1
>            Product: Networking
>            Version: 2.5
>     Kernel Version: 6.9-rc1
>           Hardware: Intel
>                 OS: Linux
>             Status: NEW
>           Severity: high
>           Priority: P3
>          Component: Other
>           Assignee: stephen@xxxxxxxxxxxxxxxxxx
>           Reporter: manuel.gomez@xxxxxxxxxxxxxxxxxxxx
>                 CC: dennis.dalessandro@xxxxxxxxxxxxxxxxxxxx
>         Regression: Yes
> Bisected commit-id: e084ee673c77cade06ab4c2e36b5624c82608b8c
>
> On the Linux 6.9-rc1 kernel there is a performance regression for NFS file
> transfers when Connected IPoIB mode is enabled. The network switch is OPA
> (Omni-Path Architecture).
>
> The most recent good commit in my bisection was the v6.8 mainline kernel
> (e8f897f4afef0031fe618a8e94127a0934896aba). Bisecting from v6.8 to v6.9-rc1
> showed that "[e084ee673c77cade06ab4c2e36b5624c82608b8c] svcrdma: Add Write
> chunk WRs to the RPC's Send WR chain" was the culprit of the regression.
>
> Here are the steps I ran to reproduce the issue:
>
> 1. Establish IPoIB Connected mode on both client and host nodes:
>
>    echo connected > /sys/class/net/ibs785/mode
>
> 2. Start an NFS server on the host node:
>
>    systemctl start opafm
>    sleep 10
>    systemctl start nfs-server
>    modprobe svcrdma
>    echo "rdma 20049" > /proc/fs/nfsd/portlist
>    mkdir -p /mnt/nfs_test
>    mount -t tmpfs -o size=4096M tmpfs /mnt/nfs_test
>    sudo exportfs -o fsid=0,rw,async,insecure,no_root_squash 192.168.2.0/255.255.255.0:/mnt/nfs_test_testrun/
>
> 3. Ready the client node:
>
>    mkdir -p /mnt/nfs_test
>    mount -o rdma,port=20049 192.168.2.1:/mnt/nfs_test_testrun /mnt/nfs_test_testrun/
>
> 4. Run the actual test from the client node:
>
>    #!/bin/bash
>
>    fsize=268435456
>    jfile=/dev/shm/run_nfs_test2.junk
>    tfile=/dev/shm/run_nfs_test2.tmp
>    nfsfile=/mnt/nfs_test_testrun/run_nfs_test2.junk
>    rm -r -f /mnt/nfs_test_testrun/
>    rm -f ${tfile}
>    rm -f ${jfile}
>
>    dd if=/dev/urandom iflag=fullblock of=${jfile} bs=1024 count=$((fsize/1024));
>
>    for i in {1..100}; do
>        cp ${jfile} ${nfsfile};   # Bottleneck 1
>        cp ${nfsfile} ${tfile};   # Bottleneck 2
>        cmp ${jfile} ${tfile};
>        rm -f ${tfile};
>        echo "DONE with iter ${i}"
>    done;
>
>    rm -f ${jfile};
>    rm -f ${tfile};
>    echo "Done";
>
> On v6.8 I was seeing this test take about 1m50s to complete, for all 10
> iterations. On v6.9-rc1 it takes 3-7 minutes, and I also see these kernel
> traces printed continuously in dmesg during the regression:
>
> [23720.243905] INFO: task kworker/61:1:556 blocked for more than 122 seconds.
> [23720.251709]       Not tainted 6.9.0-rc1 #1
> [23720.256387] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [23720.265268] task:kworker/61:1    state:D stack:0     pid:556   tgid:556   ppid:2      flags:0x00004000
> [23720.275822] Workqueue: events __svc_rdma_free [rpcrdma]
> [23720.281803] Call Trace:
> [23720.284630]  <TASK>
> [23720.287067]  __schedule+0x210/0x660
> [23720.291063]  schedule+0x2c/0xb0
> [23720.294668]  schedule_timeout+0x146/0x160
> [23720.299249]  __wait_for_common+0x92/0x1d0
> [23720.303828]  ? __pfx_schedule_timeout+0x10/0x10
> [23720.308987]  __ib_drain_sq+0xfa/0x170 [ib_core]
> [23720.314190]  ? __pfx_ib_drain_qp_done+0x10/0x10 [ib_core]
> [23720.320343]  ib_drain_qp+0x71/0x80 [ib_core]
> [23720.325232]  __svc_rdma_free+0x28/0x100 [rpcrdma]
> [23720.330604]  process_one_work+0x196/0x3d0
> [23720.335185]  worker_thread+0x2fc/0x410
> [23720.339470]  ? __pfx_worker_thread+0x10/0x10
> [23720.344336]  kthread+0xdf/0x110
> [23720.347941]  ? __pfx_kthread+0x10/0x10
> [23720.352225]  ret_from_fork+0x30/0x50
> [23720.356317]  ? __pfx_kthread+0x10/0x10
> [23720.360602]  ret_from_fork_asm+0x1a/0x30
> [23720.365083]  </TASK>
>
> --
> You may reply to this email to add a comment.
>
> You are receiving this mail because:
> You are the assignee for the bug.

Thanks, I've seen a performance regression on one system and haven't been
able to reproduce it elsewhere. Please move this bug to Filesystems/NFS.

--
Chuck Lever
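[Editor's note: for readers without OPA/RDMA hardware, the copy-and-compare
loop at the heart of the reporter's test can be exercised against any
directory. The sketch below is a scaled-down local stand-in, not the
reporter's script: a temp directory replaces the NFS-over-RDMA mount, and the
file size and iteration count are reduced to illustrative values.]

```shell
#!/bin/sh
# Scaled-down local stand-in for the reporter's test loop (illustrative only):
# a local temp directory replaces the NFS mount, and 1 MiB of random data
# replaces the 256 MiB file used in the report.
set -e
workdir=$(mktemp -d)
jfile=$workdir/src.junk       # random source data
nfsfile=$workdir/remote.junk  # stand-in for the NFS-mounted copy
tfile=$workdir/readback.tmp   # file read back for comparison

dd if=/dev/urandom of="$jfile" bs=1024 count=1024 2>/dev/null

for i in 1 2 3; do
    cp "$jfile" "$nfsfile"    # write path ("Bottleneck 1" in the report)
    cp "$nfsfile" "$tfile"    # read path ("Bottleneck 2" in the report)
    cmp "$jfile" "$tfile"     # exits non-zero on any data mismatch
    rm -f "$tfile"
    echo "DONE with iter $i"
done

rm -rf "$workdir"
echo "Done"
```

Pointing `nfsfile` at a real NFS mount and raising the size and iteration
count recovers the shape of the original test; timing the loop (e.g. with
`time`) on two kernels gives the before/after comparison the reporter
describes.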