I forward networking bugs to the maintainers. Netdev does not use Bugzilla; I am not sure whether NFS does.

Begin forwarded message:

Date: Thu, 18 Apr 2024 00:00:22 +0000
From: bugzilla-daemon@xxxxxxxxxx
To: stephen@xxxxxxxxxxxxxxxxxx
Subject: [Bug 218743] New: NFS-RDMA-Connected Regression Found on Upstream Linux 6.9-rc1

https://bugzilla.kernel.org/show_bug.cgi?id=218743

            Bug ID: 218743
           Summary: NFS-RDMA-Connected Regression Found on Upstream Linux 6.9-rc1
           Product: Networking
           Version: 2.5
    Kernel Version: 6.9-rc1
          Hardware: Intel
                OS: Linux
            Status: NEW
          Severity: high
          Priority: P3
         Component: Other
          Assignee: stephen@xxxxxxxxxxxxxxxxxx
          Reporter: manuel.gomez@xxxxxxxxxxxxxxxxxxxx
                CC: dennis.dalessandro@xxxxxxxxxxxxxxxxxxxx
        Regression: Yes
Bisected commit-id: e084ee673c77cade06ab4c2e36b5624c82608b8c

On the Linux 6.9-rc1 kernel there is a performance regression for NFS file transfers when Connected IPoIB mode is enabled. The network switch is OPA (Omni-Path Architecture). The most recent good commit in my bisection was the v6.8 mainline kernel (e8f897f4afef0031fe618a8e94127a0934896aba). Bisecting from v6.8 to v6.9-rc1 showed that "[e084ee673c77cade06ab4c2e36b5624c82608b8c] svcrdma: Add Write chunk WRs to the RPC's Send WR chain" was the culprit of the regression.
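The exact bisect commands are not included in the report; a minimal sketch of the standard git bisect workflow between the endpoints named above (assuming the kernel is rebuilt, booted, and the test below rerun at each step) would be:

    git bisect start
    git bisect bad v6.9-rc1
    git bisect good v6.8
    # build and boot the checked-out kernel, rerun the NFS test, then mark the
    # result with "git bisect good" or "git bisect bad" depending on the run time;
    # repeat until git reports the first bad commit, then restore the tree:
    git bisect reset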
Here are the steps I ran to reproduce the issue:

1. Establish IPoIB Connected Mode on both client and host nodes:

   "echo connected > /sys/class/net/ibs785/mode"

2. Start an NFS server on the host node:

   "systemctl start opafm
   sleep 10
   systemctl start nfs-server
   modprobe svcrdma
   echo "rdma 20049" > /proc/fs/nfsd/portlist
   mkdir -p /mnt/nfs_test
   mount -t tmpfs -o size=4096M tmpfs /mnt/nfs_test
   sudo exportfs -o fsid=0,rw,async,insecure,no_root_squash 192.168.2.0/255.255.255.0:/mnt/nfs_test_testrun/"

3. Ready the client node:

   "mkdir -p /mnt/nfs_test
   mount -o rdma,port=20049 192.168.2.1:/mnt/nfs_test_testrun /mnt/nfs_test_testrun/"

4. Run the actual test from the client node:

   "#!/bin/bash
   fsize=268435456
   jfile=/dev/shm/run_nfs_test2.junk
   tfile=/dev/shm/run_nfs_test2.tmp
   nfsfile=/mnt/nfs_test_testrun/run_nfs_test2.junk

   rm -r -f /mnt/nfs_test_testrun/
   rm -f ${tfile}
   rm -f ${jfile}

   dd if=/dev/urandom iflag=fullblock of=${jfile} bs=1024 count=$((fsize/1024));

   for i in {1..100}; do
       cp ${jfile} ${nfsfile};    # Bottleneck 1
       cp ${nfsfile} ${tfile};    # Bottleneck 2
       cmp ${jfile} ${tfile};
       rm -f ${tfile};
       echo "DONE with iter ${i}"
   done;

   rm -f ${jfile};
   rm -f ${tfile};
   echo "Done";"

On v6.8 I was seeing this test take about 1m50s to complete, for all 10 iterations. On v6.9-rc1 it takes 3-7 minutes, and I also see these kernel traces printed continuously in dmesg while the regression is occurring:

[23720.243905] INFO: task kworker/61:1:556 blocked for more than 122 seconds.
[23720.251709]       Not tainted 6.9.0-rc1 #1
[23720.256387] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[23720.265268] task:kworker/61:1    state:D stack:0     pid:556   tgid:556   ppid:2      flags:0x00004000
[23720.275822] Workqueue: events __svc_rdma_free [rpcrdma]
[23720.281803] Call Trace:
[23720.284630]  <TASK>
[23720.287067]  __schedule+0x210/0x660
[23720.291063]  schedule+0x2c/0xb0
[23720.294668]  schedule_timeout+0x146/0x160
[23720.299249]  __wait_for_common+0x92/0x1d0
[23720.303828]  ? __pfx_schedule_timeout+0x10/0x10
[23720.308987]  __ib_drain_sq+0xfa/0x170 [ib_core]
[23720.314190]  ? __pfx_ib_drain_qp_done+0x10/0x10 [ib_core]
[23720.320343]  ib_drain_qp+0x71/0x80 [ib_core]
[23720.325232]  __svc_rdma_free+0x28/0x100 [rpcrdma]
[23720.330604]  process_one_work+0x196/0x3d0
[23720.335185]  worker_thread+0x2fc/0x410
[23720.339470]  ? __pfx_worker_thread+0x10/0x10
[23720.344336]  kthread+0xdf/0x110
[23720.347941]  ? __pfx_kthread+0x10/0x10
[23720.352225]  ret_from_fork+0x30/0x50
[23720.356317]  ? __pfx_kthread+0x10/0x10
[23720.360602]  ret_from_fork_asm+0x1a/0x30
[23720.365083]  </TASK>

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are the assignee for the bug.