> On Jun 2, 2024, at 2:14 PM, Zhu Yanjun <zyjzyj2000@xxxxxxxxx> wrote:
> 
> On Sun, Jun 2, 2024 at 5:40 PM Chuck Lever III <chuck.lever@xxxxxxxxxx> wrote:
>> 
>> 
>>> On Apr 30, 2024, at 10:45 AM, Zhu Yanjun <zyjzyj2000@xxxxxxxxx> wrote:
>>> 
>>> On 30.04.24 16:13, Chuck Lever III wrote:
>>>> It is possible to add rxe as a second option in kdevops,
>>>> but siw has worked for our purposes so far, and the NFS
>>>> test matrix is already enormous.
>>> 
>>> Thanks. If rxe can be added as a second option in kdevops, I will run tests with kdevops to check whether rxe works well in future kernel versions.
>> 
>> As per our recent discussion, I have added rxe as a second
>> software RDMA option in kdevops. Proof of concept:
> 
> Thanks a lot. I am very glad to know that rxe is treated as a second
> software RDMA option in kdevops.
> I also checked the commit related to this feature. It is very
> complicated and huge.

I split this into four smaller patches, HTH.

> I hope rxe can work well in kdevops,
> so I can also use kdevops to verify rxe and the RDMA subsystem. Thanks a
> lot for your efforts.
> 
>> 
>> https://github.com/chucklever/kdevops/tree/add-rxe-support
>> 
>> But basic rping testing is not working (with 6.10-rc1 kernels)
>> in this set-up. It's missing something...
> 
> Just now I ran tests with the latest rdma-core (rping is included in
> rdma-core) and 6.10-rc1 kernels. rping works well.
> 
> Normally rping serves as a basic tool to verify whether rxe works well or
> not. If rping does not work, I normally do the following:
> 1. rping -s -a 127.0.0.1
>    rping -c -a 127.0.0.1 -C 3 -d -v
> This will verify whether rxe is configured correctly or not.

I don't have rxe set up on loopback, so I substituted the host's
configured Ethernet IP. The test works on the NFS server, but the
rping client hangs on the NFS client (both running v6.10-rc1).

I rebooted into the Fedora 39 stock kernel, and the rping tests pass.

However, when I try to run fstests with NFS/RDMA using rxe, the client
kernel reports a soft CPU lock-up, and top shows this:

  115 root      20   0       0      0      0 R  99.3   0.0   1:03.50 kworker/u8:5+rxe_wq

So I think this is enough to show that the Ansible parts of this
change are working as expected. I can push this to kdevops now if
there are no objections, and someone (maybe you, maybe me) can sort
out the rxe-specific issues later.

> 2. ping -c 3 server_ip on the client host.
> This will verify whether the client host can connect to the server
> host or not.
> 3. rping -s -a server_ip
>    rping -c -a server_ip -C 3 -d -v
>    1) shut down the firewall
>    2) run tcpdump -ni xxxx to capture UDP packets
> Normally the above steps can find the errors in the rxe client/server.
> I hope the above helps find the errors.
> 
> Zhu Yanjun
> 
>> 
>> --
>> Chuck Lever
>> 
>> 

--
Chuck Lever