> On Jun 3, 2024, at 12:54 PM, Zhu Yanjun <zyjzyj2000@xxxxxxxxx> wrote:
>
> On Mon, Jun 3, 2024 at 5:59 PM Chuck Lever III <chuck.lever@xxxxxxxxxx> wrote:
>>
>>> On Jun 2, 2024, at 2:14 PM, Zhu Yanjun <zyjzyj2000@xxxxxxxxx> wrote:
>>>
>>> On Sun, Jun 2, 2024 at 5:40 PM Chuck Lever III <chuck.lever@xxxxxxxxxx> wrote:
>>>>
>>>>> On Apr 30, 2024, at 10:45 AM, Zhu Yanjun <zyjzyj2000@xxxxxxxxx> wrote:
>>>>>
>>>>> On 30.04.24 16:13, Chuck Lever III wrote:
>>>>>> It is possible to add rxe as a second option in kdevops,
>>>>>> but siw has worked for our purposes so far, and the NFS
>>>>>> test matrix is already enormous.
>>>>>
>>>>> Thanks. If rxe can be a second option in kdevops, I will run
>>>>> tests with kdevops to check whether rxe works well in future
>>>>> kernel versions.
>>>>
>>>> As per our recent discussion, I have added rxe as a second
>>>> software RDMA option in kdevops. Proof of concept:
>>>
>>> Thanks a lot. I am very glad to know that rxe is treated as a second
>>> software RDMA option in kdevops.
>>> And I also checked the commit related to this feature. It is very
>>> complicated and huge.
>>
>> I split this into four smaller patches, HTH.
>>
>>> I hope rxe can work well in kdevops.
>>> So I can also use kdevops to verify rxe and rdma subsystems. Thanks a
>>> lot for your efforts.
>>>
>>>>
>>>> https://github.com/chucklever/kdevops/tree/add-rxe-support
>>>>
>>>> But basic rping testing is not working (with 6.10-rc1 kernels)
>>>> in this set-up. It's missing something...
>>>
>>> Just now I made tests with the latest rdma-core (rping is included in
>>> rdma-core) and 6.10-rc1 kernels. rping can work well.
>>>
>>> Normally rping works as a basic tool to verify if rxe works well or
>>> not. If rping does not work, normally I will do the following:
>>> 1. rping -s -a 127.0.0.1
>>>    rping -c -a 127.0.0.1 -C 3 -d -v
>>>    This will verify whether rxe is configured correctly or not.
>>
>> I don't have rxe set up on loopback, so I substituted the host's
>> configured Ethernet IP.
>>
>> The test works on the NFS server, but the rping client hangs
>> on the NFS client (both running v6.10-rc1).
>>
>> I rebooted into the Fedora 39 stock kernel, and the rping tests
>> pass.
>>
>> However, when I try to run fstests with NFS/RDMA using rxe, the
>> client kernel reports a soft CPU lock-up, and top shows this:
>>
>> 115 root 20 0 0 0 0 R 99.3 0.0 1:03.50 kworker/u8:5+rxe_wq
>
> rxe_wq is introduced in the commit 9b4b7c1f9f54 "RDMA/rxe: Add
> workqueue support for rxe tasks".
> And this commit is merged into kernel v6.4-rc2-1-g9b4b7c1f9f54.
>
> And the Fedora 39 stock kernel is kernel 6.5. So maybe some commits
> between 6.5 and 6.10 introduce this problem.

I couldn't get 6.10-rc1 working at all. This failure occurred with
the stock Fedora 39 kernel and fstests with NFS v4.2 on RDMA.

>> So I think this is enough to show that the Ansible parts of this
>> change are working as expected. I can push this to kdevops now
>> if there are no objections, and someone (maybe you, maybe me) can
>> sort out the rxe specific issues later.
>
> Thanks. Once I can reproduce this problem on my local host, I will be
> very glad to delve into it. Perhaps it will take me a long time
> since I do not have a good host to deploy kdevops.

kdevops works on laptops too. The limiting factor seems to be
memory for libvirt guests. Only two guests are needed for this
test.

> To be honest, perhaps "git bisect" can find the commit that introduces
> this problem. If you can find the commit, we can fix this problem very
> quickly^_^

Since this is the first time I've ever used rxe, I don't have a
"good" commit to start from.

> Thanks,
> Zhu Yanjun
>
>>
>>> 2. ping -c 3 server_ip on client host.
>>>    This will verify whether the client host can connect to the server
>>>    host or not.
>>> 3. rping -s -a server_ip
>>>    rping -c -a server_ip -C 3 -d -v
>>>    1) shutdown firewall
>>>    2) tcpdump -ni xxxx to capture udp packets
>>> Normally the above steps can find out the errors in rxe client/server.
>>> Hope the above can help to find out the errors.
>>>
>>> Zhu Yanjun
>>>
>>>>
>>>> --
>>>> Chuck Lever
>>
>> --
>> Chuck Lever

--
Chuck Lever
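
For anyone repeating the checks quoted above, the client-side half of
the sequence (loopback rping, plain ping, then rping against the server)
could be scripted roughly as below. This is only a sketch under stated
assumptions: SERVER_IP, DRY_RUN, run and client_checks are names invented
for the example, 192.0.2.10 is a documentation placeholder address, and
the matching "rping -s -a <addr>" server is assumed to be started
separately beforehand. It is dry-run by default, so it only prints the
commands.

```shell
#!/bin/sh
# Client-side sketch of the rping checklist quoted in this thread.
# Dry-run by default: each command is printed with a leading "+"
# instead of being executed. Set DRY_RUN=0 to actually run them.
# SERVER_IP is a hypothetical placeholder, not an address from the thread.
SERVER_IP="${SERVER_IP:-192.0.2.10}"
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

client_checks() {
    # Step 1: loopback. Assumes "rping -s -a 127.0.0.1" is already
    # running on this host.
    run rping -c -a 127.0.0.1 -C 3 -d -v || return 1
    # Step 2: plain IP reachability from the client to the server.
    run ping -c 3 "$SERVER_IP" || return 2
    # Step 3: rping across the network. Assumes "rping -s -a $SERVER_IP"
    # is already running on the server.
    run rping -c -a "$SERVER_IP" -C 3 -d -v || return 3
}

client_checks
```

With DRY_RUN=0 the function stops at the first failing step and returns
that step's number, which says where to look next. If step 3 is the one
that fails, the quoted advice is to stop the firewall and capture UDP
packets with tcpdump on the relevant interface.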