Re: [RFC PATCH 0/4] NFS: Fix another 'check_flush_dependency' splat

> On Jun 2, 2024, at 2:14 PM, Zhu Yanjun <zyjzyj2000@xxxxxxxxx> wrote:
> 
> On Sun, Jun 2, 2024 at 5:40 PM Chuck Lever III <chuck.lever@xxxxxxxxxx> wrote:
>> 
>> 
>>> On Apr 30, 2024, at 10:45 AM, Zhu Yanjun <zyjzyj2000@xxxxxxxxx> wrote:
>>> 
>>> On 30.04.24 16:13, Chuck Lever III wrote:
>>>> It is possible to add rxe as a second option in kdevops,
>>>> but siw has worked for our purposes so far, and the NFS
>>>> test matrix is already enormous.
>>> 
>>> Thanks. If rxe can be added as a second option in kdevops, I will run tests with kdevops to check whether rxe works well in future kernel versions.
>> 
>> As per our recent discussion, I have added rxe as a second
>> software RDMA option in kdevops. Proof of concept:
> 
> Thanks a lot. I am very glad to know that rxe is treated as a second
> software RDMA option in kdevops.
> I also checked the commit related to this feature. It is very
> large and complicated.

I split this into four smaller patches, HTH.


> I hope rxe can work well in kdevops,
> so that I can also use kdevops to verify rxe and the RDMA subsystem.
> Thanks a lot for your efforts.
> 
>> 
>>  https://github.com/chucklever/kdevops/tree/add-rxe-support
>> 
>> But basic rping testing is not working (with 6.10-rc1 kernels)
>> in this set-up. It's missing something...
> 
> I just tested with the latest rdma-core (rping is included in
> rdma-core) and 6.10-rc1 kernels. rping works well.
> 
> rping is normally used as a basic tool to verify whether rxe works.
> If rping does not work, I usually do the following:
> 1. rping -s -a 127.0.0.1
>    rping -c -a 127.0.0.1 -C 3 -d -v
>    This will verify whether rxe is configured correctly or not.

I don't have rxe set up on loopback, so I substituted the host's
configured Ethernet IP.

The test works on the NFS server, but the rping client hangs
on the NFS client (both running v6.10-rc1).

I rebooted into the Fedora 39 stock kernel, and the rping tests
pass.

However, when I try to run fstests with NFS/RDMA using rxe, the
client kernel reports a soft CPU lock-up, and top shows this:

    115 root      20   0       0      0      0 R  99.3   0.0   1:03.50 kworker/u8:5+rxe_wq

So I think this is enough to show that the Ansible parts of this
change are working as expected. I can push this to kdevops now
if there are no objections, and someone (maybe you, maybe me) can
sort out the rxe specific issues later.


> 2. ping -c 3 server_ip on the client host.
>    This will verify whether the client host can connect to the server
>    host.
> 3. rping -s -a server_ip
>    rping -c -a server_ip -C 3 -d -v
>    1) shut down the firewall
>    2) tcpdump -ni xxxx to capture UDP packets
> Normally the above steps are enough to find the errors in the rxe
> client/server. I hope this helps you find the errors.
> 
> Zhu Yanjun
> 
>> 
>> --
>> Chuck Lever
>> 
>> 

--
Chuck Lever
