On Fri, Jan 3, 2025 at 4:05 PM Jason Gunthorpe <jgg@xxxxxxxx> wrote:
>
> On Tue, Dec 24, 2024 at 08:52:24AM +0000, Daisuke Matsuda (Fujitsu) wrote:
> > On Mon, Dec 23, 2024 10:55 AM Daisuke Matsuda (Fujitsu) <matsuda-daisuke@xxxxxxxxxxx> wrote:
> > > On Mon, Dec 23, 2024 2:25 AM Joe Klein <joe.klein812@xxxxxxxxx> wrote:
> > > > We have tested this patchset and had a lot of problems, even without using
> > > > the ODP option in softroce. I don't know if others have done similar tests.
> > > > If we have to merge this patchset into upstream, is it possible to add a
> > > > kernel option to enable/disable this patchset?
> > >
> > > Hi Joe,
> > >
> > > Can you clarify the test and the problems you observed?
> > > I wonder if you tried the test with the latest tree WITHOUT my patches.
> > >
> > > As far as I know, there is something wrong with the upstream right now.
> > > It does not complete the rdma-core testcases, and 'segmentation fault' is observed
> > > in the middle of the full test run, which did not happen before October 2024.
> >
> > It appears that the root cause of this issue lies within the userspace components.
> > My report yesterday was based on experiments conducted on Ubuntu 24.04.1 LTS (x86_64).
> > It seems to me that rxe is somehow broken regardless of kernel version
> > as long as userspace components are provided by Ubuntu 24.04.1 LTS.
> > I built and tried linux-6.11, linux-6.10, and linux-6.8, and they all failed as I reported.
> >
> > I switched to Ubuntu 22.04.5 LTS (aarch64) to test with the older libraries.
> > All tests available passed using the rdma for-next tree without any problem.
> > Then, I applied my ODP patches onto it, and everything is still fine.
> > ####################
> > ubuntu@rdma-aarch64:~/rdma-core$ git branch -v
> > * master fb965e2d0 Merge pull request #1531 from selvintxavier/pbuf_optimization
> > ubuntu@rdma-aarch64:~/rdma-core$ ./build/bin/run_tests.py
> > ..........ss..........ssssssssss..............ssssssssssssssssssssssssss.sssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss........ssssss..ss....s.sssssss....ss....ss..............s......................ss.............sss...ssss
> > ----------------------------------------------------------------------
> > Ran 321 tests in 3.599s
> >
> > OK (skipped=211)
> > ubuntu@rdma-aarch64:~/rdma-core$ ./build/bin/run_tests.py -k odp
> > sssssssss..ss....s.s
> > ----------------------------------------------------------------------
> > Ran 20 tests in 0.269s
> >
> > OK (skipped=13)
> > ####################
> >
> > Possibly, there was a regression in libibverbs between v39.0-1 and v50.0-2build2.
> > We need to take a closer look to resolve the malfunction of rxe on Ubuntu 24.04.
>
> That's distressing.
>
> > In conclusion, I believe there is nothing in my ODP patches that could cause
> > the rxe driver to fail. I would appreciate any feedback on potential improvements.
>
> What am I supposed to do with this though?
>
> Joe, can you please answer Daisuke's questions on what problems you
> observed and if you observe them without the ODP patches?

I will run the tests and let you know the results very soon.

>
> Jason