On Tue, Dec 24, 2024 at 08:52:24AM +0000, Daisuke Matsuda (Fujitsu) wrote:
> On Mon, Dec 23, 2024 10:55 AM Daisuke Matsuda (Fujitsu) <matsuda-daisuke@xxxxxxxxxxx> wrote:
> > On Mon, Dec 23, 2024 2:25 AM Joe Klein <joe.klein812@xxxxxxxxx> wrote:
> > > We have tested this patchset and had a lot of problems, even without
> > > using the ODP option in softroce. I don't know if others have done
> > > similar tests. If we have to merge this patchset into upstream, is it
> > > possible to add a kernel option to enable/disable this patchset?
> >
> > Hi Joe,
> >
> > Can you clarify the test and the problems you observed?
> > I wonder if you tried the test with the latest tree WITHOUT my patches.
> >
> > As far as I know, there is something wrong with the upstream right now.
> > It does not complete the rdma-core testcases, and 'segmentation fault' is observed
> > in the middle of the full test run, which did not happen before October 2024.
>
> It appears that the root cause of this issue lies within the userspace components.
> My report yesterday was based on experiments conducted on Ubuntu 24.04.1 LTS (x86_64).
> It seems to me that rxe is somehow broken regardless of kernel version
> as long as userspace components are provided by Ubuntu 24.04.1 LTS.
> I built and tried linux-6.11, linux-6.10, and linux-6.8, and they all failed as I reported.
>
> I switched to Ubuntu 22.04.5 LTS (aarch64) to test with the older libraries.
> All tests available passed using the rdma for-next tree without any problem.
> Then, I applied my ODP patches onto it, and everything is still fine.
> ####################
> ubuntu@rdma-aarch64:~/rdma-core$ git branch -v
> * master fb965e2d0 Merge pull request #1531 from selvintxavier/pbuf_optimization
> ubuntu@rdma-aarch64:~/rdma-core$ ./build/bin/run_tests.py
> ..........ss..........ssssssssss..............ssssssssssssssssssssssssss.sssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss........ssssss..ss....s.sssssss....ss....ss..............s......................ss.............sss...ssss
> ----------------------------------------------------------------------
> Ran 321 tests in 3.599s
>
> OK (skipped=211)
> ubuntu@rdma-aarch64:~/rdma-core$ ./build/bin/run_tests.py -k odp
> sssssssss..ss....s.s
> ----------------------------------------------------------------------
> Ran 20 tests in 0.269s
>
> OK (skipped=13)
> ####################
>
> Possibly, there was a regression in libibverbs between v39.0-1 and v50.0-2build2.
> We need to take a closer look to resolve the malfunction of rxe on Ubuntu 24.04.

That's distressing.

> In conclusion, I believe there is nothing in my ODP patches that could cause
> the rxe driver to fail. I would appreciate any feedback on potential improvements.

What am I supposed to do with this though?

Joe, can you please answer Daisuke's questions on what problems you
observed and if you observe them without the ODP patches?

Jason