On Tue, Jul 20, 2021 at 2:27 AM Bob Pearson <rpearsonhpe@xxxxxxxxx> wrote:
>
> On 7/19/21 10:46 PM, Olga Kornievskaia wrote:
> > Hello,
> >
> > I would like to report that the rxe driver got broken some time
> > between 5.13 and 5.14-rc1 (so basically the last git pull). It's not
> > just NFSoRDMA but simple rping doesn't work. I believe I found the
> > problematic commit: 5bcf5a59c41e19141783c7305d420a5e36c937b2
> > "RDMA/rxe: Protext kernel index from user space"
> >
> > Server side logs: "rdma_rxe: bad ICRC from <>".
> >
>
> Thanks. That is helpful. Will try to find it.

Thank you, I appreciate you looking into it.

Actually, I'm not 100% confident that's the commit for the particular
problem I was seeing in 5.14-rc (rping hangs but doesn't crash; an NFS
mount also hangs without crashing). While git bisect was going through
the commits it kept encountering crashes, so I can't say what it
"found". So I think that's the commit that causes the kernel oops, and
something else leads to the bad ICRC.

I have a general question. I see that you've been posting a lot of work
on RDMA/rxe lately. Can this be viewed as somebody (you/your company)
now actively supporting the rxe driver? It looked like Mellanox had
previously abandoned support for it. We ran into several issues trying
to use rxe for NFSoRDMA over the years, but they were not being
addressed.

There were a number of commits that led to crashes.

commit ec9bf373f2458f4b5f1ece8b93a07e6204081667
"RDMA/core: Use refcount_t instead of atomic_t on refcount of
ib_uverbs_device" leads to the following kernel oops.

commit 205be5dc9984b67a3b388cbdaa27a2f2644a4bd6
"RDMA/irdma: Fix spelling mistake "Allocal" -> "Allocate"" also leads
to the kernel oops.