On Wed, Jul 21, 2021 at 5:48 AM Olga Kornievskaia <aglo@xxxxxxxxx> wrote:
>
> On Tue, Jul 20, 2021 at 2:27 AM Bob Pearson <rpearsonhpe@xxxxxxxxx> wrote:
> >
> > On 7/19/21 10:46 PM, Olga Kornievskaia wrote:
> > > Hello,
> > >
> > > I would like to report that the rxe driver got broken some time
> > > between 5.13 and 5.14-rc1 (so basically the last git pull). It's not
> > > just NFSoRDMA but simple rping doesn't work. I believe I found the
> > > problematic commit: 5bcf5a59c41e19141783c7305d420a5e36c937b2
> > > "RDMA/rxe: Protext kernel index from user space"
> > >
> > > Server side logs: "rdma_rxe: bad ICRC from <>".
> > >
> > Thanks. That is helpful. Will try to find it.
>
> Thank you, I appreciate you looking into it. Actually I'm not 100%
> confident that's the commit for this particular problem "I" was seeing
> in 5.14-rc (which was rping hanging but not crashing. An NFS mount
> also hangs, doesn't crash). But what git bisect was going thru and
> encountering crashes so can't say what it "found". So I think that's
> the one that causes kernel oops. I think something else leads to the
> bad ICRC.

Thanks a lot. I will delve into this problem.

Zhu Yanjun

>
> I have a general question. I see that you've been posting a lot of
> work on RDMA/rxe lately. Can this be viewed as somebody (you/your
> company) is now actively supporting rxe driver? It looked like
> previously Mellanox had abandoned support for it. We ran into several
> issues trying to use rxe for NFSoRDMA throughout the years but they
> were not being addressed.
>
> There were a number of commits that led to crashes. commit
> ec9bf373f2458f4b5f1ece8b93a07e6204081667 "RDMA/core: Use refcount_t
> instead of atomic_t on refcount of ib_uverbs_device" leads to the
> following kernel oops. commit 205be5dc9984b67a3b388cbdaa27a2f2644a4bd6
> "RDMA/irdma: Fix spelling mistake "Allocal" -> "Allocate"" also leads
> to the kernel oops.