Hi Jason- Thanks for your review. > On Feb 12, 2021, at 9:43 AM, Jason Gunthorpe <jgg@xxxxxxxxxx> wrote: > > On Thu, Feb 11, 2021 at 05:15:30PM -0500, Chuck Lever wrote: >> RDMA core mutex locking was restructured by d114c6feedfe ("RDMA/cma: >> Add missing locking to rdma_accept()") [Aug 2020]. When lock >> debugging is enabled, the RPC/RDMA server trips over the new lockdep >> assertion in rdma_accept() because it doesn't call rdma_accept() >> from its CM event handler. >> >> As a temporary fix, have svc_rdma_accept() take the mutex >> explicitly. In the meantime, let's consider how to restructure the >> RPC/RDMA transport to invoke rdma_accept() from the proper context. >> >> Calls to svc_rdma_accept() are serialized with calls to >> svc_rdma_free() by the generic RPC server layer. >> >> Suggested-by: Jason Gunthorpe <jgg@xxxxxxxxxx> >> Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx> > > Fixes line Wasn't clear to me which commit should be listed. d114c6feedfe ? >> --- >> net/sunrpc/xprtrdma/svc_rdma_transport.c | 2 ++ >> 1 file changed, 2 insertions(+) > > It could even be right like this.. > > This should be inside the lock though: > > newxprt->sc_cm_id->event_handler = svc_rdma_cma_handler; OK. > But this really funny looking, before it gets to accept the handler is > still the listen handler so any incoming events will just be > discarded. Yeah, not clear to me why two CM event handlers are necessary. If they are truly needed, a comment would be helpful. > However the rdma_accept() should fail if the state machine has been > moved from the accepting state, and I think the only meaningful event > that can be delivered here is disconnect. So the rdma_accept() failure > does trigger destroy_id, which is the right thing on disconnect anyhow. The mutex needs to be released before the ID is destroyed, right? -- Chuck Lever