On Fri, Feb 12, 2021 at 02:50:42PM +0000, Chuck Lever wrote:
> Hi Jason-
>
> Thanks for your review.
>
> > On Feb 12, 2021, at 9:43 AM, Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:
> >
> > On Thu, Feb 11, 2021 at 05:15:30PM -0500, Chuck Lever wrote:
> >> RDMA core mutex locking was restructured by d114c6feedfe ("RDMA/cma:
> >> Add missing locking to rdma_accept()") [Aug 2020]. When lock
> >> debugging is enabled, the RPC/RDMA server trips over the new lockdep
> >> assertion in rdma_accept() because it doesn't call rdma_accept()
> >> from its CM event handler.
> >>
> >> As a temporary fix, have svc_rdma_accept() take the mutex
> >> explicitly. In the meantime, let's consider how to restructure the
> >> RPC/RDMA transport to invoke rdma_accept() from the proper context.
> >>
> >> Calls to svc_rdma_accept() are serialized with calls to
> >> svc_rdma_free() by the generic RPC server layer.
> >>
> >> Suggested-by: Jason Gunthorpe <jgg@xxxxxxxxxx>
> >> Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx>
> >
> > Fixes line
>
> Wasn't clear to me which commit should be listed. d114c6feedfe ?

Yes, that is the earliest it can go back. Arguably it should be
backported further, but the bug from missing this lock is very small.

> > But this really funny looking, before it gets to accept the handler is
> > still the listen handler so any incoming events will just be
> > discarded.
>
> Yeah, not clear to me why two CM event handlers are necessary.
> If they are truly needed, a comment would be helpful.

It looks like the only thing it does here is discard the disconnected
event before accept, so if svc_xprt_enqueue() can run concurrently
with the accept process, the two handlers can be safely combined.

> > However the rdma_accept() should fail if the state machine has been
> > moved from the accepting state, and I think the only meaningful event
> > that can be delivered here is disconnect. So the rdma_accept() failure
> > does trigger destroy_id, which is the right thing on disconnect anyhow.
>
> The mutex needs to be released before the ID is destroyed, right?

Yes, noting that the handler can potentially still be called until
the ID is destroyed, so it has to be safe against races with
svc_xprt_enqueue() too.

Though the core code has a destroy_id_handler_unlock(), which can be
called with the handler lock held and is used to make the destruction
atomic with respect to the handlers.

Jason
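For readers following the locking argument, here is a minimal userspace model of the rule discussed in this thread: accept runs with the per-ID handler mutex held (a lockdep-style assertion), a disconnect delivered before accept makes accept fail, and on failure the caller drops the mutex before destroying the ID. Everything here is hypothetical: struct model_cm_id, model_accept(), model_destroy_id() and the other helpers are illustrative stand-ins, not the kernel's RDMA CM API. The sketch also assumes, as the thread implies, that destroying the ID takes the handler mutex itself (so holding it at that point would self-deadlock), and it is meant to be exercised from a single thread; the handler_locked flag is only a debugging aid and is read without synchronization.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a CM ID; NOT the kernel structure. */
struct model_cm_id {
	pthread_mutex_t handler_mutex; /* serializes event handlers */
	bool handler_locked;           /* poor man's lockdep: mutex held? */
	bool accepting;                /* still in the "accepting" state? */
};

static struct model_cm_id *model_create_id(void)
{
	struct model_cm_id *id = calloc(1, sizeof(*id));

	if (!id)
		return NULL;
	pthread_mutex_init(&id->handler_mutex, NULL);
	id->accepting = true;
	return id;
}

static void model_lock_handler(struct model_cm_id *id)
{
	pthread_mutex_lock(&id->handler_mutex);
	id->handler_locked = true;
}

static void model_unlock_handler(struct model_cm_id *id)
{
	id->handler_locked = false;
	pthread_mutex_unlock(&id->handler_mutex);
}

/* Models the lockdep assertion: accept must run with the handler mutex
 * held. Fails (-1) if a disconnect already left the accepting state. */
static int model_accept(struct model_cm_id *id)
{
	assert(id->handler_locked);
	return id->accepting ? 0 : -1;
}

/* An event handler delivering a disconnect before accept. */
static void model_disconnect_event(struct model_cm_id *id)
{
	model_lock_handler(id);
	id->accepting = false;
	model_unlock_handler(id);
}

/* Teardown entered WITH the handler mutex held; it drops the mutex
 * itself (loosely modeled on the destroy_id_handler_unlock() idea). */
static void model_destroy_handler_unlock(struct model_cm_id *id)
{
	model_unlock_handler(id);
	pthread_mutex_destroy(&id->handler_mutex);
	free(id);
}

/* Public destroy: takes the handler mutex to wait out handlers, so the
 * caller must not already hold it. */
static void model_destroy_id(struct model_cm_id *id)
{
	assert(!id->handler_locked); /* holding it here would self-deadlock */
	model_lock_handler(id);
	model_destroy_handler_unlock(id);
}

/* Caller pattern from the patch: take the mutex around accept, and
 * release it before destroying the ID when accept fails. On failure
 * the ID has been freed and must not be used again. */
static int model_svc_accept(struct model_cm_id *id)
{
	int ret;

	model_lock_handler(id);
	ret = model_accept(id);
	model_unlock_handler(id); /* release before any destroy */
	if (ret)
		model_destroy_id(id);
	return ret;
}
```

Putting the unlock between accept and destroy in the caller is the point of Chuck's question: if model_svc_accept() called model_destroy_id() while still holding the mutex, the assertion in model_destroy_id() would fire instead of the program deadlocking.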