On Tue, Apr 14, 2020 at 03:57:20PM +0200, Håkon Bugge wrote: > > > > On 14 Apr 2020, at 14:50, Jason Gunthorpe <jgg@xxxxxxxxxxxx> wrote: > > > > On Tue, Apr 14, 2020 at 12:34:35PM +0200, Håkon Bugge wrote: > > > >>>>>> Shall I make a v2 base on next based on this idea, or do you have > >>>>>> something coming? > >>>>> > >>>>> Sure, I have nothing :) > >>>>> > >>>>> Also that rdma_destroy_id in addr_handler looks wrong.. ie we still > >>>>> retain pointers to the rdma_cm_id it destroys inside the struct > >>>>> ucma_context, don't we? > >>>> > >>>> On entry from user-space, we use the u32 id and looks it up in the > >>>> XArray. But if rdma_destoy_id() is called asynchronously called > >>>> between ucma_get_ctx_dev() and the de-reference of ctx->cm_id, we > >>>> are toast. > >>> > >>> Is that what happens on the addr_handler path? > >> > >> No, there, the main problem is the revert of the state > >> transitions. The first transition enables rdma_resolve_route() to > >> pass its gate (i.e. state == ADDR_RESOLVED). Then it thinks the > >> address is resolved, but the addr_handler changes its mind > >> afterwards. > > > > That is a problem, but the call to rdma_destroy_id looks like another > > problem > > I am not sure. Almost all events/incoming packets, can, after the > cm_id's event_handler is called from cma_ib_handler(), call > rdma_destroy_id(). I think the trick is that ucma_event_handler never returns failure unless RDMA_CM_EVENT_CONNECT_REQUEST, which means the cm_id isn't in the xarray yet? > I assume the refcounting takes care of this. There is no refcounting for destroy, it must be called once. Jason