On Wed, Sep 22, 2021 at 09:38:40AM +0000, Haakon Bugge wrote: > >> @@ -3413,6 +3421,15 @@ int rdma_resolve_addr(struct rdma_cm_id *id, struct sockaddr *src_addr, > >> if (dst_addr->sa_family == AF_IB) { > >> ret = cma_resolve_ib_addr(id_priv); > >> } else { > >> + /* The FSM can return back to RDMA_CM_ADDR_BOUND after > >> + * rdma_resolve_ip() is called, eg through the error > >> + * path in addr_handler. If this happens the existing > >> + * request must be canceled before issuing a new one. > >> + */ > >> + if (id_priv->used_resolve_ip) > >> + rdma_addr_cancel(&id->route.addr.dev_addr); > >> + else > >> + id_priv->used_resolve_ip = 1; > > > > Why don't you never clear this field? If you assume that this is one lifetime > > event, can you please add a comment with an explanation "why"? > > Adding to that, don't you need {READ,WRITE}_ONCE when accessing > used_resolve_ip? The FSM logic guarentees there is no concurrent access here, this is the only thread that can be in this state at this point. > Or will the write to it obtain global visibility because > mutex_unlock(&ctx->mutex) is executed before any other context can > read it? Global visibility flows indirectly through the rdma_resolve_ip() to the work. Basically when the rdma_resolve_ip schedules the work it does a full release, then the work does a spinlock/unlock which is another full release, finally the next time we go through this function it does another spinlock/unlock which will act as ancquire for this store. Jason