On Wed, Jun 17, 2020 at 03:23:00PM -0300, Jason Gunthorpe wrote:
> On Wed, Jun 17, 2020 at 08:17:39AM +0300, Leon Romanovsky wrote:
> >
> > My thoughts that everything here hints me that state machine and
> > locking are implemented wrongly. In ideal world, the expectation
> > is that REQ message will have a state in it (PREPARED, SENT, ACK
> > e.t.c.) and list manipulations are done accordingly with proper
> > locks, while rdma_nl_multicast() is done outside of the locks.
>
> It can't be done outside the lock without creating races - once
> rdma_nl_multicast happens it is possible for the other leg of the
> operation to begin processing.

It means that the state machine is wrong, not complete.

>
> The list must be updated before this happens.
>
> What is missing here is refcounting - the lifetime model of this data
> is too implicit, but it is not worth adding I think

I have the same feeling for now, but it will flip if new fixes are
needed in this area.

Thanks

>
> Jason