On Wed, May 04, 2022 at 03:15:13PM -0400, Ryan Stone wrote:
> I was reading through the IPoIB code and I think that I see a bug that
> affects ipoib_reap_dead_ahs() when using datagram mode.
>
> When sending a packet, if we aren't using the CM (which I assume means
> that we are using datagram mode), we fall into the following case:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/infiniband/ulp/ipoib/ipoib_main.c#n1163
>
> The AH for our neighbour has its last_send field set to the return
> value from the RDMA driver's send function
>
> If I look at how this is used in ipoib_reap_dead_ahs(), it compares
> last_send to the current tail of the completion(?) queue. I believe
> that this is intended to check that the last outstanding WQ entry that
> references the AH has completed.
>
> However, if I look at the actual implementation in mlx5, the send
> function always returns NETDEV_TX_OK:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c#n635
>
> If my understanding of all of this is correct, this could lead to a
> premature freeing of an AH and a use-after-free bug

IPoIB in mlx5 is a HW-offloaded version of the ulp/ipoib one. AFAIK, it
doesn't change "tx_tail", so we won't enter into this if (...).

Thanks
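
For reference, the kind of check discussed above (comparing an AH's
last_send against the queue tail) can be sketched as below. This is only
a simplified model, not the kernel's code: the helper name
ah_send_completed() and the use of plain 32-bit counters are assumptions
made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not taken from the IPoIB sources): returns true
 * once the queue tail has reached or passed the index recorded when the
 * AH was last used for a send.
 *
 * Unsigned subtraction wraps modulo 2^32, so the difference is treated
 * as "non-negative" when it lands in the lower half of the 32-bit range.
 * This keeps the comparison correct across counter wraparound.
 */
static bool ah_send_completed(uint32_t tx_tail, uint32_t last_send)
{
	return (uint32_t)(tx_tail - last_send) < UINT32_C(0x80000000);
}
```

With this model, ah_send_completed(5, 3) is true, ah_send_completed(3, 5)
is false, and the result stays correct when the counters wrap, e.g.
ah_send_completed(2, 0xFFFFFFFE) is true.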