On 10/14/21 9:57 AM, Bob Pearson wrote:
> I have been chasing a bug in the rxe driver seen in the Python tests (test_cq_events_ud).
> The following occurs:
>
> The first time I execute this test it creates two AHs, which are allocated by
> rdma-core and passed to rxe_create_ah. The test attempts to destroy them
> (i.e. rxe_destroy_ah is called in the provider driver) but rdma-core does not
> destroy them (i.e. rxe_destroy_ah is not called in the kernel).
>
> The rxe driver saves the AV state and some metadata for these AHs and keeps it,
> since it thinks they are still active.
>
> The second or third time I execute this test, two new AHs are created by
> rxe_create_ah, but the memory passed in from rdma-core is the same as in the first
> test. I.e. it has recycled them but they are still active in the driver, so
> the result is chaos.
>
> Somehow rdma-core thinks it has destroyed the AHs but it does not call down to the
> driver. This only occurs for AHs AFAIK.
>
> Bob
>

The cause seems simple enough. In uverbs_cmd.c, ib_uverbs_create_ah() calls
rdma_create_user_ah(), which eventually calls device->ops.create_user_ah() or
device->ops.create_ah(). But ib_uverbs_destroy_ah() does *not* call
rdma_uverbs_destroy_ah(); it just deletes the object.
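
To make the asymmetry concrete, here is a rough toy model in plain C. It is not kernel or rdma-core code: every toy_* name is made up for illustration, and the driver here politely rejects a colliding create, whereas the real driver presumably does not notice and just ends up with the chaos described above. The only point is that if the core-side destroy skips the driver callback, the driver still holds state for memory the core later hands back.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names are kernel or rdma-core symbols. */
#define TOY_MAX_AH 4

/* Stands in for the AH memory that the core allocates and may recycle. */
struct toy_ah {
	int av;
};

/* Stands in for the per-AH state the driver keeps while an AH is live. */
struct toy_driver {
	const struct toy_ah *active[TOY_MAX_AH];
};

/* Driver-side create: fails if this memory is still tracked as active. */
static bool toy_drv_create_ah(struct toy_driver *drv, const struct toy_ah *ah)
{
	size_t free_slot = TOY_MAX_AH;

	for (size_t i = 0; i < TOY_MAX_AH; i++) {
		if (drv->active[i] == ah)
			return false;	/* recycled memory hits stale state */
		if (!drv->active[i] && free_slot == TOY_MAX_AH)
			free_slot = i;
	}
	if (free_slot == TOY_MAX_AH)
		return false;		/* table full */

	drv->active[free_slot] = ah;
	return true;
}

/* Driver-side destroy: forget the state kept for this AH. */
static void toy_drv_destroy_ah(struct toy_driver *drv, const struct toy_ah *ah)
{
	for (size_t i = 0; i < TOY_MAX_AH; i++)
		if (drv->active[i] == ah)
			drv->active[i] = NULL;
}

/*
 * Core-side destroy. With call_driver == false it only drops the object,
 * which is the behaviour suspected in the paragraph above.
 */
static void toy_core_destroy_ah(struct toy_driver *drv, struct toy_ah *ah,
				bool call_driver)
{
	if (call_driver)
		toy_drv_destroy_ah(drv, ah);
	/* The core would release or recycle the AH memory here. */
}

/* Create, destroy, then create again on the same memory. */
static bool toy_second_create_ok(bool call_driver)
{
	struct toy_driver drv = { { NULL } };
	struct toy_ah slot = { 0 };

	if (!toy_drv_create_ah(&drv, &slot))
		return false;
	toy_core_destroy_ah(&drv, &slot, call_driver);
	return toy_drv_create_ah(&drv, &slot);
}
```

In this sketch toy_second_create_ok(true) succeeds because the driver was told about the destroy, while toy_second_create_ok(false) fails because the driver still considers the recycled memory active.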