On 10/14/21 11:14 AM, Bob Pearson wrote:
> On 10/14/21 9:57 AM, Bob Pearson wrote:
>> I have been chasing a bug in the rxe driver seen in the python tests (test_cq_events_ud).
>> The following occurs
>>
>> The first time I execute this test it creates two AHs which are allocated by
>> rdma-core and passed to rxe_create_ah. The test attempts to destroy them
>> (i.e. rxe_destroy_ah is called in the provider driver) but rdma-core does not
>> destroy them (i.e. rxe_destroy_ah is not called in the kernel).
>>
>> The rxe driver saves the AV state and some metadata for these AHs and keeps it
>> since it thinks they are still active.
>>
>> The second or third time I execute this test two new AHs are created by
>> rxe_create_ah but the memory passed in from rdma-core is the same as the first
>> test. I.e. it has recycled them but they are still active in the driver so
>> the result is chaos.
>>
>> Somehow rdma-core thinks it has destroyed the AHs but it does not call down to the
>> driver. This only occurs for AHs AFAIK.
>>
>> Bob
>>
>
> The cause seems simple enough.
>
> In uverbs_cmd.c ib_uverbs_create_ah() calls rdma_create_user_ah() which
> eventually calls device->ops.create_user_ah() or device->ops.create_ah().
>
> But ib_uverbs_destroy_ah does *not* call rdma_uverbs_destroy_ah() it just
                                           should be rdma_destroy_user_ah()
> deletes the object.
>
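
To make the asymmetry concrete, here is a toy user-space model of it. This is only a sketch, not kernel or rdma-core code: every toy_* name is hypothetical, and the model makes the second create fail visibly where the real driver would just carry on with stale state.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-AH state the driver keeps. */
struct toy_ah {
	bool active;	/* driver believes this AH is still live */
	int av;		/* placeholder for the saved AV state */
};

/* Driver-side create. Assumption: the toy reports stale state as a
 * failure so the problem is observable. */
static bool toy_driver_create_ah(struct toy_ah *ah, int av)
{
	if (ah->active)
		return false;	/* recycled memory, old AH never destroyed */
	ah->active = true;
	ah->av = av;
	return true;
}

/* Driver-side destroy: the call that is reported as never happening. */
static void toy_driver_destroy_ah(struct toy_ah *ah)
{
	ah->active = false;
}

/* Core-side destroy. call_driver == false models the reported behaviour:
 * the object is deleted without calling down to the driver. */
static void toy_core_destroy_ah(struct toy_ah *ah, bool call_driver)
{
	if (call_driver)
		toy_driver_destroy_ah(ah);
	/* either way the core now considers the memory free for reuse */
}

/* Create, destroy, then create again on the same memory.
 * Returns true if the second create sees clean state. */
bool toy_recreate_ok(bool call_driver)
{
	struct toy_ah slot = { .active = false, .av = 0 };
	bool first = toy_driver_create_ah(&slot, 1);

	assert(first);
	(void)first;
	toy_core_destroy_ah(&slot, call_driver);
	return toy_driver_create_ah(&slot, 2);
}
```

In this model toy_recreate_ok(true) succeeds and toy_recreate_ok(false) does not, which matches the pattern seen when the test is run a second or third time.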