On Fri, Jun 04, 2021 at 12:53:51PM -0500, Pearson, Robert B wrote:
>
> On 6/4/2021 11:22 AM, Pearson, Robert B wrote:
> >
> > On 6/4/2021 12:37 AM, Zhu Yanjun wrote:
> > >
> > > After I added a rxe device on the netdev, then run rdma-core test tools.
> > > Then I remove rxe device, in the end, I unloaded rdma_rxe kernel
> > > modules.
> > > I found the above logs.
> > > "
> > > [ 1249.651921] rdma_rxe: rxe-pd pool destroyed with unfree'd elem
> > > [ 1249.651927] rdma_rxe: rxe-qp pool destroyed with unfree'd elem
> > > [ 1249.651929] rdma_rxe: rxe-cq pool destroyed with unfree'd elem
> > > "
> > >
> > > It seems that some resources leak.
> > >
> > > I will make further investigations.
> > >
> > > Zhu Yanjun
> >
> > Zhu,
> >
> > I suspect this is an older error. I traced all the add and drop ref
> > calls for PDs, then ran the full suite of Python tests and also test_mr
> > which includes the memory window tests by itself and then counted the
> > adds and drops. For test_mr alone I get 85 adds and 85 drops but when I
> > run the whole suite I get 384 adds and 380 drops. Since the memory
> > window code is only exercised in test_mr I think it is OK. Somewhere
> > else there are missing drops. I will try to isolate them.
> >
> > Bob
> >
> Zhu,
>
> In rdma_core/tests/test_qpex.py test_qp_ex_rc_atomic_cmp_swp and
> test_qp_ex_rc_atomic_fetch_add each have two missing drops of PDs. This is
> either a test bug or a bug in the rxe driver but it has nothing to do with
> the MW code. We should treat it as a separate error. For some reason these
> test cases are not cleaning up all resources.
>
> The cleanup code in all these Python tests is very implicit. It just happens
> by magic so it is hard to figure out where an ibv_destroy_qp or
> ibv_destroy_cq went missing. It would help if someone who is familiar with
> these tests could look at it.

It is impossible for userspace to leak a kernel resource. When the fd is
closed, the kernel guarantees that everything is destroyed back to the
driver.

As long as pyverbs has exited, pyverbs cannot be the bug.

Jason
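
For reference, the add/drop counting described in the quoted message could be sketched roughly as below. This is only a minimal sketch, assuming a hypothetical trace format of one "<test name> <add|drop>" pair per line; the rxe driver does not emit lines in this form, and the function name and the sample data are made up for illustration.

```python
from collections import Counter


def unbalanced_refs(trace_lines):
    """Return {test_name: adds - drops} for tests whose counts differ.

    Assumes a hypothetical trace format: "<test_name> <add|drop>" per line.
    Lines that do not match this shape are ignored.
    """
    adds, drops = Counter(), Counter()
    for line in trace_lines:
        parts = line.split()
        if len(parts) != 2:
            continue
        test, event = parts
        if event == "add":
            adds[test] += 1
        elif event == "drop":
            drops[test] += 1
    # Report only the tests where the add and drop counts disagree.
    return {
        test: adds[test] - drops[test]
        for test in adds.keys() | drops.keys()
        if adds[test] != drops[test]
    }


# Made-up sample data, not real driver output.
sample = [
    "test_a add",
    "test_a drop",
    "test_b add",
    "test_b add",
    "test_b add",
    "test_b drop",
]
leaks = unbalanced_refs(sample)
```

With the sample above, test_a is balanced and test_b has two adds without a matching drop, so only test_b is reported. Grouping the counts per test rather than over the whole run is what narrows a suite-wide mismatch down to individual test cases.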