On 6/4/2021 11:22 AM, Pearson, Robert B wrote:
On 6/4/2021 12:37 AM, Zhu Yanjun wrote:
After I added an rxe device on the netdev, I ran the rdma-core test tools.
Then I removed the rxe device and, finally, unloaded the rdma_rxe kernel
module.
I found the following logs.
"
[ 1249.651921] rdma_rxe: rxe-pd pool destroyed with unfree'd elem
[ 1249.651927] rdma_rxe: rxe-qp pool destroyed with unfree'd elem
[ 1249.651929] rdma_rxe: rxe-cq pool destroyed with unfree'd elem
"
It seems that some resources are being leaked.
I will investigate further.
Zhu Yanjun
Zhu,
I suspect this is an older error. I traced all the add and drop ref
calls for PDs, then ran the full suite of Python tests, and separately
ran test_mr (which includes the memory window tests) by itself, and
counted the adds and drops. For test_mr alone I get 85 adds and 85
drops, but when I run the whole suite I get 384 adds and 380 drops.
Since the memory window code is only exercised in test_mr, I think it
is OK. Somewhere else there are four missing drops. I will try to
isolate them.
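
For anyone who wants to repeat the counting, here is a minimal sketch of
how the adds and drops could be tallied per test case. The trace format
is an assumption made up for illustration (a "test: <name>" marker line
followed by "rxe-<pool> add_ref" / "rxe-<pool> drop_ref" lines); it is
not the actual output of the rxe driver, and the function names are
hypothetical.

```python
import re
from collections import Counter

# Assumed trace format (hypothetical, not real rxe driver output):
#   "test: <name>"       marks the start of a test case
#   "rxe-pd add_ref"     one reference taken on a PD
#   "rxe-pd drop_ref"    one reference released on a PD
TRACE_RE = re.compile(r"rxe-(?P<pool>\w+) (?P<op>add|drop)_ref")


def ref_balance(lines, pool="pd"):
    """Return {test_name: adds - drops} for the given pool."""
    balance = Counter()
    current = "<no test>"
    for line in lines:
        line = line.strip()
        if line.startswith("test: "):
            current = line[len("test: "):]
            balance[current] += 0  # list balanced tests too
            continue
        m = TRACE_RE.search(line)
        if m and m.group("pool") == pool:
            balance[current] += 1 if m.group("op") == "add" else -1
    return dict(balance)


def leaking_tests(lines, pool="pd"):
    """Return only the tests whose adds and drops do not match."""
    return {t: n for t, n in ref_balance(lines, pool).items() if n != 0}
```

A test with a nonzero balance took more references than it released, so
it is a candidate for the missing drops.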
Bob
Zhu,
In rdma_core/tests/test_qpex.py test_qp_ex_rc_atomic_cmp_swp and
test_qp_ex_rc_atomic_fetch_add each have two missing drops of PDs. This
is either a test bug or a bug in the rxe driver but it has nothing to do
with the MW code. We should treat it as a separate error. For some
reason these test cases are not cleaning up all resources.
The cleanup code in all these Python tests is very implicit. It just
happens by magic, so it is hard to figure out where an ibv_destroy_qp or
ibv_destroy_cq went missing. It would help if someone who is familiar
with these tests could look at it.
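
To illustrate what explicit cleanup could look like, here is a rough
sketch using contextlib.ExitStack. FakeResource is a hypothetical
stand-in, not a pyverbs class, and I am only assuming that the real
objects expose some close-style method. The point is just that each
teardown is registered where the object is created and runs in reverse
order (QP, then CQ, then PD), so a missing destroy is visible in the
test source.

```python
from contextlib import ExitStack


class FakeResource:
    """Hypothetical stand-in for a verbs object (PD, CQ, QP)."""
    live = []    # names of resources not yet closed
    closed = []  # names in the order they were closed

    def __init__(self, name):
        self.name = name
        FakeResource.live.append(name)

    def close(self):
        FakeResource.live.remove(self.name)
        FakeResource.closed.append(self.name)


def run_case():
    """Create PD, CQ, QP and register each teardown explicitly."""
    with ExitStack() as stack:
        pd = FakeResource("pd")
        stack.callback(pd.close)
        cq = FakeResource("cq")
        stack.callback(cq.close)
        qp = FakeResource("qp")
        stack.callback(qp.close)
        # The test body would go here; return a snapshot of live objects.
        return list(FakeResource.live)
    # On leaving the with block, callbacks run last-in, first-out.
```

After run_case() returns, FakeResource.live is empty, and anything left
in it would point at the object whose destroy was skipped.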
Bob