On 7/31/23 13:43, Jason Gunthorpe wrote: > On Mon, Jul 31, 2023 at 01:42:09PM -0500, Bob Pearson wrote: >> On 7/31/23 13:31, Jason Gunthorpe wrote: >>> On Mon, Jul 31, 2023 at 01:23:59PM -0500, Bob Pearson wrote: >>>> On 7/31/23 13:15, Jason Gunthorpe wrote: >>>>> On Fri, Jul 21, 2023 at 03:50:21PM -0500, Bob Pearson wrote: >>>>>> This patch gives a more detailed list of objects that are not >>>>>> freed in case of error before the module exits. >>>>>> >>>>>> Signed-off-by: Bob Pearson <rpearsonhpe@xxxxxxxxx> >>>>>> --- >>>>>> drivers/infiniband/sw/rxe/rxe_pool.c | 12 +++++++++++- >>>>>> 1 file changed, 11 insertions(+), 1 deletion(-) >>>>>> >>>>>> diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c >>>>>> index cb812bd969c6..3249c2741491 100644 >>>>>> --- a/drivers/infiniband/sw/rxe/rxe_pool.c >>>>>> +++ b/drivers/infiniband/sw/rxe/rxe_pool.c >>>>>> @@ -113,7 +113,17 @@ void rxe_pool_init(struct rxe_dev *rxe, struct rxe_pool *pool, >>>>>> >>>>>> void rxe_pool_cleanup(struct rxe_pool *pool) >>>>>> { >>>>>> - WARN_ON(!xa_empty(&pool->xa)); >>>>>> + unsigned long index; >>>>>> + struct rxe_pool_elem *elem; >>>>>> + >>>>>> + xa_lock(&pool->xa); >>>>>> + xa_for_each(&pool->xa, index, elem) { >>>>>> + rxe_err_dev(pool->rxe, "%s#%d: Leaked", pool->name, >>>>>> + elem->index); >>>>>> + } >>>>>> + xa_unlock(&pool->xa); >>>>>> + >>>>>> + xa_destroy(&pool->xa); >>>>>> } >>>>> >>>>> Is this why? Just count the number of unfinalized objects and report >>>>> if there is difference, don't mess up the xarray. >>>>> >>>>> Jason >>>> This is why I made the last change but I really didn't like that there was no >>>> way to lookup the qp from its index since we were using a NULL xarray entry to >>>> indicate the state of the qp. Making it explicit, i.e. a variable in the struct >>>> seems much more straight forward. Not sure why you hated the last >>>> change. >>> >>> Because if you don't call finalize abort has to be deterministic, and >>> abort can't be that if it someone else can access the poitner and, eg, >>> take a reference. >> >> rxe_pool_get_index() is the only 'correct' way to look up the pointer and >> it checks the valid state (now). Scanning the xarray or just looking up >> the qp is something outside the scope of the normal flows. Like listing >> orphan objects on module exit. > > Maybe at this instance, but people keep changing this and it is > fundamentally wrong to store a pointer to an incompletely initialized > memory for other threads to see. > > Especially for some minor debugging feature. Maybe warning comments would help. There are lots of ways to break the code, sigh. The typical one is someone makes a change that breaks reference counting. > > Jason