On 2024/9/4 23:34, Leon Romanovsky wrote:
On Wed, Sep 04, 2024 at 11:31:13AM -0300, Jason Gunthorpe wrote:
On Mon, Sep 02, 2024 at 04:42:52PM +0300, Leon Romanovsky wrote:
From: Leon Romanovsky <leonro@xxxxxxxxxx>
Failure in driver initialization can lead to a situation where the GID
entries are set but not used yet. In this case, the kref will be equal to 1,
which will trigger a false positive leak detection.
Why does that happen??
For example, these messages are printed during the driver initialization
and followed by release_gid_table() call:
infiniband syz1: ib_query_port failed (-19)
infiniband syz1: Couldn't set up InfiniBand P_Key/GID cache
Okay, but who set the ref=1?
diff --git a/drivers/infiniband/core/cache.c b/drivers/infiniband/core/cache.c
index b7c078b7f7cf..c6aec2e04d4c 100644
--- a/drivers/infiniband/core/cache.c
+++ b/drivers/infiniband/core/cache.c
@@ -800,13 +800,15 @@ static void release_gid_table(struct ib_device *device,
 		return;
 	for (i = 0; i < table->sz; i++) {
+		int gid_kref;
+
 		if (is_gid_entry_free(table->data_vec[i]))
 			continue;
-		WARN_ONCE(true,
+		gid_kref = kref_read(&table->data_vec[i]->kref);
+		WARN_ONCE(gid_kref > 1,
 			  "GID entry ref leak for dev %s index %d ref=%u\n",
-			  dev_name(&device->dev), i,
-			  kref_read(&table->data_vec[i]->kref));
+			  dev_name(&device->dev), i, gid_kref);
 	}
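To make the behavioral change in the hunk above concrete, here is a minimal userspace sketch of the two checks. It is not kernel code: struct entry, leaks_before() and leaks_after() are hypothetical names, a NULL slot stands in for is_gid_entry_free(), and a plain counter stands in for kref_read(). Unlike WARN_ONCE, which fires at most once, the helpers count every slot that would trip the check.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a GID table entry; not the kernel struct. */
struct entry {
	unsigned int ref;	/* models kref_read(&entry->kref) */
};

/* Check before the patch: every non-free (non-NULL) slot is reported. */
static int leaks_before(struct entry *const *vec, size_t sz)
{
	int n = 0;

	for (size_t i = 0; i < sz; i++)
		if (vec[i])
			n++;
	return n;
}

/* Check after the patch: only slots holding more than one reference. */
static int leaks_after(struct entry *const *vec, size_t sz)
{
	int n = 0;

	for (size_t i = 0; i < sz; i++) {
		if (!vec[i])
			continue;
		assert(vec[i]->ref >= 1);	/* a populated slot holds at least one ref */
		if (vec[i]->ref > 1)
			n++;
	}
	return n;
}
```

In this model, an entry with ref == 1 that is still in the table at release time is reported by the first check and ignored by the second, which is exactly the case the patch wants to silence.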
I'm not convinced, I think the bug here is something wrong on the
refcounting side not the freeing side. Ref should not be 1. Seems like
missing error unwinding in the init side.
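The "missing error unwinding" argument can be illustrated with a small userspace sketch. It is not the code from the cache.c setup path or from any commit in this thread: struct table, table_fill(), table_cleanup(), table_in_use() and setup() are hypothetical names, and query_err merely models a later init step failing (for example with -19) after the table has already been populated.

```c
#include <assert.h>
#include <stdlib.h>

#define TABLE_SZ 4

/* Hypothetical stand-ins; not the kernel structures. */
struct entry { unsigned int ref; };
struct table { struct entry *vec[TABLE_SZ]; };

/* Drop one reference; free the entry when the last one goes away. */
static void entry_put(struct table *t, int i)
{
	assert(t->vec[i] && t->vec[i]->ref > 0);
	if (--t->vec[i]->ref == 0) {
		free(t->vec[i]);
		t->vec[i] = NULL;
	}
}

/* Populate every slot; each new entry starts with the table's own reference. */
static int table_fill(struct table *t)
{
	for (int i = 0; i < TABLE_SZ; i++) {
		t->vec[i] = calloc(1, sizeof(*t->vec[i]));
		if (!t->vec[i])
			return -1;
		t->vec[i]->ref = 1;
	}
	return 0;
}

/* Undo table_fill(): drop the table's reference on every populated slot. */
static void table_cleanup(struct table *t)
{
	for (int i = 0; i < TABLE_SZ; i++)
		if (t->vec[i])
			entry_put(t, i);
}

/* Number of slots still populated, i.e. what a release-time check would see. */
static int table_in_use(const struct table *t)
{
	int n = 0;

	for (int i = 0; i < TABLE_SZ; i++)
		if (t->vec[i])
			n++;
	return n;
}

/*
 * Hypothetical setup path: fill the table, then run a later step that may
 * fail. With unwind set, the error path undoes the fill before returning.
 */
static int setup(struct table *t, int query_err, int unwind)
{
	int err = table_fill(t);

	if (!err)
		err = query_err;
	if (err && unwind)
		table_cleanup(t);
	return err;
}
```

Without the unwind, a failed setup() leaves every slot populated with ref == 1, so a strict release-time check fires. With the unwind, the table is already empty by the time it is released, and the strict check can stay as it is.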
I dropped this patch as the real fix is here 1403c8b14765 ("IB/core: Fix ib_cache_setup_one error flow cleanup")
Commit 1403c8b14765 ("IB/core: Fix ib_cache_setup_one error flow
cleanup") is available at the following link:
https://patchwork.kernel.org/project/linux-rdma/patch/79137687d829899b0b1c9835fcb4b258004c439a.1725273354.git.leon@xxxxxxxxxx/
Zhu Yanjun
Thanks
Jason