On Fri, Dec 09, 2022 at 09:42:29PM +0800, Hillf Danton wrote:
> On 9 Dec 2022 09:01:14 -0400 Jason Gunthorpe <jgg@xxxxxxxx>
> > On Thu, Dec 08, 2022 at 11:14:39AM +0200, Leon Romanovsky wrote:
> > >
> > > Jason, what do you think?
> >
> > No, the key to this report is that the refcount dec is inside the tracker:
> >
> > > >  __refcount_dec include/linux/refcount.h:344 [inline]
> > > >  refcount_dec include/linux/refcount.h:359 [inline]
> > > >  ref_tracker_free+0x539/0x6b0 lib/ref_tracker.c:118
> > > >  netdev_tracker_free include/linux/netdevice.h:4039 [inline]
> >
> > Which is not underflowing the refcount on the dev, it is actually
> > trying to say the tracker has become unbalanced.
> >
> > Eg this put is not matched with a hold that specified the tracker.
> >
> > Probably this:
> >
> > diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
> > index ff35cebb25e265..115b77c5e9a146 100644
> > --- a/drivers/infiniband/core/device.c
> > +++ b/drivers/infiniband/core/device.c
> > @@ -2192,6 +2192,7 @@ static void free_netdevs(struct ib_device *ib_dev)
> >  		if (ndev) {
> >  			spin_lock(&ndev_hash_lock);
> >  			hash_del_rcu(&pdata->ndev_hash_link);
> > +			netdev_tracker_free(ndev, &pdata->netdev_tracker);
> >  			spin_unlock(&ndev_hash_lock);
> >  
> >  			/*
> > @@ -2201,7 +2202,7 @@ static void free_netdevs(struct ib_device *ib_dev)
> >  			 * comparisons after the put
> >  			 */
> >  			rcu_assign_pointer(pdata->netdev, NULL);
> > -			dev_put(ndev);
> > +			__dev_put(ndev);
> >  		}
> >  		spin_unlock_irqrestore(&pdata->netdev_lock, flags);
> >  	}
>
> Wonder why this makes sense given rcu_assign_pointer(pdata->netdev, NULL)
> under pdata->netdev_lock.

Oh, yah, that is right, so we can just do the natural thing:

 			rcu_assign_pointer(pdata->netdev, NULL);
-			dev_put(ndev);
+			netdev_put(ndev, &pdata->netdev_tracker);

Jason
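
For anyone following the thread, the imbalance described above (a put that is not matched with a hold that specified the tracker) can be sketched with a toy userspace model. This is only an illustration under stated assumptions: every toy_* name below is hypothetical, the counters only loosely mirror what a reference tracker keeps, and none of this is the kernel's actual implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: hypothetical names, not kernel code. */
struct toy_dev {
	int refcnt;    /* the device reference count itself */
	int tracked;   /* outstanding holds that named a tracker */
	int untracked; /* outstanding holds that named no tracker */
};

struct toy_tracker {
	bool held;     /* true while this tracker backs a hold */
};

/* Take a reference; account for it under the tracker if one is given. */
static void toy_hold(struct toy_dev *dev, struct toy_tracker *t)
{
	dev->refcnt++;
	if (t) {
		t->held = true;
		dev->tracked++;
	} else {
		dev->untracked++;
	}
}

/* Drop a reference; the accounting depends on whether a tracker is given. */
static void toy_put(struct toy_dev *dev, struct toy_tracker *t)
{
	assert(dev->refcnt > 0);
	if (t) {
		t->held = false;
		dev->tracked--;
	} else {
		dev->untracked--;
	}
	dev->refcnt--;
}

/* The bookkeeping is balanced only if neither counter went negative. */
static bool toy_balanced(const struct toy_dev *dev)
{
	return dev->tracked >= 0 && dev->untracked >= 0;
}

/* Tracked hold followed by an untracked put: the mismatched pattern. */
static struct toy_dev toy_mismatched(void)
{
	struct toy_dev dev = {0};
	struct toy_tracker t = {0};

	toy_hold(&dev, &t);
	toy_put(&dev, NULL);
	return dev;
}

/* Tracked hold followed by a put naming the same tracker. */
static struct toy_dev toy_matched(void)
{
	struct toy_dev dev = {0};
	struct toy_tracker t = {0};

	toy_hold(&dev, &t);
	toy_put(&dev, &t);
	return dev;
}
```

In this model toy_mismatched() leaves refcnt at 0, so the device count itself never underflows, while the untracked counter goes negative and the tracked counter stays at 1. toy_matched() returns every counter to 0, which is the effect of passing the same tracker to the put as was passed to the hold.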