Hi Leon, > -----Original Message----- > From: Leon Romanovsky > Sent: Wednesday, April 10, 2019 4:29 AM > To: Doug Ledford <dledford@xxxxxxxxxx>; Jason Gunthorpe > <jgg@xxxxxxxxxxxx> > Cc: RDMA mailing list <linux-rdma@xxxxxxxxxxxxxxx>; Huy Nguyen > <huyn@xxxxxxxxxxxx>; Martin Wilck <mwilck@xxxxxxxx>; Parav Pandit > <parav@xxxxxxxxxxxx> > Subject: Re: [PATCH rdma-next 7/7] RDMA/core: Allow detaching gid > attribute netdevice for RoCE > > On Wed, Apr 10, 2019 at 11:15:24AM +0300, Leon Romanovsky wrote: > > From: Parav Pandit <parav@xxxxxxxxxxxx> > > > > When there is active traffic through a GID, a QP/AH holds reference to > > this GID entry. RoCE GID entry holds reference to its attached > > netdevice. Due to this when netdevice is deleted by admin user, its > > refcount is not dropped. > > > > Therefore, while deleting RoCE GID, wait for all GID attribute's > > netdev users to finish accessing netdev in rcu context. > > Once all users done accessing it, release the netdev refcount. > > > > Signed-off-by: Huy Nguyen <huyn@xxxxxxxxxxxx> > > Signed-off-by: Parav Pandit <parav@xxxxxxxxxxxx> > > Signed-off-by: Leon Romanovsky <leonro@xxxxxxxxxxxx> > > --- > > drivers/infiniband/core/cache.c | 75 > > +++++++++++++++++++++++++++------ drivers/infiniband/core/sysfs.c | 13 > ++++-- > > include/rdma/ib_verbs.h | 2 +- > > 3 files changed, 73 insertions(+), 17 deletions(-) > > > > diff --git a/drivers/infiniband/core/cache.c > > b/drivers/infiniband/core/cache.c index d1626502a0d5..0a6f67f5b34f > > 100644 > > --- a/drivers/infiniband/core/cache.c > > +++ b/drivers/infiniband/core/cache.c > > @@ -78,11 +78,22 @@ enum gid_table_entry_state { > > GID_TABLE_ENTRY_PENDING_DEL = 3, > > }; > > > > +struct roce_gid_ndev_storage { > > + struct rcu_head rcu_head; > > + struct net_device *ndev; > > +}; > > + > > struct ib_gid_table_entry { > > struct kref kref; > > struct work_struct del_work; > > struct ib_gid_attr attr; > > void *context; > > + /* Store the ndev pointer to release reference later on in > > + * call_rcu context because by that time gid_table_entry > > + * and attr might be already freed. So keep a copy of it. > > + * ndev_storage is freed by rcu callback. > > + */ > > + struct roce_gid_ndev_storage *ndev_storage; > > enum gid_table_entry_state state; > > }; > > > > @@ -206,6 +217,20 @@ static void schedule_free_gid(struct kref *kref) > > queue_work(ib_wq, &entry->del_work); } > > > > +static void put_gid_ndev(struct rcu_head *head) { > > + struct roce_gid_ndev_storage *storage = > > + container_of(head, struct roce_gid_ndev_storage, rcu_head); > > + > > + WARN_ON(!storage->ndev); > > + /* At this point its safe to release netdev reference, > > + * as all callers working on gid_attr->ndev are done > > + * using this netdev. > > + */ > > + dev_put(storage->ndev); > > + kfree(storage); > > +} > > + > > static void free_gid_entry_locked(struct ib_gid_table_entry *entry) > > { > > struct ib_device *device = entry->attr.device; @@ -228,8 +253,8 @@ > > static void free_gid_entry_locked(struct ib_gid_table_entry *entry) > > /* Now this index is ready to be allocated */ > > write_unlock_irq(&table->rwlock); > > > > - if (entry->attr.ndev) > > - dev_put(entry->attr.ndev); > > + if (entry->ndev_storage) > > + call_rcu(&entry->ndev_storage->rcu_head, put_gid_ndev); > > kfree(entry); > > } > > > > @@ -266,14 +291,27 @@ static struct ib_gid_table_entry * > > alloc_gid_entry(const struct ib_gid_attr *attr) { > > struct ib_gid_table_entry *entry; > > + struct net_device *ndev; > > > > entry = kzalloc(sizeof(*entry), GFP_KERNEL); > > if (!entry) > > return NULL; > > + /* rcu_dereference is not needed. It is only to avoid > > + * smatch complain. > > + */ > > + ndev = rcu_dereference(attr->ndev); > > I don't know why I missed it, but it is wrong. > It should be rcu_access_pointer(). > We are dereferencing a pointer using dev_hold() under if (ndev). rcu_access_pointer() comment snippet section below, * Return the value of the specified RCU-protected pointer, but omit the * lockdep checks for being in an RCU read-side critical section. This is * useful when the value of this pointer is accessed, but the pointer is * not dereferenced, ^^^^^^^^^^^ So we should not use rcu_access_pointer() due to dev_hold?