> -----Original Message----- > From: Jason Gunthorpe [mailto:jgg@xxxxxxxx] > Sent: Wednesday, May 16, 2018 4:48 PM > To: Leon Romanovsky <leon@xxxxxxxxxx> > Cc: Doug Ledford <dledford@xxxxxxxxxx>; Leon Romanovsky > <leonro@xxxxxxxxxxxx>; RDMA mailing list <linux-rdma@xxxxxxxxxxxxxxx>; > Parav Pandit <parav@xxxxxxxxxxxx> > Subject: Re: [PATCH rdma-next 05/12] IB/core: Introduce GID entry reference > counts for RoCE > > On Mon, May 14, 2018 at 11:11:11AM +0300, Leon Romanovsky wrote: > > From: Parav Pandit <parav@xxxxxxxxxxxx> > > > > Currently a GID entry can be deleted from the GID table while it is in > > use by ULPs, CM modules or user space applications. > > It may even get replaced with new attributes and/or GID value which > > may lead to undesired traffic in the network. > > Previous and new GID entry might be in two different net namespace; > > this may lead to send traffic to undesired namespace. > > Provider drivers might be working on GID entry while modifying QP or > > creating address handle, and GID entry can get modified. > > > > So this patch introduces GID reference count infrastructure without > > actually exposing GID get/put/hold APIs to overcome above issues. > > > > GID entry reference count is based on standard kref structure. > > A new state GID_TABLE_ENTRY_PENDING_DEL indicates that a given GID > > entry is marked as pending for deletion. > > > > While it is in pending state, allocator must not reuse such entry. > > When GID entry is in PENDING_DEL state, no more new references can be > > taken using find operation. This ensures that no new processing starts > > for such expiring GID entry. > > > > GID find APIs ignore the entries which are marked for deletion. > > > > Signed-off-by: Parav Pandit <parav@xxxxxxxxxxxx> > > Signed-off-by: Leon Romanovsky <leonro@xxxxxxxxxxxx> > > drivers/infiniband/core/cache.c | 96 > > ++++++++++++++++++++++++++++++++++------- > > 1 file changed, 80 insertions(+), 16 deletions(-) > > > > diff --git a/drivers/infiniband/core/cache.c > > b/drivers/infiniband/core/cache.c index 47fa72b434be..f09966b907af > > 100644 > > +++ b/drivers/infiniband/core/cache.c > > @@ -73,9 +73,18 @@ enum gid_table_entry_props { enum > > gid_table_entry_state { > > GID_TABLE_ENTRY_FREE = 1, > > GID_TABLE_ENTRY_ALLOCATED = 2, > > + /* > > + * Indicates that entry is pending to be removed, there may > > + * be active users of this GID entry. > > + * When last user of the GID entry releases reference to it, > > + * GID state moves to GID_TABLE_ENTRY_FREE. > > + */ > > + GID_TABLE_ENTRY_PENDING_DEL = 3, > > }; > > > > struct ib_gid_table_entry { > > + struct kref kref; > > + struct work_struct del_work; > > enum gid_table_entry_props props; > > enum gid_table_entry_state state; > > union ib_gid gid; > > @@ -166,17 +175,70 @@ is_gid_table_entry_valid(const struct > ib_gid_table_entry *entry) > > return entry->state == GID_TABLE_ENTRY_ALLOCATED; } > > > > -static void del_roce_gid(struct ib_device *device, u8 port_num, > > - struct ib_gid_table *table, int ix) > > +static void clear_gid_entry(struct ib_gid_table *table, > > + struct ib_gid_table_entry *entry) > > { > > + write_lock_irq(&table->rwlock); > > + entry->state = GID_TABLE_ENTRY_FREE; > > + write_unlock_irq(&table->rwlock); > > + > > + memcpy(&entry->gid, &zgid, sizeof(zgid)); > > + memset(&entry->attr, 0, sizeof(entry->attr)); > > + entry->context = NULL; > > +} > > This reads in the wrong order.. Once the entry is marked free something could > move it to ALLOCATED and race with the memset here. > > However since the table->lock mutex has to be held when calling this function > and during other writes to gid, that shouldn't be possible to actually happen.. I > checked and this seems to be true at least. Still worth re-ordering for clarity. > This logic goes way with kref structure of v1 because add_modify_gid() creates new entry and updates the pointer in the gid table. I will post internal review to you. > Also why memcpy zgid instead of memset? zgid is only really useful for > memcmp, and arguably we should just have a is_zero_gid() function instead.. Will add is_zero_gid() to pre-patch to this patch, to keep this patch shorter. > > > /* > > * Additionally find_gid() is used to find valid entry during > > - * lookup operation, where validity needs to be checked. So > > - * find the empty entry first to continue to search for a free > > - * slot and ignore its INVALID flag. > > + * lookup operation; so ignore the entries which are marked as > > + * pending for removal and the entry which is marked as invalid. > > */ > > 'and the entries which are marked' > Fixing in v1. > > if (!is_gid_table_entry_valid(data)) > > continue; > > @@ -494,7 +552,8 @@ int ib_cache_gid_del_all_netdev_gids(struct ib_device > *ib_dev, u8 port, > > mutex_lock(&table->lock); > > > > for (ix = 0; ix < table->sz; ix++) { > > - if (table->data_vec[ix].attr.ndev == ndev) { > > + if (is_gid_table_entry_valid(&table->data_vec[ix]) && > > + table->data_vec[ix].attr.ndev == ndev) { > > del_gid(ib_dev, port, table, ix); > > deleted = true; > > } > > @@ -741,8 +800,7 @@ static void cleanup_gid_table_port(struct > > ib_device *ib_dev, u8 port, > > > > mutex_lock(&table->lock); > > for (i = 0; i < table->sz; ++i) { > > - if (memcmp(&table->data_vec[i].gid, &zgid, > > - sizeof(table->data_vec[i].gid))) { > > + if (is_gid_table_entry_valid(&table->data_vec[i])) { > > del_gid(ib_dev, port, table, i); > > deleted = true; > > } > > @@ -1297,6 +1355,12 @@ void ib_cache_cleanup_one(struct ib_device > *device) > > ib_unregister_event_handler(&device->cache.event_handler); > > flush_workqueue(ib_wq); > > gid_table_cleanup_one(device); > > + > > + /* > > + * Flush the wq second time for any pending GID delete > > + * work for RoCE device. > > + */ > > + flush_workqueue(ib_wq); > > } > > Humm.. Either this flush is not needed or it isn't strong enough. > > I think we need to wait until the gid table is fully free'd and all PENDING_DELS > are completed ? > This second flush waits for gid table being fully freed. > Or if we don't then we don't need to wait for any PENDING_DEL so the flush can > go away. > > Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html