On Mon, May 14, 2018 at 11:11:11AM +0300, Leon Romanovsky wrote:
> From: Parav Pandit <parav@xxxxxxxxxxxx>
>
> Currently a GID entry can be deleted from the GID table while it is
> in use by ULPs, CM modules or user space applications.
> It may even get replaced with new attributes and/or GID value which
> may lead to undesired traffic in the network.
> Previous and new GID entry might be in two different net namespace; this
> may lead to send traffic to undesired namespace.
> Provider drivers might be working on GID entry while modifying QP or
> creating address handle, and GID entry can get modified.
>
> So this patch introduces GID reference count infrastructure without
> actually exposing GID get/put/hold APIs to overcome above issues.
>
> GID entry reference count is based on standard kref structure.
> A new state GID_TABLE_ENTRY_PENDING_DEL indicates that a given GID
> entry is marked as pending for deletion.
>
> While it is in pending state, allocator must not reuse such entry.
> When GID entry is in PENDING_DEL state, no more new references can
> be taken using find operation. This ensures that no new processing
> starts for such expiring GID entry.
>
> GID find APIs ignore the entries which are marked for deletion.
>
> Signed-off-by: Parav Pandit <parav@xxxxxxxxxxxx>
> Signed-off-by: Leon Romanovsky <leonro@xxxxxxxxxxxx>
>  drivers/infiniband/core/cache.c | 96 ++++++++++++++++++++++++++++++++++-------
>  1 file changed, 80 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/infiniband/core/cache.c b/drivers/infiniband/core/cache.c
> index 47fa72b434be..f09966b907af 100644
> +++ b/drivers/infiniband/core/cache.c
> @@ -73,9 +73,18 @@ enum gid_table_entry_props {
>  enum gid_table_entry_state {
>          GID_TABLE_ENTRY_FREE = 1,
>          GID_TABLE_ENTRY_ALLOCATED = 2,
> +        /*
> +         * Indicates that entry is pending to be removed, there may
> +         * be active users of this GID entry.
> +         * When last user of the GID entry releases reference to it,
> +         * GID state moves to GID_TABLE_ENTRY_FREE.
> +         */
> +        GID_TABLE_ENTRY_PENDING_DEL = 3,
>  };
>
>  struct ib_gid_table_entry {
> +        struct kref                     kref;
> +        struct work_struct              del_work;
>          enum gid_table_entry_props      props;
>          enum gid_table_entry_state      state;
>          union ib_gid                    gid;
> @@ -166,17 +175,70 @@ is_gid_table_entry_valid(const struct ib_gid_table_entry *entry)
>          return entry->state == GID_TABLE_ENTRY_ALLOCATED;
>  }
>
> -static void del_roce_gid(struct ib_device *device, u8 port_num,
> -                         struct ib_gid_table *table, int ix)
> +static void clear_gid_entry(struct ib_gid_table *table,
> +                            struct ib_gid_table_entry *entry)
>  {
> +        write_lock_irq(&table->rwlock);
> +        entry->state = GID_TABLE_ENTRY_FREE;
> +        write_unlock_irq(&table->rwlock);
> +
> +        memcpy(&entry->gid, &zgid, sizeof(zgid));
> +        memset(&entry->attr, 0, sizeof(entry->attr));
> +        entry->context = NULL;
> +}

This reads in the wrong order: once the entry is marked FREE, something
could move it to ALLOCATED and race with the memset here.

However, since the table->lock mutex has to be held when calling this
function and during other writes to the gid, that shouldn't actually be
possible. I checked and this seems to be true, at least. Still worth
re-ordering for clarity.

Also, why memcpy zgid instead of memset? zgid is only really useful for
memcmp, and arguably we should just have an is_zero_gid() function
instead.
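Something like this rough sketch is what I have in mind (illustrative
only, not tested against this series; is_zero_gid() is just a name I
made up, and the caller is assumed to hold table->lock as noted above):

/* zgid is only ever compared against, so a predicate reads better
 * than open-coded memcmp()/memcpy() at the call sites.
 */
static bool is_zero_gid(const union ib_gid *gid)
{
        return !memcmp(gid, &zgid, sizeof(*gid));
}

static void clear_gid_entry(struct ib_gid_table *table,
                            struct ib_gid_table_entry *entry)
{
        /* Scrub the contents first; the entry is still PENDING_DEL,
         * so find_gid() cannot return it and the allocator cannot
         * reuse it while this runs.
         */
        memset(&entry->gid, 0, sizeof(entry->gid));
        memset(&entry->attr, 0, sizeof(entry->attr));
        entry->context = NULL;

        /* Only a fully scrubbed entry becomes FREE and eligible
         * for reuse.
         */
        write_lock_irq(&table->rwlock);
        entry->state = GID_TABLE_ENTRY_FREE;
        write_unlock_irq(&table->rwlock);
}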
>                  /*
>                   * Additionally find_gid() is used to find valid entry during
> -                 * lookup operation, where validity needs to be checked. So
> -                 * find the empty entry first to continue to search for a free
> -                 * slot and ignore its INVALID flag.
> +                 * lookup operation; so ignore the entries which are marked as
> +                 * pending for removal and the entry which is marked as invalid.
>                   */
>                  if (!is_gid_table_entry_valid(data))
>                          continue;

Nit: 'and the entries which are marked'

> @@ -494,7 +552,8 @@ int ib_cache_gid_del_all_netdev_gids(struct ib_device *ib_dev, u8 port,
>          mutex_lock(&table->lock);
>
>          for (ix = 0; ix < table->sz; ix++) {
> -                if (table->data_vec[ix].attr.ndev == ndev) {
> +                if (is_gid_table_entry_valid(&table->data_vec[ix]) &&
> +                    table->data_vec[ix].attr.ndev == ndev) {
>                          del_gid(ib_dev, port, table, ix);
>                          deleted = true;
>                  }
> @@ -741,8 +800,7 @@ static void cleanup_gid_table_port(struct ib_device *ib_dev, u8 port,
>
>          mutex_lock(&table->lock);
>          for (i = 0; i < table->sz; ++i) {
> -                if (memcmp(&table->data_vec[i].gid, &zgid,
> -                           sizeof(table->data_vec[i].gid))) {
> +                if (is_gid_table_entry_valid(&table->data_vec[i])) {
>                          del_gid(ib_dev, port, table, i);
>                          deleted = true;
>                  }
> @@ -1297,6 +1355,12 @@ void ib_cache_cleanup_one(struct ib_device *device)
>          ib_unregister_event_handler(&device->cache.event_handler);
>          flush_workqueue(ib_wq);
>          gid_table_cleanup_one(device);
> +
> +        /*
> +         * Flush the wq second time for any pending GID delete
> +         * work for RoCE device.
> +         */
> +        flush_workqueue(ib_wq);
>  }

Hmm.. either this flush is not needed, or it isn't strong enough. I
think we need to wait until the GID table is fully freed and all
PENDING_DEL entries have completed? Or, if we don't need that, then
there is nothing to wait for and the flush can go away.

Jason
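P.S. If the answer is that we do need to wait, I'm picturing something
along these lines instead of the second flush_workqueue() -- a sketch
only, with a made-up helper name, and it assumes (as in this patch)
that entries move from PENDING_DEL to FREE only via their del_work:

/* Called after gid_table_cleanup_one(), when no new GIDs can be
 * added, so no entry can newly enter PENDING_DEL.  flush_work()
 * waits for a queued or running del_work to finish, which guarantees
 * every PENDING_DEL entry has reached FREE before the table memory
 * is released.  A blanket flush_workqueue(ib_wq) cannot promise that
 * for delete work queued after the flush has started.
 */
static void gid_table_wait_pending_del(struct ib_gid_table *table)
{
        int i;

        for (i = 0; i < table->sz; i++) {
                struct ib_gid_table_entry *entry = &table->data_vec[i];

                if (entry->state == GID_TABLE_ENTRY_PENDING_DEL)
                        flush_work(&entry->del_work);
        }
}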