RE: [PATCH rdma-next 05/12] IB/core: Introduce GID entry reference counts for RoCE

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




> -----Original Message-----
> From: Jason Gunthorpe [mailto:jgg@xxxxxxxx]
> Sent: Wednesday, May 16, 2018 4:48 PM
> To: Leon Romanovsky <leon@xxxxxxxxxx>
> Cc: Doug Ledford <dledford@xxxxxxxxxx>; Leon Romanovsky
> <leonro@xxxxxxxxxxxx>; RDMA mailing list <linux-rdma@xxxxxxxxxxxxxxx>;
> Parav Pandit <parav@xxxxxxxxxxxx>
> Subject: Re: [PATCH rdma-next 05/12] IB/core: Introduce GID entry reference
> counts for RoCE
> 
> On Mon, May 14, 2018 at 11:11:11AM +0300, Leon Romanovsky wrote:
> > From: Parav Pandit <parav@xxxxxxxxxxxx>
> >
> > Currently a GID entry can be deleted from the GID table while it is in
> > use by ULPs, CM modules or user space applications.
> > It may even get replaced with new attributes and/or GID value which
> > may lead to undesired traffic in the network.
> > Previous and new GID entry might be in two different net namespace;
> > this may lead to send traffic to undesired namespace.
> > Provider drivers might be working on GID entry while modifying QP or
> > creating address handle, and GID entry can get modified.
> >
> > So this patch introduces GID reference count infrastructure without
> > actually exposing GID get/put/hold APIs to overcome above issues.
> >
> > GID entry reference count is based on standard kref structure.
> > A new state GID_TABLE_ENTRY_PENDING_DEL indicates that a given GID
> > entry is marked as pending for deletion.
> >
> > While it is in pending state, allocator must not reuse such entry.
> > When GID entry is in PENDING_DEL state, no more new references can be
> > taken using find operation. This ensures that no new processing starts
> > for such expiring GID entry.
> >
> > GID find APIs ignore the entries which are marked for deletion.
> >
> > Signed-off-by: Parav Pandit <parav@xxxxxxxxxxxx>
> > Signed-off-by: Leon Romanovsky <leonro@xxxxxxxxxxxx>
> > drivers/infiniband/core/cache.c | 96
> > ++++++++++++++++++++++++++++++++++-------
> >  1 file changed, 80 insertions(+), 16 deletions(-)
> >
> > diff --git a/drivers/infiniband/core/cache.c
> > b/drivers/infiniband/core/cache.c index 47fa72b434be..f09966b907af
> > 100644
> > +++ b/drivers/infiniband/core/cache.c
> > @@ -73,9 +73,18 @@ enum gid_table_entry_props {  enum
> > gid_table_entry_state {
> >  	GID_TABLE_ENTRY_FREE		= 1,
> >  	GID_TABLE_ENTRY_ALLOCATED	= 2,
> > +	/*
> > +	 * Indicates that entry is pending to be removed, there may
> > +	 * be active users of this GID entry.
> > +	 * When last user of the GID entry releases reference to it,
> > +	 * GID state moves to GID_TABLE_ENTRY_FREE.
> > +	 */
> > +	GID_TABLE_ENTRY_PENDING_DEL	= 3,
> >  };
> >
> >  struct ib_gid_table_entry {
> > +	struct kref			kref;
> > +	struct work_struct		del_work;
> >  	enum gid_table_entry_props	props;
> >  	enum gid_table_entry_state	state;
> >  	union ib_gid			gid;
> > @@ -166,17 +175,70 @@ is_gid_table_entry_valid(const struct
> ib_gid_table_entry *entry)
> >  	return entry->state == GID_TABLE_ENTRY_ALLOCATED;  }
> >
> > -static void del_roce_gid(struct ib_device *device, u8 port_num,
> > -			 struct ib_gid_table *table, int ix)
> > +static void clear_gid_entry(struct ib_gid_table *table,
> > +			    struct ib_gid_table_entry *entry)
> >  {
> > +	write_lock_irq(&table->rwlock);
> > +	entry->state = GID_TABLE_ENTRY_FREE;
> > +	write_unlock_irq(&table->rwlock);
> > +
> > +	memcpy(&entry->gid, &zgid, sizeof(zgid));
> > +	memset(&entry->attr, 0, sizeof(entry->attr));
> > +	entry->context = NULL;
> > +}
> 
> This reads in the wrong order.. Once the entry is marked free something could
> move it to ALLOCATED and race with the memset here.
> 
> However since the table->lock mutex has to be held when calling this function
> and during other writes to gid, that shouldn't be possible to actually happen.. I
> checked and this seems to be true at least. Still worth re-ordering for clarity.
> 
This logic goes way with kref structure of v1 because add_modify_gid() creates new entry and updates the pointer in the gid table.
I will post internal review to you.

> Also why memcpy zgid instead of memset? zgid is only really useful for
> memcmp, and arguably we should just have a is_zero_gid() function instead..
Will add is_zero_gid() to pre-patch to this patch, to keep this patch shorter.

> 
> >  		/*
> >  		 * Additionally find_gid() is used to find valid entry during
> > -		 * lookup operation, where validity needs to be checked. So
> > -		 * find the empty entry first to continue to search for a free
> > -		 * slot and ignore its INVALID flag.
> > +		 * lookup operation; so ignore the entries which are marked as
> > +		 * pending for removal and the entry which is marked as invalid.
> >  		 */
> 
> 'and the entries which are marked'
> 
Fixing in v1.

> >  		if (!is_gid_table_entry_valid(data))
> >  			continue;
> > @@ -494,7 +552,8 @@ int ib_cache_gid_del_all_netdev_gids(struct ib_device
> *ib_dev, u8 port,
> >  	mutex_lock(&table->lock);
> >
> >  	for (ix = 0; ix < table->sz; ix++) {
> > -		if (table->data_vec[ix].attr.ndev == ndev) {
> > +		if (is_gid_table_entry_valid(&table->data_vec[ix]) &&
> > +		    table->data_vec[ix].attr.ndev == ndev) {
> >  			del_gid(ib_dev, port, table, ix);
> >  			deleted = true;
> >  		}
> > @@ -741,8 +800,7 @@ static void cleanup_gid_table_port(struct
> > ib_device *ib_dev, u8 port,
> >
> >  	mutex_lock(&table->lock);
> >  	for (i = 0; i < table->sz; ++i) {
> > -		if (memcmp(&table->data_vec[i].gid, &zgid,
> > -			   sizeof(table->data_vec[i].gid))) {
> > +		if (is_gid_table_entry_valid(&table->data_vec[i])) {
> >  			del_gid(ib_dev, port, table, i);
> >  			deleted = true;
> >  		}
> > @@ -1297,6 +1355,12 @@ void ib_cache_cleanup_one(struct ib_device
> *device)
> >  	ib_unregister_event_handler(&device->cache.event_handler);
> >  	flush_workqueue(ib_wq);
> >  	gid_table_cleanup_one(device);
> > +
> > +	/*
> > +	 * Flush the wq second time for any pending GID delete
> > +	 * work for RoCE device.
> > +	 */
> > +	flush_workqueue(ib_wq);
> >  }
> 
> Humm.. Either this flush is not needed or it isn't strong enough.
> 
> I think we need to wait until the gid table is fully free'd and all PENDING_DELS
> are completed ?
> 
This second flush waits for gid table being fully freed.

> Or if we don't then we don't need to wait for any PENDING_DEL so the flush can
> go away.
> 
> Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux