Re: [PATCH for-next v10 07/17] RDMA/rxe: Use kzmalloc/kfree for mca

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 2/1/22 08:53, Jason Gunthorpe wrote:
> On Mon, Jan 31, 2022 at 04:08:40PM -0600, Bob Pearson wrote:
>> Remove rxe_mca (was rxe_mc_elem) from rxe pools and use kzmalloc
>> and kfree to allocate and free. Use the sequence
>>
>>     <lookup qp>
>>     new_mca = kzalloc(sizeof(*new_mca), GFP_KERNEL);
>>     <spin lock>
>>     <lookup qp again> /* in case of a race */
>>     <init new_mca>
>>     <spin unlock>
>>
>> instead of GFP_ATOMIC inside of the spinlock. Add an extra reference
>> to multicast group to protect the pointer in the index that maps
>> mgid to group.
>>
>> Signed-off-by: Bob Pearson <rpearsonhpe@xxxxxxxxx>
>>  drivers/infiniband/sw/rxe/rxe.c       |   8 --
>>  drivers/infiniband/sw/rxe/rxe_mcast.c | 102 +++++++++++++++-----------
>>  drivers/infiniband/sw/rxe/rxe_pool.c  |   5 --
>>  drivers/infiniband/sw/rxe/rxe_pool.h  |   1 -
>>  drivers/infiniband/sw/rxe/rxe_verbs.h |   2 -
>>  5 files changed, 59 insertions(+), 59 deletions(-)
>>
>> diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c
>> index fab291245366..c55736e441e7 100644
>> +++ b/drivers/infiniband/sw/rxe/rxe.c
>> @@ -29,7 +29,6 @@ void rxe_dealloc(struct ib_device *ib_dev)
>>  	rxe_pool_cleanup(&rxe->mr_pool);
>>  	rxe_pool_cleanup(&rxe->mw_pool);
>>  	rxe_pool_cleanup(&rxe->mc_grp_pool);
>> -	rxe_pool_cleanup(&rxe->mc_elem_pool);
>>  
>>  	if (rxe->tfm)
>>  		crypto_free_shash(rxe->tfm);
>> @@ -163,15 +162,8 @@ static int rxe_init_pools(struct rxe_dev *rxe)
>>  	if (err)
>>  		goto err9;
>>  
>> -	err = rxe_pool_init(rxe, &rxe->mc_elem_pool, RXE_TYPE_MC_ELEM,
>> -			    rxe->attr.max_total_mcast_qp_attach);
>> -	if (err)
>> -		goto err10;
>> -
>>  	return 0;
>>  
>> -err10:
>> -	rxe_pool_cleanup(&rxe->mc_grp_pool);
>>  err9:
>>  	rxe_pool_cleanup(&rxe->mw_pool);
>>  err8:
>> diff --git a/drivers/infiniband/sw/rxe/rxe_mcast.c b/drivers/infiniband/sw/rxe/rxe_mcast.c
>> index 9336295c4ee2..4a5896a225a6 100644
>> +++ b/drivers/infiniband/sw/rxe/rxe_mcast.c
>> @@ -26,30 +26,40 @@ static int rxe_mcast_delete(struct rxe_dev *rxe, union ib_gid *mgid)
>>  }
>>  
>>  /* caller should hold mc_grp_pool->pool_lock */
>> -static struct rxe_mcg *create_grp(struct rxe_dev *rxe,
>> -				     struct rxe_pool *pool,
>> -				     union ib_gid *mgid)
>> +static int __rxe_create_grp(struct rxe_dev *rxe, struct rxe_pool *pool,
>> +			    union ib_gid *mgid, struct rxe_mcg **grp_p)
>>  {
>>  	int err;
>>  	struct rxe_mcg *grp;
>>  
>>  	grp = rxe_alloc_locked(&rxe->mc_grp_pool);
>>  	if (!grp)
>> -		return ERR_PTR(-ENOMEM);
>> +		return -ENOMEM;
>> +
>> +	err = rxe_mcast_add(rxe, mgid);
>> +	if (unlikely(err)) {
>> +		rxe_drop_ref(grp);
>> +		return err;
>> +	}
>>  
>>  	INIT_LIST_HEAD(&grp->qp_list);
>>  	spin_lock_init(&grp->mcg_lock);
>>  	grp->rxe = rxe;
>> +
>> +	rxe_add_ref(grp);
>>  	rxe_add_key_locked(grp, mgid);
>>  
>> -	err = rxe_mcast_add(rxe, mgid);
>> -	if (unlikely(err)) {
>> -		rxe_drop_key_locked(grp);
>> -		rxe_drop_ref(grp);
>> -		return ERR_PTR(err);
>> -	}
>> +	*grp_p = grp;
> 
> This should return the struct rxe_mcg or an ERR_PTR not use output
> function arguments, like it was before
> 
>> +	return 0;
>> +}
>> +
>> +/* caller is holding a ref from lookup and mcg->mcg_lock*/
>> +void __rxe_destroy_mcg(struct rxe_mcg *grp)
>> +{
>> +	rxe_drop_key(grp);
>> +	rxe_drop_ref(grp);
>>  
>> -	return grp;
>> +	rxe_mcast_delete(grp->rxe, &grp->mgid);
>>  }
>>  
>>  static int rxe_mcast_get_grp(struct rxe_dev *rxe, union ib_gid *mgid,
>> @@ -68,10 +78,9 @@ static int rxe_mcast_get_grp(struct rxe_dev *rxe, union ib_gid *mgid,
>>  	if (grp)
>>  		goto done;
>>  
>> -	grp = create_grp(rxe, pool, mgid);
>> -	if (IS_ERR(grp)) {
>> +	err = __rxe_create_grp(rxe, pool, mgid, &grp);
>> +	if (err) {
>>  		write_unlock_bh(&pool->pool_lock);
>> -		err = PTR_ERR(grp);
>>  		return err;
> 
> This should return the struct rxe_mcg or ERR_PTR too
> 
>> @@ -126,7 +143,7 @@ static int rxe_mcast_drop_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
>>  				   union ib_gid *mgid)
>>  {
>>  	struct rxe_mcg *grp;
>> -	struct rxe_mca *elem, *tmp;
>> +	struct rxe_mca *mca, *tmp;
>>  
>>  	grp = rxe_pool_get_key(&rxe->mc_grp_pool, mgid);
>>  	if (!grp)
>> @@ -134,33 +151,30 @@ static int rxe_mcast_drop_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
>>  
>>  	spin_lock_bh(&grp->mcg_lock);
>>  
>> +	list_for_each_entry_safe(mca, tmp, &grp->qp_list, qp_list) {
>> +		if (mca->qp == qp) {
>> +			list_del(&mca->qp_list);
>>  			grp->num_qp--;
>> +			if (grp->num_qp <= 0)
>> +				__rxe_destroy_mcg(grp);
>>  			atomic_dec(&qp->mcg_num);
>>  
>>  			spin_unlock_bh(&grp->mcg_lock);
>> +			rxe_drop_ref(grp);
> 
> Wwhere did this extra ref come from?
> 
> The grp was threaded on a linked list by rxe_attach_mcast, but that
> also dropped the extra reference.
> 
> The reference should have followed the pointer into the linked list,
> then it could be unput here when unthreading it from the linked list.
> 
> To me this still looks like there should be exactly one ref, the
> rbtree should be removed inside the release function when the ref goes
> to zero (under the pool_lock) and doesn't otherwise hold a ref
> 
> The linked list on the mca should hold the only ref and when
> all the linked list is put back the grp can be released, no need for
> qp_num as well.
> 
> Jason

Here is a rough picture of what is going on. When not active there are pointers to
each mcg in the red-black tree (from a parent of the rb_node contained in mcg) and
if there are attached qp's there are a pair of pointers from the head and tail of the
linked list else none. When active in either verbs or packet processing code there
is a pointer to mcg held in a local variable by the routine obtained from looking
up the mcg in the tree.

                                  +-----+
                                  | rxe |
                                  +-----+
                                     |
                                   +-v-+
                           +-------|rb |-------+
                           |       +---+       |
                           |                 +-v-+
                           |                 |rb |
                        +--v--+              +---+
                        |(rb) |
       +--------------->| mcg |<---------------+
       |                +-----+                |
       |                                       |
       |    +-----+     +-----+     +-----+    |
       +--->| mca |<--->| mca |<--->| mca |<---+
            +-----+     +-----+     +-----+
               |           |           |
               |           |           |
            +--v--+     +--v--+     +--v--+
            | qp  |     | qp  |     | qp  |
            +-----+     +-----+     +-----+

as currently written the local variable has a kref obtained from the kref_get in
rxe_lookup_mcg or the kref_init in rxe_init_mcg if it is newly created. This ref is
dropped when the local variable goes out of scope. To protect the mcg when it is
inactive at least one more ref is required. I take an additional ref in rxe_get_mcg
if the mcg is created to protect the pointer in the red-black tree. This persists
for the lifetime of the object until it is destroyed when it is removed from the tree
and has the longest scope. This is enough to keep the object alive (it works fine BTW.)
It is also possible to take ref's representing the pointers in the list but they
wouldn't add anything.

On the other point. Is it standard practice to user ERRPTRs in the kernel rather than
return arguments? I seem to have seen both styles used but it may have been my imagination. I don't have any preference here but sometimes choose one or the other
in parallel flows to make comparable routines in the flows have similar interfaces.

Bob



[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux