Re: [PATCH v1 04/14] xprtrdma: Use ib_device pointer safely

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On May 7, 2015, at 11:11 AM, Sagi Grimberg <sagig@xxxxxxxxxxxxxxxxxx> wrote:

> On 5/7/2015 5:12 PM, Chuck Lever wrote:
>> 
>> On May 7, 2015, at 9:56 AM, Sagi Grimberg <sagig@xxxxxxxxxxxxxxxxxx> wrote:
>> 
>>> On 5/7/2015 4:39 PM, Chuck Lever wrote:
>>>> 
>>>> On May 7, 2015, at 6:00 AM, Sagi Grimberg <sagig@xxxxxxxxxxxxxxxxxx> wrote:
>>>> 
>>>>> On 5/4/2015 8:57 PM, Chuck Lever wrote:
>>>>>> The connect worker can replace ri_id, but prevents ri_id->device
>>>>>> from changing during the lifetime of a transport instance.
>>>>>> 
>>>>>> Cache a copy of ri_id->device in rpcrdma_ia and in rpcrdma_rep.
>>>>>> The cached copy can be used safely in code that does not serialize
>>>>>> with the connect worker.
>>>>>> 
>>>>>> Other code can use it to save an extra address generation (one
>>>>>> pointer dereference instead of two).
>>>>>> 
>>>>>> Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx>
>>>>>> ---
>>>>>>  net/sunrpc/xprtrdma/fmr_ops.c      |    8 +----
>>>>>>  net/sunrpc/xprtrdma/frwr_ops.c     |   12 +++----
>>>>>>  net/sunrpc/xprtrdma/physical_ops.c |    8 +----
>>>>>>  net/sunrpc/xprtrdma/verbs.c        |   61 +++++++++++++++++++-----------------
>>>>>>  net/sunrpc/xprtrdma/xprt_rdma.h    |    2 +
>>>>>>  5 files changed, 43 insertions(+), 48 deletions(-)
>>>>>> 
>>>>>> diff --git a/net/sunrpc/xprtrdma/fmr_ops.c b/net/sunrpc/xprtrdma/fmr_ops.c
>>>>>> index 302d4eb..0a96155 100644
>>>>>> --- a/net/sunrpc/xprtrdma/fmr_ops.c
>>>>>> +++ b/net/sunrpc/xprtrdma/fmr_ops.c
>>>>>> @@ -85,7 +85,7 @@ fmr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg,
>>>>>>  	   int nsegs, bool writing)
>>>>>>  {
>>>>>>  	struct rpcrdma_ia *ia = &r_xprt->rx_ia;
>>>>>> -	struct ib_device *device = ia->ri_id->device;
>>>>>> +	struct ib_device *device = ia->ri_device;
>>>>>>  	enum dma_data_direction direction = rpcrdma_data_dir(writing);
>>>>>>  	struct rpcrdma_mr_seg *seg1 = seg;
>>>>>>  	struct rpcrdma_mw *mw = seg1->rl_mw;
>>>>>> @@ -137,17 +137,13 @@ fmr_op_unmap(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg)
>>>>>>  {
>>>>>>  	struct rpcrdma_ia *ia = &r_xprt->rx_ia;
>>>>>>  	struct rpcrdma_mr_seg *seg1 = seg;
>>>>>> -	struct ib_device *device;
>>>>>>  	int rc, nsegs = seg->mr_nsegs;
>>>>>>  	LIST_HEAD(l);
>>>>>> 
>>>>>>  	list_add(&seg1->rl_mw->r.fmr->list, &l);
>>>>>>  	rc = ib_unmap_fmr(&l);
>>>>>> -	read_lock(&ia->ri_qplock);
>>>>>> -	device = ia->ri_id->device;
>>>>>>  	while (seg1->mr_nsegs--)
>>>>>> -		rpcrdma_unmap_one(device, seg++);
>>>>>> -	read_unlock(&ia->ri_qplock);
>>>>>> +		rpcrdma_unmap_one(ia->ri_device, seg++);
>>>>> 
>>>>> Umm, I'm wandering if this is guaranteed to be the same device as
>>>>> ri_id->device?
>>>>> 
>>>>> Imagine you are working on a bond device where each slave belongs to
>>>>> a different adapter. When the active port toggles, you will see a
>>>>> ADDR_CHANGED event (that the current code does not handle...), what
>>>>> you'd want to do is just reconnect and rdma_cm will resolve the new
>>>>> address for you (via the backup slave). I suspect that in case this
>>>>> flow is concurrent with the reconnects you may end up with a stale
>>>>> device handle.
>>>> 
>>>> I’m not sure what you mean by “stale” : freed memory?
>>>> 
>>>> I’m looking at this code in rpcrdma_ep_connect() :
>>>> 
>>>>  916                 if (ia->ri_id->device != id->device) {
>>>>  917                         printk("RPC:       %s: can't reconnect on "
>>>>  918                                 "different device!\n", __func__);
>>>>  919                         rdma_destroy_id(id);
>>>>  920                         rc = -ENETUNREACH;
>>>>  921                         goto out;
>>>>  922                 }
>>>> 
>>>> After reconnecting, if the ri_id has changed, the connect fails. Today,
>>>> xprtrdma does not support the device changing out from under it.
>>>> 
>>>> Note also that our receive completion upcall uses ri_id->device for
>>>> DMA map syncing. Would that also be a problem during a bond failover?
>>>> 
>>> 
>>> I'm not talking about ri_id->device, this will be consistent. I'm
>>> wandering about ia->ri_device, which might not have been updated yet.
>> 
>> ia->ri_device is never updated. The only place it is set is in
>> rpcrdma_ia_open().
> 
> So you assume that each ri_id that you will recreate contains the
> same device handle?

This issue is still unresolved.

xprtrdma does not assume the device pointers are the same, it does
actually check for it. The connect worker is careful to create a new
ri_id first, then check to see that the new ri_id device pointer is
the same as the device pointer in the old (defunct) ri_id.

If the device pointer changes, the old ri_id is kept and the connect
operation fails (retried later).

That is how we guarantee that ia->ri_id->device never changes, and
thus the cached pointer in ia->ri_device never needs to be updated.

> I think that for ADDR_CHANGE event when the slave belongs to another
> device you will hit a mismatch. CC'ing Sean for more info…

Yes, I’d like some confirmation that xprtrdma is not abusing the verbs
API. :-)

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com



--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux