RE: [PATCH WIP 38/43] iser-target: Port to new memory registration API

> -----Original Message-----
> From: linux-rdma-owner@xxxxxxxxxxxxxxx [mailto:linux-rdma-owner@xxxxxxxxxxxxxxx] On Behalf Of Jason Gunthorpe
> Sent: Friday, July 24, 2015 11:27 AM
> To: Chuck Lever
> Cc: Sagi Grimberg; Christoph Hellwig; linux-rdma; Liran Liss; Oren Duer
> Subject: Re: [PATCH WIP 38/43] iser-target: Port to new memory registration API
> 
> On Fri, Jul 24, 2015 at 10:36:07AM -0400, Chuck Lever wrote:
> 
> > Unfinished, but operational:
> >
> > http://git.linux-nfs.org/?p=cel/cel-2.6.git;a=shortlog;h=refs/heads/nfs-rdma-future
> 
> Nice..
> 
> Can you spend some time and reflect on how some of this could be
> lowered into the core code? The FMR and FRWR side have many
> similarities now..
> 
> > FRWR is seeing a 10-15% throughput reduction with 8-thread dbench,
> > but a 5% improvement with 16-thread fio IOPS. 4K and 8K direct
> > read and write are negatively impacted.
> 
> I'm not surprised since invalidate is sync. I believe you need to
> incorporate SEND WITH INVALIDATE to substantially recover this
> overhead.
> 
> It would be neat if the RQ could continue to advance while waiting for
> the invalidate.. That looks almost doable..
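
To make the SEND WITH INVALIDATE idea concrete, here is a minimal sketch of
what the replying side could post with the kernel verbs API. The qp/sge
variables and rkey_to_inv are placeholders, not names from Chuck's branch:

	struct ib_send_wr wr, *bad_wr;
	int ret;

	memset(&wr, 0, sizeof(wr));
	wr.opcode             = IB_WR_SEND_WITH_INV;
	wr.ex.invalidate_rkey = rkey_to_inv;	/* an rkey owned by the peer */
	wr.sg_list            = &sge;
	wr.num_sge            = 1;
	wr.send_flags         = IB_SEND_SIGNALED;

	/* The receiving HCA invalidates rkey_to_inv before it generates the
	 * receive completion, so the peer can skip posting LOCAL_INV for
	 * that MR; it only has to check wc.wc_flags for IB_WC_WITH_INVALIDATE
	 * and that wc.ex.invalidate_rkey is the rkey it expected.
	 */
	ret = ib_post_send(qp, &wr, &bad_wr);
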
> 
> > I converted the RPC reply handler tasklet to a work queue context
> > to allow sleeping. A new .ro_unmap_sync method is invoked after
> > the RPC/RDMA header is parsed but before xprt_complete_rqst()
> > wakes up the waiting RPC.
> 
> .. so the issue is the RPC must be substantially parsed to learn which
> MR it is associated with to schedule the invalidate?
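
That is my reading as well. For reference, the ordering Chuck describes
would be roughly the sequence below; the names are paraphrased rather than
taken from his branch, and the point is only the ordering, which can now
sleep because it runs from a workqueue:

	/* 1. parse the RPC/RDMA header far enough to match the reply
	 *    to its rpc_rqst and to the MRs that RPC registered */
	rqst = xprt_lookup_rqst(xprt, xid);
	req = rpcr_to_rdmar(rqst);

	/* 2. synchronously invalidate/unmap every MR the RPC owns; this
	 *    is the new ->ro_unmap_sync step and may block */
	r_xprt->rx_ia.ri_ops->ro_unmap_sync(r_xprt, req);

	/* 3. only now is it safe to wake the RPC sleeping on this reply */
	xprt_complete_rqst(rqst->rq_task, len);
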
> 
> > This is actually much more efficient than the current logic,
> > which serially does an ib_unmap_fmr() for each MR the RPC owns.
> > So FMR overall performs better with this change.
> 
> Interesting..
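
The win presumably comes from batching: ib_unmap_fmr() takes a list, so all
of an RPC's FMRs can be unmapped with one synchronous call instead of one
call (and one flush) per MR. A sketch, with the per-request bookkeeping
fields invented for illustration:

	LIST_HEAD(unmap_list);
	int i, rc;

	/* gather every FMR this RPC registered onto a single list ... */
	for (i = 0; i < req->nr_fmrs; i++)
		list_add_tail(&req->fmrs[i]->list, &unmap_list);

	/* ... and tear them all down with one ib_unmap_fmr() rather than
	 * a serial call per MR */
	rc = ib_unmap_fmr(&unmap_list);
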
> 
> > Because the next RPC cannot awaken until the last send completes,
> > send queue accounting is based on RPC/RDMA credit flow control.
> 
> So for FRWR the sync invalidate effectively guarantees all SQEs
> related to this RPC are flushed. That seems reasonable; if the number
> of SQEs and CQEs is properly sized in relation to the RPC slot count,
> it should be workable..
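
To put a number on "properly sized": if an RPC can consume at most one SEND
plus one FASTREG and one LOCAL_INV per chunk, then with a credit limit of C
and at most S chunks per RPC the send queue needs roughly C * (1 + 2*S)
entries, and the CQ at least as many if every WR is signalled. For example
(illustrative values, not from the patch):

	sq_depth = credits * (1 + 2 * max_chunks);   /* 128 * (1 + 2*8) = 2176 */
	cq_depth = sq_depth;                         /* one CQE per signalled WR */
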
> 
> How do FMR and PHYS synchronize?
> 
> > I’m sure there are some details here that still need to be
> > addressed, but this fixes the big problem with FRWR send queue
> > accounting, which was that LOCAL_INV WRs would continue to
> > consume SQEs while another RPC was allowed to start.
> 
> Did you test without that artificial limit you mentioned before?
> 
> I'm also wondering about this:
> 
> > During some other testing I found that when a completion upcall
> > returns to the provider leaving CQEs still on the completion queue,
> > there is a non-zero probability that a completion will be lost.
> 
> What does lost mean?
> 
> The CQ is edge triggered, so if you don't drain it you might not get
> another timely CQ callback (which is bad), but CQEs themselves should
> not be lost.
> 

This condition (not fully draining the CQEs) is due to SQ flow control, yes? If so, then when the SQ resumes, can it wake up the appropriate thread (simulating another CQE insertion)?
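
For what it's worth, the usual way ULPs close that window is to drain and
then re-arm with IB_CQ_REPORT_MISSED_EVENTS, looping whenever the re-arm
reports that completions slipped in between the last poll and the re-arm.
A sketch (the completion handling itself is ULP-specific and only
illustrative here):

	static void drain_and_rearm(struct ib_cq *cq)
	{
		struct ib_wc wc;

		do {
			/* empty the CQ completely ... */
			while (ib_poll_cq(cq, 1, &wc) > 0)
				handle_wc(&wc);		/* illustrative */

			/* ... then re-arm; a positive return means new CQEs
			 * arrived before the re-arm took effect, so poll
			 * again instead of waiting for an upcall that may
			 * never come */
		} while (ib_req_notify_cq(cq, IB_CQ_NEXT_COMP |
					      IB_CQ_REPORT_MISSED_EVENTS) > 0);
	}
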






