On Fri, Jul 24, 2015 at 10:36:07AM -0400, Chuck Lever wrote:

> Unfinished, but operational:
>
> http://git.linux-nfs.org/?p=cel/cel-2.6.git;a=shortlog;h=refs/heads/nfs-rdma-future

Nice.. Can you spend some time and reflect on how some of this could
be lowered into the core code? The FMR and FRWR sides have many
similarities now..

> FRWR is seeing a 10-15% throughput reduction with 8-thread dbench,
> but a 5% improvement with 16-thread fio IOPS. 4K and 8K direct
> read and write are negatively impacted.

I'm not surprised, since the invalidate is synchronous. I believe you
need to incorporate SEND WITH INVALIDATE to substantially recover this
overhead.

It would be neat if the RQ could continue to advance while waiting for
the invalidate.. That looks almost doable..

> I converted the RPC reply handler tasklet to a work queue context
> to allow sleeping. A new .ro_unmap_sync method is invoked after
> the RPC/RDMA header is parsed but before xprt_complete_rqst()
> wakes up the waiting RPC.

.. so the issue is that the RPC must be substantially parsed to learn
which MR it is associated with before the invalidate can be scheduled?

> This is actually much more efficient than the current logic,
> which serially does an ib_unmap_fmr() for each MR the RPC owns.
> So FMR overall performs better with this change.

Interesting..

> Because the next RPC cannot awaken until the last send completes,
> send queue accounting is based on RPC/RDMA credit flow control.

So for FRWR the sync invalidate effectively guarantees that all SQEs
related to this RPC are flushed. That seems reasonable; if the number
of SQEs and CQEs is properly sized in relation to the RPC slot count,
it should be workable..

How do FMR and PHYS synchronize?

> I'm sure there are some details here that still need to be
> addressed, but this fixes the big problem with FRWR send queue
> accounting, which was that LOCAL_INV WRs would continue to
> consume SQEs while another RPC was allowed to start.
Did you test without that artificial limit you mentioned before?

I'm also wondering about this:

> During some other testing I found that when a completion upcall
> returns to the provider leaving CQEs still on the completion queue,
> there is a non-zero probability that a completion will be lost.

What does "lost" mean? The CQ is edge-triggered, so if you don't drain
it you might not get another timely CQ callback (which is bad), but
the CQEs themselves should not be lost.

Jason