On Fri, Jul 24, 2015 at 10:36:07AM -0400, Chuck Lever wrote:

> Unfinished, but operational:
>
> http://git.linux-nfs.org/?p=cel/cel-2.6.git;a=shortlog;h=refs/heads/nfs-rdma-future

Nice.. Can you spend some time and reflect on how some of this could
be lowered into the core code? The FMR and FRWR sides have many
similarities now..

> FRWR is seeing a 10-15% throughput reduction with 8-thread dbench,
> but a 5% improvement with 16-thread fio IOPS. 4K and 8K direct
> read and write are negatively impacted.

I'm not surprised, since the invalidate is synchronous. I believe you
need to incorporate SEND WITH INVALIDATE to substantially recover this
overhead.

It would be neat if the RQ could continue to advance while waiting for
the invalidate.. That looks almost doable..

> I converted the RPC reply handler tasklet to a work queue context
> to allow sleeping. A new .ro_unmap_sync method is invoked after
> the RPC/RDMA header is parsed but before xprt_complete_rqst()
> wakes up the waiting RPC.

.. so the issue is that the RPC must be substantially parsed to learn
which MR it is associated with before the invalidate can be scheduled?

> This is actually much more efficient than the current logic,
> which serially does an ib_unmap_fmr() for each MR the RPC owns.
> So FMR overall performs better with this change.

Interesting..

> Because the next RPC cannot awaken until the last send completes,
> send queue accounting is based on RPC/RDMA credit flow control.

So for FRWR the sync invalidate effectively guarantees that all SQEs
related to this RPC are flushed. That seems reasonable; if the number
of SQEs and CQEs is properly sized in relation to the RPC slot count,
it should be workable..

How do FMR and PHYS synchronize?

> I'm sure there are some details here that still need to be
> addressed, but this fixes the big problem with FRWR send queue
> accounting, which was that LOCAL_INV WRs would continue to
> consume SQEs while another RPC was allowed to start.
Did you test without that artificial limit you mentioned before?

I'm also wondering about this:

> During some other testing I found that when a completion upcall
> returns to the provider leaving CQEs still on the completion queue,
> there is a non-zero probability that a completion will be lost.

What does "lost" mean? The CQ is edge-triggered, so if you don't drain
it you might not get another timely CQ callback (which is bad), but
the CQEs themselves should not be lost.

Jason