On Nov 9, 2014, at 5:13 AM, Sagi Grimberg <sagig@xxxxxxxxxxxxxxxxxx> wrote:

> On 11/9/2014 3:14 AM, Chuck Lever wrote:
>> Recent work made FRMR registration and invalidation completions
>> unsignaled. This greatly reduces the adapter interrupt rate.
>>
>> Every so often, however, a posted send Work Request is allowed to
>> signal. Otherwise, the provider's Work Queue will wrap and the
>> workload will hang.
>>
>> The number of Work Requests that are allowed to remain unsignaled is
>> determined by the value of req_cqinit. Currently, this is set to the
>> size of the send Work Queue divided by two, minus 1.
>>
>> For FRMR, the send Work Queue is the maximum number of concurrent
>> RPCs (currently 32) times the maximum number of Work Requests an
>> RPC might use (currently 7, though some adapters may need more).
>>
>> For mlx4, this is 224 entries. This leaves completion signaling
>> disabled for 111 send Work Requests.
>>
>> Some providers hold back dispatching Work Requests until a CQE is
>> generated. If completions are disabled, then no CQEs are generated
>> for quite some time, and that can stall the Work Queue.
>>
>> I've seen this occur running xfstests generic/113 over NFSv4, where
>> eventually, posting a FAST_REG_MR Work Request fails with -ENOMEM
>> because the Work Queue has overflowed. The connection is dropped
>> and re-established.
>
> Hey Chuck,
>
> As you know, I've seen this issue too...
> Looking into this is definitely on my todo list.
>
> Does this happen if you run a simple dd (single request-response inflight)?

Hi Sagi-

I typically run dbench, iozone, and xfstests when preparing patches
for upstream. The generic/113 test I mention in the patch description
is the only test where I saw this issue.

I expect a single-threaded workload won't drive enough Work Queue
activity to push the provider into WQ overflow.
--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com