On Wed, 2017-12-13 at 20:13 -0500, Chuck Lever wrote:
> > On Dec 13, 2017, at 6:42 PM, Chuck Lever <chuck.lever@xxxxxxxxxx>
> > wrote:
> > 
> > > On Dec 13, 2017, at 11:14 AM, Chuck Lever <chuck.lever@xxxxxxxxxx>
> > > wrote:
> > > 
> > > Hi Trond-
> > > 
> > > > On Dec 3, 2017, at 7:17 PM, Trond Myklebust
> > > > <trond.myklebust@primarydata.com> wrote:
> > > > 
> > > > We must ensure that the call to rpc_sleep_on() in xprt_transmit()
> > > > cannot race with the call to xprt_complete_rqst().
> > > > 
> > > > Reported-by: Chuck Lever <chuck.lever@xxxxxxxxxx>
> > > > Link: https://bugzilla.linux-nfs.org/show_bug.cgi?id=317
> > > > Fixes: ce7c252a8c74 ("SUNRPC: Add a separate spinlock to protect..")
> > > > Cc: stable@xxxxxxxxxxxxxxx # 4.14+
> > > > Reviewed-by: Chuck Lever <chuck.lever@xxxxxxxxxx>
> > > > Signed-off-by: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>
> > > > ---
> > > >  net/sunrpc/xprt.c | 26 +++++++++++++++++---------
> > > >  1 file changed, 17 insertions(+), 9 deletions(-)
> > > > 
> > > > diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
> > > > index 333b9d697ae5..5e4278e9ce37 100644
> > > > --- a/net/sunrpc/xprt.c
> > > > +++ b/net/sunrpc/xprt.c
> > > > @@ -1024,6 +1024,7 @@ void xprt_transmit(struct rpc_task *task)
> > > >  	} else if (!req->rq_bytes_sent)
> > > >  		return;
> > > >  
> > > > +	req->rq_connect_cookie = xprt->connect_cookie;
> > > >  	req->rq_xtime = ktime_get();
> > > >  	status = xprt->ops->send_request(task);
> > > >  	trace_xprt_transmit(xprt, req->rq_xid, status);
> > > > @@ -1047,20 +1048,27 @@ void xprt_transmit(struct rpc_task *task)
> > > >  	xprt->stat.bklog_u += xprt->backlog.qlen;
> > > >  	xprt->stat.sending_u += xprt->sending.qlen;
> > > >  	xprt->stat.pending_u += xprt->pending.qlen;
> > > > +	spin_unlock_bh(&xprt->transport_lock);
> > > >  
> > > > -	/* Don't race with disconnect */
> > > > -	if (!xprt_connected(xprt))
> > > > -		task->tk_status = -ENOTCONN;
> > > > -	else {
> > > > +	if (rpc_reply_expected(task) && !READ_ONCE(req->rq_reply_bytes_recvd)) {
> > > >  		/*
> > > > -		 * Sleep on the pending queue since
> > > > -		 * we're expecting a reply.
> > > > +		 * Sleep on the pending queue if we're expecting a reply.
> > > > +		 * The spinlock ensures atomicity between the test of
> > > > +		 * req->rq_reply_bytes_recvd, and the call to rpc_sleep_on().
> > > >  		 */
> > > > -		if (!req->rq_reply_bytes_recvd && rpc_reply_expected(task))
> > > > +		spin_lock(&xprt->recv_lock);
> > > > +		if (!req->rq_reply_bytes_recvd) {
> > > >  			rpc_sleep_on(&xprt->pending, task, xprt_timer);
> > > > -		req->rq_connect_cookie = xprt->connect_cookie;
> > > > +			/*
> > > > +			 * Send an extra queue wakeup call if the
> > > > +			 * connection was dropped in case the call to
> > > > +			 * rpc_sleep_on() raced.
> > > > +			 */
> > > > +			if (!xprt_connected(xprt))
> > > > +				xprt_wake_pending_tasks(xprt, -ENOTCONN);
> > > > +		}
> > > > +		spin_unlock(&xprt->recv_lock);
> > > >  	}
> > > > -	spin_unlock_bh(&xprt->transport_lock);
> > > >  }
> > > >  
> > > >  static void xprt_add_backlog(struct rpc_xprt *xprt, struct rpc_task *task)
> > > > -- 
> > > > 2.14.3
> > > 
> > > I've run into a problem with this version of the patch (on
> > > v4.15-rc3).
> > > 
> > > With dbench on NFSv4.0 (on RDMA) I no longer see constant
> > > throughput with the occasional temporary latency climb followed
> > > by a recovery. Instead, this can happen:
> > > 
> > > releasing clients
> > >   16   584   322.55 MB/sec  warmup   1 sec  latency 807.121 ms
> > >   16   584   161.28 MB/sec  warmup   2 sec  latency 1807.175 ms
> > >   16   584   107.52 MB/sec  warmup   3 sec  latency 2807.198 ms
> > >   16   584    80.64 MB/sec  warmup   4 sec  latency 3807.217 ms
> > >   16   584    64.51 MB/sec  warmup   5 sec  latency 4807.235 ms
> > >   16   584    53.76 MB/sec  warmup   6 sec  latency 5807.251 ms
> > >   16   584    46.08 MB/sec  warmup   7 sec  latency 6807.269 ms
> > >   16   584    40.32 MB/sec  warmup   8 sec  latency 7807.285 ms
> > >   16   584    35.84 MB/sec  warmup   9 sec  latency 8807.303 ms
> > >   16   584    32.26 MB/sec  warmup  10 sec  latency 9807.318 ms
> > >   16   584    29.32 MB/sec  warmup  11 sec  latency 10807.334 ms
> > >   16   584    26.88 MB/sec  warmup  12 sec  latency 11807.355 ms
> > >   16   584    24.81 MB/sec  warmup  13 sec  latency 12807.372 ms
> > >   16   584    23.04 MB/sec  warmup  14 sec  latency 13807.387 ms
> > >   16   584    21.50 MB/sec  warmup  15 sec  latency 14807.406 ms
> > >   16   584    20.16 MB/sec  warmup  16 sec  latency 15807.423 ms
> > > 
> > > No recovery. The latency number keeps climbing, and throughput
> > > drops.
> > > 
> > > "kernel: nfs: server klimt-ib not responding, still trying"
> > > appears in the client's /var/log/messages.
> > > 
> > > Without this patch applied, the NFSv4.0 behavior is the same as I
> > > reported with NFSv3: every once in a while, one RPC completion is
> > > lost, but when the timer fires, the client notices that the reply
> > > actually did arrive, and the RPC completes normally.
> > > 
> > > I'm looking into it.
> > 
> > Root cause:
> > 
> > The transport is temporarily out of resources and returns -ENOBUFS
> > from ->send_request.
> > 
> > The FSM schedules a delay, then calls call_transmit again, which
> > invokes xprt_prepare_transmit.
> > 
> > On NFSv3 mounts, xprt_prepare_transmit tries to acquire the
> > transport write lock.
> > 
> > On NFSv4 mounts, RPC_TASK_NO_RETRANS_TIMEOUT is set. This causes
> > xprt_prepare_transmit to see that the transport is connected and
> > that rq_connect_cookie is the same as xprt->connect_cookie, so it
> > queues the task on ->pending and returns false.
> > 
> > The transport never sends the Call, but still waits for a Reply,
> > forever. This is not a bug in your patch; it is a pre-existing
> > issue.
> 
> Sorry for one more reply...
> 
> I think this last statement is incorrect. It is a bug in your
> patch after all.
> 
> I forgot that your patch moves the line that sets
> 
>         req->rq_connect_cookie = xprt->connect_cookie;
> 
> to before the call to ->send_request. But rq_connect_cookie must be
> set only if ->send_request is successful. Moving this line seems to
> fix the issue:
> 
> 1027         req->rq_xtime = ktime_get();
> 1028         status = xprt->ops->send_request(task);
> 1029         trace_xprt_transmit(xprt, req->rq_xid, status);
> 1030         if (status != 0) {
> 1031                 task->tk_status = status;
> 1032                 return;
> 1033         }
> 1034         req->rq_connect_cookie = xprt->connect_cookie;    <<<<
> 1035         xprt_inject_disconnect(xprt);

Let's rather apply the attached fix (incremental to the original patch,
and intended to be applied using 'git rebase --autosquash'). The reason
for wanting to set the connect cookie before we call send_request() is
to avoid races with disconnects that happen immediately after we have
sent the data (but before we get round to setting the cookie).
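For reference, the check in xprt_prepare_transmit() that Chuck is
describing looks roughly like this (an abbreviated sketch of the
v4.15-era code, quoted from memory rather than verbatim):

bool xprt_prepare_transmit(struct rpc_task *task)
{
        struct rpc_rqst *req = task->tk_rqstp;
        struct rpc_xprt *xprt = req->rq_xprt;
        bool ret = false;

        spin_lock_bh(&xprt->transport_lock);
        if (!req->rq_bytes_sent) {
                if (req->rq_reply_bytes_recvd) {
                        /* The reply already arrived; nothing to send */
                        task->tk_status = req->rq_reply_bytes_recvd;
                        goto out_unlock;
                }
                if ((task->tk_flags & RPC_TASK_NO_RETRANS_TIMEOUT) &&
                    xprt_connected(xprt) &&
                    req->rq_connect_cookie == xprt->connect_cookie) {
                        /*
                         * The cookie matches the live connection, so
                         * assume the Call is in flight and wait for
                         * the Reply instead of retransmitting.
                         */
                        xprt->ops->set_retrans_timeout(task);
                        rpc_sleep_on(&xprt->pending, task, xprt_timer);
                        goto out_unlock;
                }
        }
        if (!xprt->ops->reserve_xprt(xprt, task)) {
                task->tk_status = -EAGAIN;
                goto out_unlock;
        }
        ret = true;
out_unlock:
        spin_unlock_bh(&xprt->transport_lock);
        return ret;
}

If rq_connect_cookie already matches the live connection before a
send_request() call that then fails (with -ENOBUFS, for instance), the
retransmit attempt takes that middle branch: the task sleeps on
->pending waiting for a Reply to a Call that was never sent, which is
the hang reported above.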
8<--------------------------------------------------------
From 326c12727b179d5c2fd42f374a4c0e2d56ee90c5 Mon Sep 17 00:00:00 2001
From: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>
Date: Thu, 14 Dec 2017 07:05:27 -0500
Subject: [PATCH] fixup! SUNRPC: Fix a race in the receive code path

---
 net/sunrpc/xprt.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index 5e4278e9ce37..2cf4cf21a878 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -1021,10 +1021,11 @@ void xprt_transmit(struct rpc_task *task)
 			/* Turn off autodisconnect */
 			del_singleshot_timer_sync(&xprt->timer);
 		}
+		if (!req->rq_bytes_sent)
+			req->rq_connect_cookie = xprt->connect_cookie;
 	} else if (!req->rq_bytes_sent)
 		return;
 
-	req->rq_connect_cookie = xprt->connect_cookie;
 	req->rq_xtime = ktime_get();
 	status = xprt->ops->send_request(task);
 	trace_xprt_transmit(xprt, req->rq_xid, status);
-- 
2.14.3

-- 
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@xxxxxxxxxxxxxxx