> On Feb 13, 2021, at 5:10 PM, Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote:
> 
> On Sat, 2021-02-13 at 21:53 +0000, Chuck Lever wrote:
>> Hi Trond-
>> 
>>> On Feb 13, 2021, at 3:25 PM, trondmy@xxxxxxxxxx wrote:
>>> 
>>> From: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>
>>> 
>>> Use a counter to keep track of how many requests are queued behind
>>> the
>>> xprt->xpt_mutex, and keep TCP_CORK set until the queue is empty.
>> 
>> I'm intrigued, but IMO, the patch description needs to explain
>> why this change should be made. Why abandon Nagle?
>> 
> 
> This doesn't change the Nagle/TCP_NODELAY settings. It just switches to
> using the new documented kernel interface.
> 
> The only change is to use TCP_CORK so that we don't send out partially
> filled TCP frames, when we can see that there are other RPC replies
> that are queued up for transmission.
> 
> Note the combination TCP_CORK+TCP_NODELAY is common, and the main
> effect of the latter is that when we turn off the TCP_CORK, then there
> is an immediate forced push of the TCP queue.

The description above suggests the patch is just a clean-up, but a
forced push has the potential to change the server's behavior. I'm
trying to assess what kind of additional testing would be valuable.

>> If you expect a performance impact, the description should
>> provide metrics to show it.
> 
> I don't have a performance system to measure the improvement
> accurately.

Then let's have Daire try it out, if possible.

> However I am seeing a notable improvement with the
> equivalent client change. Specifically, xfstests generic/127 shows a
> ~20% improvement on my VM based test rig.
>> (We should have Daire try this change with his multi-client
>> setup).
>> 
>> 
>>> Signed-off-by: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>
>>> ---
>>> include/linux/sunrpc/svcsock.h | 2 ++
>>> net/sunrpc/svcsock.c           | 8 +++++++-
>>> 2 files changed, 9 insertions(+), 1 deletion(-)
>>> 
>>> diff --git a/include/linux/sunrpc/svcsock.h
>>> b/include/linux/sunrpc/svcsock.h
>>> index b7ac7fe68306..bcc555c7ae9c 100644
>>> --- a/include/linux/sunrpc/svcsock.h
>>> +++ b/include/linux/sunrpc/svcsock.h
>>> @@ -35,6 +35,8 @@ struct svc_sock {
>>> 	/* Total length of the data (not including fragment
>>> headers)
>>> 	 * received so far in the fragments making up this rpc: */
>>> 	u32			sk_datalen;
>>> +	/* Number of queued send requests */
>>> +	atomic_t		sk_sendqlen;
>> 
>> Can you take advantage of xpt_mutex: update this field
>> only in the critical section, and make it a simple
>> integer type?

I see why atomic_inc(&sk_sendqlen); has to happen outside the mutex.
However, it seems a little crazy/costly to have both of these
serialization primitives. I haven't found a mutex_unlock() that will
report that there are no waiters to wake up, though.
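
To make sure I'm reading the ordering correctly, here is a pared-down
sketch of the send path as I understand it. The function name and
signature are made up for illustration, and the error paths and the
actual svc_tcp_sendmsg() call are elided, so this is not the patch
itself:

/*
 * Pared-down sketch of the svc_tcp_sendto() ordering as I read it.
 * Error handling and the actual send are elided; illustrative only.
 */
static int sketch_tcp_sendto(struct svc_sock *svsk, struct svc_xprt *xprt)
{
	/*
	 * Count this sender *before* taking xpt_mutex, so that the
	 * current mutex holder's atomic_dec_and_test() does not hit
	 * zero and it therefore leaves the socket corked for us. If
	 * the increment happened inside the critical section, the
	 * holder could uncork (pushing a partially filled frame) just
	 * before we queue up behind it.
	 */
	atomic_inc(&svsk->sk_sendqlen);

	mutex_lock(&xprt->xpt_mutex);
	tcp_sock_set_cork(svsk->sk_sk, true);

	/* ... svc_tcp_sendmsg() would go here ... */

	/*
	 * Only the last sender in the queue uncorks; clearing TCP_CORK
	 * is what forces out any partially filled frame that remains.
	 */
	if (atomic_dec_and_test(&svsk->sk_sendqlen))
		tcp_sock_set_cork(svsk->sk_sk, false);
	mutex_unlock(&xprt->xpt_mutex);
	return 0;
}

If that reading is right, the atomic counter is standing in for exactly
the waiter information that mutex_unlock() doesn't expose, which answers
my own question about collapsing the two primitives.
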
>>> 	struct page *		sk_pages[RPCSVC_MAXPAGES];	/*
>>> received data */
>>> };
>>> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
>>> index 5a809c64dc7b..231f510a4830 100644
>>> --- a/net/sunrpc/svcsock.c
>>> +++ b/net/sunrpc/svcsock.c
>>> @@ -1171,18 +1171,23 @@ static int svc_tcp_sendto(struct svc_rqst
>>> *rqstp)
>>> 
>>> 	svc_tcp_release_rqst(rqstp);
>>> 
>>> +	atomic_inc(&svsk->sk_sendqlen);
>>> 	mutex_lock(&xprt->xpt_mutex);
>>> 	if (svc_xprt_is_dead(xprt))
>>> 		goto out_notconn;
>>> +	tcp_sock_set_cork(svsk->sk_sk, true);
>>> 	err = svc_tcp_sendmsg(svsk->sk_sock, &msg, xdr, marker,
>>> &sent);
>>> 	xdr_free_bvec(xdr);
>>> 	trace_svcsock_tcp_send(xprt, err < 0 ? err : sent);
>>> 	if (err < 0 || sent != (xdr->len + sizeof(marker)))
>>> 		goto out_close;
>>> +	if (atomic_dec_and_test(&svsk->sk_sendqlen))
>>> +		tcp_sock_set_cork(svsk->sk_sk, false);
>>> 	mutex_unlock(&xprt->xpt_mutex);
>>> 	return sent;
>>> 
>>> out_notconn:
>>> +	atomic_dec(&svsk->sk_sendqlen);
>>> 	mutex_unlock(&xprt->xpt_mutex);
>>> 	return -ENOTCONN;
>>> out_close:
>>> @@ -1192,6 +1197,7 @@ static int svc_tcp_sendto(struct svc_rqst
>>> *rqstp)
>>> 		  (err < 0) ? err : sent, xdr->len);
>>> 	set_bit(XPT_CLOSE, &xprt->xpt_flags);
>>> 	svc_xprt_enqueue(xprt);
>>> +	atomic_dec(&svsk->sk_sendqlen);
>>> 	mutex_unlock(&xprt->xpt_mutex);
>>> 	return -EAGAIN;
>>> }
>>> @@ -1261,7 +1267,7 @@ static void svc_tcp_init(struct svc_sock
>>> *svsk, struct svc_serv *serv)
>>> 		svsk->sk_datalen = 0;
>>> 		memset(&svsk->sk_pages[0], 0, sizeof(svsk-
>>> >sk_pages));
>>> 
>>> -		tcp_sk(sk)->nonagle |= TCP_NAGLE_OFF;
>>> +		tcp_sock_set_nodelay(sk);
>>> 
>>> 		set_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags);
>>> 		switch (sk->sk_state) {
>>> -- 
>>> 2.29.2
>>> 
>> 
>> -- 
>> Chuck Lever
>> 
>> 
>> 
> 
> -- 
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@xxxxxxxxxxxxxxx

-- 
Chuck Lever