Hi Bruce- > On Sep 9, 2016, at 5:18 PM, J. Bruce Fields <bfields@xxxxxxxxxxxx> wrote: > > On Wed, Sep 07, 2016 at 04:36:19PM -0400, Chuck Lever wrote: >> S5.3.3.1 of RFC 2203 requires that an incoming GSS-wrapped message >> whose sequence number lies outside the current window is dropped. >> The rationale is: >> >> The reason for discarding requests silently is that the server >> is unable to determine if the duplicate or out of range request >> was due to a sequencing problem in the client, network, or the >> operating system, or due to some quirk in routing, or a replay >> attack by an intruder. Discarding the request allows the client >> to recover after timing out, if indeed the duplication was >> unintentional or well intended. >> >> However, clients may rely on the server dropping the connection to >> indicate that a retransmit is needed. Without a connection reset, a >> client can wait forever without retransmitting, and the workload >> just stops dead. I've reproduced this behavior by running xfstests >> generic/323 on an NFSv4.0 mount with proto=rdma and sec=krb5i. >> >> To address this issue, have the server close the connection when it >> silently discards an incoming message due to a GSS sequence number >> problem. >> >> Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx> >> Cc: Benjamin Coddington <bcodding@xxxxxxxxxx> >> --- >> Hi- >> >> Passed testing with my reproducer: 10 runs of generic/323 with >> proto=rdma and sec=krb5i, with NFSv3, NFSv4.0, and NFSv4.1. >> generic/323 is 120 seconds or so of a heavy aio workload. >> >> I tested with that dprintk replaced with pr_warn to confirm that the >> reproducer hits this path one or more times per test run. > > Thanks, this is useful, but before applying I'd just like to audit other > uses of SVC_DROP in the server rpc code as this probably isn't the only > place with this problem. Consider this a test result, then. So, "I'd just like to audit" means you are doing the auditing now, or would you like me to dig into that? > Also, this changes behavior for v2/v3 too, does that cause any problems? > Is it OK for the server to just always close connections on dropping in > the v2/v3 case too? I've run the same tests with NFSv3 (NFS/RDMA + krb5i or krb5p) and did not see a negative impact. Not much, but there it is. What would provide more confidence that NFSv2/3 is not impacted? > --b. > >> >> net/sunrpc/auth_gss/svcauth_gss.c | 3 ++- >> 1 file changed, 2 insertions(+), 1 deletion(-) >> >> diff --git a/net/sunrpc/auth_gss/svcauth_gss.c b/net/sunrpc/auth_gss/svcauth_gss.c >> index d858202..3ff52ec 100644 >> --- a/net/sunrpc/auth_gss/svcauth_gss.c >> +++ b/net/sunrpc/auth_gss/svcauth_gss.c >> @@ -696,7 +696,8 @@ gss_verify_header(struct svc_rqst *rqstp, struct rsc *rsci, >> if (!gss_check_seq_num(rsci, gc->gc_seq)) { >> dprintk("RPC: svcauth_gss: discarding request with " >> "old sequence number %d\n", gc->gc_seq); >> - return SVC_DROP; >> + /* Signal to the client that an RPC message was lost */ >> + return SVC_CLOSE; >> } >> return SVC_OK; >> } > -- > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in > the body of a message to majordomo@xxxxxxxxxxxxxxx > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Chuck Lever -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html