On Fri, Oct 01, 2021 at 06:50:12PM +0200, Daniel Kobras wrote: > Hi Bruce! > > Am 01.10.21 um 15:59 schrieb J. Bruce Fields: > > From: "J. Bruce Fields" <bfields@xxxxxxxxxx> > > > > If sd_max is unsigned, then sd_max - GSS_SEQ_WIN is a very large number > > whenever sd_max is less than GSS_SEQ_WIN, and the comparison: > > > > seq_num <= sd->sd_max - GSS_SEQ_WIN > > > > in gss_check_seq_num is pretty much always true, even when that's > > clearly not what was intended. > > > > This was causing pynfs to hang when using krb5, because pynfs uses zero > > as the initial gss sequence number. That's perfectly legal, but this > > logic error causes knfsd to drop the rpc in that case. Out-of-order > > sequence IDs in the first GSS_SEQ_WIN (128) calls will also cause this. > > > > Fixes: 10b9d99a3dbb ("SUNRPC: Augment server-side rpcgss tracepoints") > > I wonder about the Fixes tag: That changeset added tracepoints to the > exit path, but the buggy logic seems to have been present since the > pre-git ages. Or am I missing something about 10b9d99a3dbb? The relevant parts of 10b9d99a3dbb were: struct gss_svc_seq_data { /* highest seq number seen so far: */ - int sd_max; + u32 sd_max; and -gss_check_seq_num(struct rsc *rsci, int seq_num) +static bool gss_check_seq_num(const struct svc_rqst *rqstp, struct rsc *rsci, + u32 seq_num) Together, they mean the comparison seq_num <= sd->sd_max - GSS_SEQ_WIN in the case sd_max is zero, effectively ends up being seq_num <= 4294967168 instead of what was intended, seq_num <= -128 . > (This might explain some reports of--as you stated elsewhere--"once in > a blue moon my krb5 mounts hang" we've investigated, albeit on kernels > that predate 10b9d99a3dbb.) Sounds like it was something else, I'm afraid! --b.