Re: GSS sequence number window

"J. Bruce Fields" <bfields@xxxxxxxxxxxx> · Wed, 31 May 2017 15:22:31 -0400

On Tue, May 30, 2017 at 04:11:20PM -0400, Benjamin Coddington wrote:
> On 30 May 2017, at 15:34, J. Bruce Fields wrote:
> 
> >On Tue, May 30, 2017 at 02:58:00PM -0400, Chuck Lever wrote:
> >>Hey Bruce!
> >>
> >>While testing with sec=krb5 and sec=krb5i, I noticed a lot of
> >>spurious connection loss, especially when I wanted to run a
> >>CPU-intensive workload on my NFS server at the same time I
> >>was testing.
> >>
> >>I added a pr_err() in gss_check_seq_num, and ran a fio job
> >>on a vers=3,sec=sys,proto=tcp mount (server is exporting a
> >>tmpfs). On the server, I rebuilt a kernel source tree cscope
> >>database at the same time.
> >>
> >>May 29 17:53:13 klimt kernel: gss_check_seq_num: seq_num =
> >>250098, sd_max = 250291, GSS_SEQ_WIN = 128
> >>May 29 17:53:33 klimt kernel: gss_check_seq_num: seq_num =
> >>937816, sd_max = 938171, GSS_SEQ_WIN = 128
> >>May 29 17:53:33 klimt kernel: gss_check_seq_num: seq_num =
> >>938544, sd_max = 938727, GSS_SEQ_WIN = 128
> >>May 29 17:53:33 klimt kernel: gss_check_seq_num: seq_num =
> >>938543, sd_max = 938727, GSS_SEQ_WIN = 128
> >>May 29 17:53:34 klimt kernel: gss_check_seq_num: seq_num =
> >>939344, sd_max = 939549, GSS_SEQ_WIN = 128
> >>May 29 17:53:35 klimt kernel: gss_check_seq_num: seq_num =
> >>965007, sd_max = 965176, GSS_SEQ_WIN = 128
> >>May 29 17:54:01 klimt kernel: gss_check_seq_num: seq_num =
> >>1799710, sd_max = 1799982, GSS_SEQ_WIN = 128
> >>May 29 17:54:02 klimt kernel: gss_check_seq_num: seq_num =
> >>1831165, sd_max = 1831353, GSS_SEQ_WIN = 128
> >>May 29 17:54:04 klimt kernel: gss_check_seq_num: seq_num =
> >>1883583, sd_max = 1883761, GSS_SEQ_WIN = 128
> >>May 29 17:54:07 klimt kernel: gss_check_seq_num: seq_num =
> >>1959316, sd_max = 1959447, GSS_SEQ_WIN = 128
> >>
> >>RFC 2203 suggests there's no risk to using a large window.
> >>My first thought was to make the sequence window larger
> >>(say 2048) but I've seen stragglers outside even that large
> >>a window.
> >>
> >>Any thoughts about why there are these sequence number
> >>outliers?
> >
> >No, alas.
> 
> I noticed some slow allocations on the server with krb5 last year - but
> never got around to doing anything about it:
> http://marc.info/?t=146032122900006&r=1&w=2
> 
> Could be the same thing?

I don't think it would be too hard to eliminate the need for allocations
there.  Or maybe there's even a quick hack that would let Chuck test
whether that's the problem (different GFP flags on those allocations?)

--b.
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html