Re: GSS sequence number window

Chuck Lever <chuck.lever@xxxxxxxxxx> · Wed, 31 May 2017 16:27:23 -0400

> On May 31, 2017, at 3:22 PM, J. Bruce Fields <bfields@xxxxxxxxxxxx> wrote:
> 
> On Tue, May 30, 2017 at 04:11:20PM -0400, Benjamin Coddington wrote:
>> On 30 May 2017, at 15:34, J. Bruce Fields wrote:
>> 
>>> On Tue, May 30, 2017 at 02:58:00PM -0400, Chuck Lever wrote:
>>>> Hey Bruce!
>>>> 
>>>> While testing with sec=krb5 and sec=krb5i, I noticed a lot of
>>>> spurious connection loss, especially when I wanted to run a
>>>> CPU-intensive workload on my NFS server at the same time I
>>>> was testing.
>>>> 
>>>> I added a pr_err() in gss_check_seq_num, and ran a fio job
>>>> on a vers=3,sec=sys,proto=tcp mount (server is exporting a
>>>> tmpfs). On the server, I rebuilt a kernel source tree cscope
>>>> database at the same time.
>>>> 
>>>> May 29 17:53:13 klimt kernel: gss_check_seq_num: seq_num =
>>>> 250098, sd_max = 250291, GSS_SEQ_WIN = 128
>>>> May 29 17:53:33 klimt kernel: gss_check_seq_num: seq_num =
>>>> 937816, sd_max = 938171, GSS_SEQ_WIN = 128
>>>> May 29 17:53:33 klimt kernel: gss_check_seq_num: seq_num =
>>>> 938544, sd_max = 938727, GSS_SEQ_WIN = 128
>>>> May 29 17:53:33 klimt kernel: gss_check_seq_num: seq_num =
>>>> 938543, sd_max = 938727, GSS_SEQ_WIN = 128
>>>> May 29 17:53:34 klimt kernel: gss_check_seq_num: seq_num =
>>>> 939344, sd_max = 939549, GSS_SEQ_WIN = 128
>>>> May 29 17:53:35 klimt kernel: gss_check_seq_num: seq_num =
>>>> 965007, sd_max = 965176, GSS_SEQ_WIN = 128
>>>> May 29 17:54:01 klimt kernel: gss_check_seq_num: seq_num =
>>>> 1799710, sd_max = 1799982, GSS_SEQ_WIN = 128
>>>> May 29 17:54:02 klimt kernel: gss_check_seq_num: seq_num =
>>>> 1831165, sd_max = 1831353, GSS_SEQ_WIN = 128
>>>> May 29 17:54:04 klimt kernel: gss_check_seq_num: seq_num =
>>>> 1883583, sd_max = 1883761, GSS_SEQ_WIN = 128
>>>> May 29 17:54:07 klimt kernel: gss_check_seq_num: seq_num =
>>>> 1959316, sd_max = 1959447, GSS_SEQ_WIN = 128
>>>> 
>>>> RFC 2203 suggests there's no risk to using a large window.
>>>> My first thought was to make the sequence window larger
>>>> (say 2048) but I've seen stragglers outside even that large
>>>> a window.
>>>> 
>>>> Any thoughts about why there are these sequence number
>>>> outliers?
>>> 
>>> No, alas.
>> 
>> I noticed some slow allocations on the server with krb5 last year - but
>> never got around to doing anything about it:
>> http://marc.info/?t=146032122900006&r=1&w=2
>> 
>> Could be the same thing?
> 
> I don't think it would be too hard to eliminate the need for allocations
> there.  Or maybe there's even a quick hack that would let Chuck test
> whether that's the problem (different GFP flags on those allocations?)

This doesn't feel like a memory allocator issue. My server has
loads of free memory, and there's just one NUMA node.

Since increasing the nfsd thread count seems to improve the situation,
my guess is that having more threads available makes it less likely
that an RPC request will be reordered.
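
For anyone following along, here is a minimal userspace sketch of the
kind of sliding-window check RFC 2203 describes, to show why the
logged requests are dropped. It is not the kernel's
gss_check_seq_num(); the names seq_window, seq_window_check and
SEQ_WIN are made up for illustration, and the one-byte-per-slot
"seen" array is just a simplification I am assuming here. Only the
window size of 128 is taken from the log lines above.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SEQ_WIN 128 /* hypothetical name; 128 matches GSS_SEQ_WIN in the logs */

/* Illustrative per-context state, not the kernel's data structure. */
struct seq_window {
    uint32_t max;           /* highest sequence number accepted so far */
    uint8_t  seen[SEQ_WIN]; /* seen[n % SEQ_WIN] != 0 if in-window n was accepted */
};

/* Returns true if the request should be processed, false if it is dropped. */
bool seq_window_check(struct seq_window *w, uint32_t seq_num)
{
    if (seq_num > w->max) {
        /* The window slides forward: forget numbers that fall out of it. */
        if (seq_num - w->max >= SEQ_WIN) {
            memset(w->seen, 0, sizeof(w->seen));
        } else {
            for (uint32_t n = w->max + 1; n < seq_num; n++)
                w->seen[n % SEQ_WIN] = 0;
        }
        w->seen[seq_num % SEQ_WIN] = 1;
        w->max = seq_num;
        return true;
    }

    /* Too far behind the newest request: below the window, so drop it. */
    if (w->max - seq_num >= SEQ_WIN)
        return false;

    /* Inside the window but already accepted once: treat as a replay. */
    if (w->seen[seq_num % SEQ_WIN])
        return false;

    w->seen[seq_num % SEQ_WIN] = 1;
    return true;
}
```

With max = 250291, a request carrying seq_num = 250098 is 193 behind
the newest one, which is more than 128, so a check of this shape
rejects it. A request only lands that far back if at least 128
later-numbered requests were checked before it.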

--
Chuck Lever
