TestStateID woes with recent clients

Dear all,

From time to time we're plagued by storms of "TestStateID" RPC calls on a
server running a 4.15.0-88 (Ubuntu Bionic) kernel, where the clients
(~310 VMs) run either 4.19.106 or 4.19.107 (Flatcar Linux). What we see
during these "storms" is that _some_ clients keep testing the same
stateid for callback, like this:

[Thu Apr  9 15:18:57 2020] NFS reply test_stateid: succeeded, 0
[Thu Apr  9 15:18:57 2020] NFS call  test_stateid 00000000ec5d02eb
[Thu Apr  9 15:18:57 2020] --> nfs41_call_sync_prepare
data->seq_server 000000006dfc86c9
[Thu Apr  9 15:18:57 2020] --> nfs4_alloc_slot used_slots=0000
highest_used=4294967295 max_slots=31
[Thu Apr  9 15:18:57 2020] <-- nfs4_alloc_slot used_slots=0001
highest_used=0 slotid=0
[Thu Apr  9 15:18:57 2020] encode_sequence:
sessionid=1585584999:2538115180:5741:0 seqid=13899229 slotid=0
max_slotid=0 cache_this=0
[Thu Apr  9 15:18:57 2020] nfs41_handle_sequence_flag_errors:
"10.1.4.65" (client ID 671b825e6c904897) flags=0x00000040
[Thu Apr  9 15:18:57 2020] --> nfs4_alloc_slot used_slots=0001
highest_used=0 max_slots=31
[Thu Apr  9 15:18:57 2020] <-- nfs4_alloc_slot used_slots=0003
highest_used=1 slotid=1
[Thu Apr  9 15:18:57 2020] nfs4_free_slot: slotid 1 highest_used_slotid 0
[Thu Apr  9 15:18:57 2020] nfs41_sequence_process: Error 0 free the slot
[Thu Apr  9 15:18:57 2020] nfs4_free_slot: slotid 0
highest_used_slotid 4294967295
[Thu Apr  9 15:18:57 2020] NFS reply test_stateid: succeeded, 0
[Thu Apr  9 15:18:57 2020] NFS call  test_stateid 00000000ec5d02eb
[Thu Apr  9 15:18:57 2020] --> nfs41_call_sync_prepare
data->seq_server 000000006dfc86c9
[Thu Apr  9 15:18:57 2020] --> nfs4_alloc_slot used_slots=0000
highest_used=4294967295 max_slots=31
[Thu Apr  9 15:18:57 2020] <-- nfs4_alloc_slot used_slots=0001
highest_used=0 slotid=0
[Thu Apr  9 15:18:57 2020] encode_sequence:
sessionid=1585584999:2538115180:5741:0 seqid=13899230 slotid=0
max_slotid=0 cache_this=0
[Thu Apr  9 15:18:57 2020] nfs41_handle_sequence_flag_errors:
"10.1.4.65" (client ID 671b825e6c904897) flags=0x00000040
[Thu Apr  9 15:18:57 2020] --> nfs4_alloc_slot used_slots=0001
highest_used=0 max_slots=31
[Thu Apr  9 15:18:57 2020] <-- nfs4_alloc_slot used_slots=0003
highest_used=1 slotid=1
[Thu Apr  9 15:18:57 2020] nfs4_free_slot: slotid 1 highest_used_slotid 0
[Thu Apr  9 15:18:57 2020] nfs41_sequence_process: Error 0 free the slot
[Thu Apr  9 15:18:57 2020] nfs4_free_slot: slotid 0
highest_used_slotid 4294967295
[Thu Apr  9 15:18:57 2020] NFS reply test_stateid: succeeded, 0
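
For reference, the flags=0x00000040 value reported by
nfs41_handle_sequence_flag_errors in the log above can be decoded against
the SEQ4_STATUS_* bits that RFC 5661 defines for the SEQUENCE reply. The
following is only a small sketch to make the log easier to read; the
function name decode_seq4_status is my own and is not part of any kernel
or library API.

```python
# SEQ4_STATUS_* bit values as defined by RFC 5661 for the SEQUENCE reply.
SEQ4_STATUS_FLAGS = {
    0x00000001: "SEQ4_STATUS_CB_PATH_DOWN",
    0x00000002: "SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRING",
    0x00000004: "SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRED",
    0x00000008: "SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED",
    0x00000010: "SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED",
    0x00000020: "SEQ4_STATUS_ADMIN_STATE_REVOKED",
    0x00000040: "SEQ4_STATUS_RECALLABLE_STATE_REVOKED",
    0x00000080: "SEQ4_STATUS_LEASE_MOVED",
    0x00000100: "SEQ4_STATUS_RESTART_RECLAIM_NEEDED",
    0x00000200: "SEQ4_STATUS_CB_PATH_DOWN_SESSION",
    0x00000400: "SEQ4_STATUS_BACKCHANNEL_FAULT",
    0x00000800: "SEQ4_STATUS_DEVID_CHANGED",
    0x00001000: "SEQ4_STATUS_DEVID_DELETED",
}


def decode_seq4_status(flags):
    """Return the names of all SEQ4_STATUS_* bits set in `flags`.

    Hypothetical helper for reading logs; bits not listed in the table
    above are reported as a single "unknown(0x...)" entry.
    """
    names = [name for bit, name in sorted(SEQ4_STATUS_FLAGS.items())
             if flags & bit]
    known_mask = 0
    for bit in SEQ4_STATUS_FLAGS:
        known_mask |= bit
    unknown = flags & ~known_mask
    if unknown:
        names.append(f"unknown(0x{unknown:08x})")
    return names


if __name__ == "__main__":
    # Value taken from the nfs41_handle_sequence_flag_errors lines above.
    print(decode_seq4_status(0x00000040))
```

So each SEQUENCE reply in the excerpt carries the "recallable state
revoked" status bit.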

Due to this, some processes on some clients get stuck and these nodes
need to be rebooted. Initially, we thought we were facing the issue that
was fixed in 44f411c353bf, but as far as I can see we're already running
a kernel to which that fix was backported via 90d73c1cadb8.

Clients are mounting as
"rw,nosuid,nodev,noexec,noatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,acregmin=600,acregmax=600,acdirmin=600,acdirmax=600,hard,proto=tcp,timeo=600,retrans=2,sec=sys"

Export options are the following
"<world>(rw,async,wdelay,crossmnt,no_root_squash,no_subtree_check,fsid=762,sec=sys,rw,secure,no_root_squash,no_all_squash)"
(fsid varies from export to export of course)

Our workload is very metadata heavy (PHP), and the data being served
changes a lot as clients upload files etc.

We have a similar setup where the clients run 4.19.(6|7)8 (CoreOS) and
the server runs 4.15.0-76, where we rarely see these TestStateID RPC
calls.

It's worth mentioning that the main difference between the setup that is
okay and the one that is not is the block size of the backing filesystem
(ZFS): the one with 512-byte blocks is okay, the one with 4k blocks
isn't. I'm unsure how that would affect NFS at all, though.

The issue manifests at least once every day.
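
To put a number on a storm, the debug output can be reduced to a count of
"NFS call  test_stateid" lines per printed identifier. Below is a rough
sketch of how that could be done; the function name
count_test_stateid_calls and the embedded SAMPLE text are made up for
illustration (the sample lines are copied from the excerpt earlier in
this mail), and the identifier is simply whatever token the kernel prints
after "test_stateid".

```python
import re
from collections import Counter

# Matches the "NFS call  test_stateid <token>" debug lines; the reply
# lines ("NFS reply test_stateid: ...") intentionally do not match.
CALL_RE = re.compile(r"NFS call\s+test_stateid\s+([0-9a-f]+)")


def count_test_stateid_calls(lines):
    """Count test_stateid calls per printed identifier.

    `lines` is any iterable of log lines, e.g. the output of dmesg.
    """
    counts = Counter()
    for line in lines:
        match = CALL_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Illustrative input only: a few lines copied from the log excerpt.
SAMPLE = """\
[Thu Apr  9 15:18:57 2020] NFS reply test_stateid: succeeded, 0
[Thu Apr  9 15:18:57 2020] NFS call  test_stateid 00000000ec5d02eb
[Thu Apr  9 15:18:57 2020] NFS reply test_stateid: succeeded, 0
[Thu Apr  9 15:18:57 2020] NFS call  test_stateid 00000000ec5d02eb
[Thu Apr  9 15:18:57 2020] NFS reply test_stateid: succeeded, 0
"""

if __name__ == "__main__":
    # List identifiers by descending call count.
    for token, n in count_test_stateid_calls(SAMPLE.splitlines()).most_common():
        print(token, n)
```

Run against a full capture instead of SAMPLE, this would show whether a
storm is dominated by a handful of identifiers or spread across many.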

Could you please point me in a direction I should investigate further?

Doma


