On Mon, Apr 20, 2020 at 04:32:27PM +0200, sea you wrote:
> Time-to-time we're plagued with a lot of "TestStateID" RPC calls on a
> 4.15.0-88 (Ubuntu Bionic) kernel, where clients (~310 VMS) are using
> either 4.19.106 or 4.19.107 (Flatcar Linux). What we see during these
> "storms", is that _some_ clients are testing the same id for callback
> like
...
> Due to this, some processes on some clients are stuck and these nodes
> need to be rebooted. Initially, we thought we're facing the issue that
> was fixed in 44f411c353bf, but as I see we're already using a kernel
> where it was backported to via 90d73c1cadb8.
>
> Clients are mounting as
> "rw,nosuid,nodev,noexec,noatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,acregmin=600,acregmax=600,acdirmin=600,acdirmax=600,hard,proto=tcp,timeo=600,retrans=2,sec=sys"
>
> Export options are the following
> "<world>(rw,async,wdelay,crossmnt,no_root_squash,no_subtree_check,fsid=762,sec=sys,rw,secure,no_root_squash,no_all_squash)"

Sorry for the derail, but the "async" export option is almost never a
good idea.

> Can you please point me in a direction that I should check further?

Off the top of my head, my only suggestion is to retest if possible on
the latest upstream kernel.

--b.