Random NFS client lockups

Hello,

I am experiencing random NFS client lockups after 1-2 days.  The kernel
reports

| nfs: server XXXXXXXXX not responding, timed out

Processes are stuck in D state and only a reboot helps.
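
To see which tasks are stuck and where, something like the following sketch
could be run on the client while it is locked up.  It only reads
/proc/<pid>/stat and /proc/<pid>/wchan; the helper names are made up for
illustration.

```python
import os

def parse_stat(line):
    """Return (pid, comm, state) from one /proc/<pid>/stat line."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')' instead of on whitespace.
    lpar = line.index("(")
    rpar = line.rindex(")")
    pid = int(line[:lpar])
    comm = line[lpar + 1:rpar]
    state = line[rpar + 1:].split()[0]
    return pid, comm, state

def d_state_processes(proc="/proc"):
    """Yield (pid, comm, wchan) for tasks in uninterruptible sleep (D)."""
    if not os.path.isdir(proc):
        return
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except OSError:
            continue  # process exited while we were looking
        if state != "D":
            continue
        try:
            with open(os.path.join(proc, entry, "wchan")) as f:
                wchan = f.read().strip()
        except OSError:
            wchan = "?"
        yield pid, comm, wchan

if __name__ == "__main__":
    for pid, comm, wchan in d_state_processes():
        print(pid, comm, wchan)
```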

The full log is available at https://pastebin.pl/view/7d0b345b


I can see one oddity there: shortly before the timeouts, the log shows at
05:07:28:

| worker connecting xprt 0000000022aecad1 via tcp to XXXX:2001:1022:: (port 2049)
| 0000000022aecad1 connect status 0 connected 0 sock state 8

All other connects go through EINPROGRESS first:

| 0000000022aecad1 connect status 115 connected 0 sock state 2
| ...
| state 8 conn 1 dead 0 zapped 1 sk_shutdown 1
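
For readers decoding the numbers: a small sketch that maps the fields of the
"connect status" lines to names.  It assumes that "status" is a Linux errno
value and that "sock state" is the kernel TCP state (sk_state, numbered as in
include/net/tcp_states.h); the function name is hypothetical.

```python
import re

# Linux errno values that appear in the "connect status" field.
CONNECT_STATUS = {0: "success", 115: "EINPROGRESS"}

# TCP socket states as numbered in the kernel's include/net/tcp_states.h.
TCP_STATES = {
    1: "TCP_ESTABLISHED", 2: "TCP_SYN_SENT", 3: "TCP_SYN_RECV",
    4: "TCP_FIN_WAIT1", 5: "TCP_FIN_WAIT2", 6: "TCP_TIME_WAIT",
    7: "TCP_CLOSE", 8: "TCP_CLOSE_WAIT", 9: "TCP_LAST_ACK",
    10: "TCP_LISTEN", 11: "TCP_CLOSING",
}

LINE = re.compile(r"connect status (\d+) connected (\d+) sock state (\d+)")

def decode_connect(line):
    """Return (status_name, connected_flag, tcp_state_name) or None."""
    m = LINE.search(line)
    if m is None:
        return None
    status, connected, state = (int(g) for g in m.groups())
    return (
        CONNECT_STATUS.get(status, str(status)),
        connected,
        TCP_STATES.get(state, str(state)),
    )

odd = decode_connect("0000000022aecad1 connect status 0 connected 0 sock state 8")
usual = decode_connect("0000000022aecad1 connect status 115 connected 0 sock state 2")
print(odd)    # ('success', 0, 'TCP_CLOSE_WAIT')
print(usual)  # ('EINPROGRESS', 0, 'TCP_SYN_SENT')
```

Under these assumptions the odd connect returned immediately on a socket
that was in CLOSE_WAIT, while the usual ones start in SYN_SENT.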


After the 'status 0' connect, rpcdebug shows (around 05:07:43):

| --> nfs4_alloc_slot used_slots=03ff highest_used=9 max_slots=30
| <-- nfs4_alloc_slot used_slots=07ff highest_used=10 slotid=10
| ...
| <-- nfs4_alloc_slot used_slots=fffffff highest_used=27 slotid=27
| --> nfs4_alloc_slot used_slots=fffffff highest_used=27 max_slots=30
| ...
| --> nfs4_alloc_slot used_slots=3fffffff highest_used=29 max_slots=30
| <-- nfs4_alloc_slot used_slots=3fffffff highest_used=29 slotid=4294967295
| nfs41_sequence_process: Error 1 free the slot 
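
The used_slots values above read as a hexadecimal bitmask with one bit per
session slot.  A short sketch of that reading follows; the function name is
hypothetical, and treating slotid 4294967295 as the "no slot" marker
((u32)-1) is an assumption about the client code, not something the log
states.

```python
NFS4_NO_SLOT = 0xFFFFFFFF  # assumed: (u32)-1, i.e. "no slot available"

def describe_slots(used_slots_hex, max_slots):
    """Return (slots_in_use, highest_used_slot, table_full)."""
    mask = int(used_slots_hex, 16)
    in_use = bin(mask).count("1")      # number of set bits
    highest = mask.bit_length() - 1    # index of the top set bit
    return in_use, highest, in_use >= max_slots

print(describe_slots("03ff", 30))      # (10, 9, False)
print(describe_slots("fffffff", 30))   # (28, 27, False)
print(describe_slots("3fffffff", 30))  # (30, 29, True)
print(4294967295 == NFS4_NO_SLOT)      # True
```

Read this way, all 30 slots are occupied by the last line, and the next
allocation returns no slot.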

and then the NFS server times out.


At around the same time, the server reports

| Mar 16 05:02:40 kernel: rpc-srv/tcp: nfsd: got error -32 when sending 112 bytes - shutting down socket

Similar messages (with other sizes and sometimes error -104) appear
frequently without a related client lockup.
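
To check whether these server messages cluster around the lockups, they can
be tallied from the server log.  The sketch below assumes the message format
shown above and the Linux errno numbering (32 = EPIPE, 104 = ECONNRESET);
the function name is hypothetical.

```python
import re
from collections import Counter

# Linux errno names for the two codes mentioned above.
ERRNO_NAMES = {32: "EPIPE", 104: "ECONNRESET"}

PATTERN = re.compile(r"nfsd: got error -(\d+) when sending (\d+) bytes")

def count_send_errors(lines):
    """Count nfsd send errors per errno name across log lines."""
    counts = Counter()
    for line in lines:
        m = PATTERN.search(line)
        if m:
            code = int(m.group(1))
            counts[ERRNO_NAMES.get(code, str(code))] += 1
    return counts

sample = [
    "Mar 16 05:02:40 kernel: rpc-srv/tcp: nfsd: got error -32 "
    "when sending 112 bytes - shutting down socket",
]
print(count_send_errors(sample))  # Counter({'EPIPE': 1})
```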


How can I debug this further, or solve it?


It happens (at least) with:

- a Fedora 35 client with kernel 5.16.7,
  kernel-5.17.0-0.rc7.20220310git3bf7edc84a9e.119.fc37.x86_64 and some
  others in between

- a Rocky Linux 8.5 server with kernel-4.18.0-348.12.2
  and kernel-4.18.0-348.2.1

The problem started after a power outage where the whole infrastructure
rebooted.  Before the outage, I ran the setup with kernel 5.16.7 on the
client and 4.18.0-348.2.1 on the server without problems.


The issue affects a /home directory mounted with

| XXXX:/home /home nfs4 rw,seclabel,nosuid,nodev,relatime,vers=4.2,rsize=262144,wsize=262144,namlen=255,soft,posix,proto=tcp6,timeo=600,retrans=2,sec=krb5p,clientaddr=XXXX:2001:...,local_lock=none,addr=XXXX:2001:1022:: 0 0

It also happens without the "soft" option.
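
For reference, the timeout-related options can be pulled out of the option
string as sketched below.  Only a subset of the options from the mount line
is used, and the function name is hypothetical; per nfs(5), timeo is given
in tenths of a second.

```python
def parse_mount_options(opts):
    """Split a comma-separated mount option string into a dict."""
    parsed = {}
    for item in opts.split(","):
        key, sep, value = item.partition("=")
        parsed[key] = value if sep else True  # flags like "soft" map to True
    return parsed

opts = parse_mount_options(
    "rw,vers=4.2,soft,proto=tcp6,timeo=600,retrans=2,sec=krb5p"
)
timeo_seconds = int(opts["timeo"]) / 10  # timeo is in tenths of a second
print(timeo_seconds, opts["retrans"], opts["soft"])  # 60.0 2 True
```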

Applications such as firefox or chromium are running, which hold locks
and access the filesystem frequently.


The logfile was created with rpcdebug enabled for

| nfs        proc xdr root callback client mount pnfs pnfs_ld state
| rpc        xprt call debug nfs auth bind sched trans svcsock svcdsp misc cache




Enrico


