Re: nfsv41: stale file handles after VM shutdown/poweron - but works during warm reboot

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




> On Sep 25, 2024, at 8:09 AM, Christian Theune <ct@xxxxxxxxxxxxxxx> wrote:
> 
> Hi,
> 
> we're experiencing a weird NFS (client?) issue when we cold restart a Linux virtual
> machine running an NFS server. The client will reconnect when the server comes back
> but we end up with stale file handles and have to unmount/remount.
> 
> Initially I thought we did something stupid running the `soft` option and wanted
> to convert to `hard,timeo=6,retrans=10` but noticed that our original problems
> that caused me to investigate did not go away when changing the mount options.
> 
> The issue seems to be (surprisingly!) caused when restarting the NFS server VM
> using a "cold boot" approach: calling shutdown and then having the VM started
> with a fresh Qemu process. When restarting the NFS server VM using a warm
> reboot we do not see stale file handles. This appears not to be a race
> condition or timeouts happening as its reliably reproducable in either
> direction - I've tried dozens of times and it always works with a reboot and
> always ends up with stale file handles when shutting down/starting a fresh VM.
> 
> I've played around with shutting off the NFS server service and introducing
> delays before cold and warm restarts but the symptoms remain the same.
> 
> I've tried reading through man pages, kernel documentation, and kernel code
> but haven't found anything that points to me doing something wrong. The nfsd
> and client settings are all at their defaults in /proc (at least I didn't
> find any config management modifying them and peeking at a few of them they
> ran at their defaults).
> 
> The issue can be observed on 5.15.164 as well as 6.11. We've been running
> identical setups for many years and haven't observed this issue until a few
> months ago. We’ve been running with 5.15 for a while now and I think this
> might have been introduced somewhere along the way.
> 
> I've added the client side kernel traces here. Network traffic was not
> enlightening.
> 
> I tried chopping the traces into parts that reflect what happens on the server.
> 
> Setup
> =====
> 
> Server and client are both Qemu/KVM VMs running NixOS with a (for practical
> purposes) vanilla kernel (two one line patches that change the `bridge-stp` and
> `request-key` helper paths). 
> 
> Mount:
> 
> <server>:/srv/nfs/shared on /mnt/nfs/shared type nfs4 (rw,relatime,vers=4.2,rsize=8192,wsize=8192,namlen=255,hard,proto=tcp6,timeo=100,retrans=30,sec=sys,clientaddr=<v6network>::1137,local_lock=none,addr=<v6network>::1136)
> 
> Export:
> 
> /srv/nfs/shared  <server>(rw,sync,root_squash,no_subtree_check) <client>(rw,sync,root_squash,no_subtree_check)
> 
> 
> Case 1 (cold reboot - ends up broken)
> =====================================
> 
> We're stopping the server, waiting for a bit and then poweroff. Basically:
> 
> server> systemctl stop nfs-server; sleep 30; poweroff
> 
> The poweroff causes the VM to be automatically restarted after a few seconds
> with a fresh Qemu process - but with all identical settings from the previous
> instance.
> 
> The client sees the following rpc and nfs trace:
> 
> server> systemctl stop nfs-server
> 
> client kernel log>
> [ 4965.503048] RPC:       xs_tcp_state_change client 000000004f6bb8b0...
> [ 4965.503975] RPC:       state 8 conn 1 dead 0 zapped 1 sk_shutdown 1
> [ 4965.505105] RPC:       xs_tcp_state_change client 000000004f6bb8b0...
> [ 4965.505930] RPC:       state 9 conn 0 dead 0 zapped 1 sk_shutdown 3
> [ 4965.506712] RPC:       xs_tcp_state_change client 000000004f6bb8b0...
> [ 4965.507469] RPC:       state 7 conn 0 dead 0 zapped 1 sk_shutdown 3
> [ 4965.508261] RPC:       xs_close xprt 000000004f6bb8b0
> [ 4965.509108] RPC:       xs_tcp_state_change client 000000004f6bb8b0...
> [ 4965.509916] RPC:       state 7 conn 0 dead 0 zapped 1 sk_shutdown 3
> 
> server> sleep 30
> 
> client kernel log>
> [ 4998.558930] nfs4_renew_state: start
> [ 4998.559497] <-- nfs41_proc_async_sequence status=0
> [ 4998.560074] nfs4_renew_state: done
> [ 4998.560473] --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=30
> [ 4998.561379] <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0
> [ 4998.562442] encode_sequence: sessionid=1727262693:2026340752:2:0 seqid=47 slotid=0 max_slotid=0 cache_this=0
> [ 4998.563650] RPC:       xs_connect scheduled xprt 000000004f6bb8b0
> [ 4998.564385] RPC:       xs_bind 0000:0000:0000:0000:0000:0000:0000:0000:788: ok (0)
> [ 4998.565372] RPC:       worker connecting xprt 000000004f6bb8b0 via tcp to <v6network>::1136 (port 2049)
> [ 4998.566606] RPC:       000000004f6bb8b0 connect status 115 connected 0 sock state 2
> [ 4998.567594] RPC:       xs_tcp_state_change client 000000004f6bb8b0...
> [ 4998.567596] RPC:       state 7 conn 0 dead 0 zapped 1 sk_shutdown 3
> [ 4998.567600] RPC:       xs_error_report client 000000004f6bb8b0, error=111...
> [ 4998.569979] RPC:       xs_close xprt 000000004f6bb8b0
> [ 4998.570565] RPC:       xs_tcp_state_change client 000000004f6bb8b0...
> [ 4998.571276] RPC:       state 7 conn 0 dead 0 zapped 1 sk_shutdown 3
> [ 5001.630875] RPC:       xs_connect scheduled xprt 000000004f6bb8b0
> [ 5001.631969] RPC:       xs_bind 0000:0000:0000:0000:0000:0000:0000:0000:745: ok (0)
> [ 5001.632890] RPC:       worker connecting xprt 000000004f6bb8b0 via tcp to <v6network>::1136 (port 2049)
> [ 5001.634262] RPC:       000000004f6bb8b0 connect status 115 connected 0 sock state 2
> [ 5001.635306] RPC:       xs_tcp_state_change client 000000004f6bb8b0...
> [ 5001.635308] RPC:       state 7 conn 0 dead 0 zapped 1 sk_shutdown 3
> [ 5001.635312] RPC:       xs_error_report client 000000004f6bb8b0, error=111...
> [ 5001.637871] RPC:       xs_close xprt 000000004f6bb8b0
> [ 5001.638465] RPC:       xs_tcp_state_change client 000000004f6bb8b0...
> [ 5001.639237] RPC:       state 7 conn 0 dead 0 zapped 1 sk_shutdown 3
> [ 5004.702871] RPC:       xs_connect scheduled xprt 000000004f6bb8b0
> [ 5004.703851] RPC:       xs_bind 0000:0000:0000:0000:0000:0000:0000:0000:1012: ok (0)
> [ 5004.704995] RPC:       worker connecting xprt 000000004f6bb8b0 via tcp to <v6network>::1136 (port 2049)
> [ 5004.706752] RPC:       000000004f6bb8b0 connect status 115 connected 0 sock state 2
> [ 5004.708103] RPC:       xs_tcp_state_change client 000000004f6bb8b0...
> [ 5004.708106] RPC:       state 7 conn 0 dead 0 zapped 1 sk_shutdown 3
> [ 5004.708110] RPC:       xs_error_report client 000000004f6bb8b0, error=111...
> [ 5004.711656] RPC:       xs_close xprt 000000004f6bb8b0
> [ 5004.712489] RPC:       xs_tcp_state_change client 000000004f6bb8b0...
> [ 5004.713521] RPC:       state 7 conn 0 dead 0 zapped 1 sk_shutdown 3
> [ 5007.774811] RPC:       xs_connect scheduled xprt 000000004f6bb8b0
> [ 5007.775819] RPC:       xs_bind 0000:0000:0000:0000:0000:0000:0000:0000:811: ok (0)
> [ 5007.776749] RPC:       worker connecting xprt 000000004f6bb8b0 via tcp to <v6network>::1136 (port 2049)
> [ 5007.778362] RPC:       000000004f6bb8b0 connect status 115 connected 0 sock state 2
> [ 5007.779413] RPC:       xs_tcp_state_change client 000000004f6bb8b0...
> [ 5007.779416] RPC:       state 7 conn 0 dead 0 zapped 1 sk_shutdown 3
> [ 5007.779420] RPC:       xs_error_report client 000000004f6bb8b0, error=111...
> [ 5007.782197] RPC:       xs_close xprt 000000004f6bb8b0
> [ 5007.782884] RPC:       xs_tcp_state_change client 000000004f6bb8b0...
> [ 5007.783757] RPC:       state 7 conn 0 dead 0 zapped 1 sk_shutdown 3
> [ 5010.846776] RPC:       xs_connect scheduled xprt 000000004f6bb8b0
> [ 5010.847884] RPC:       xs_bind 0000:0000:0000:0000:0000:0000:0000:0000:920: ok (0)
> [ 5010.848923] RPC:       worker connecting xprt 000000004f6bb8b0 via tcp to <v6network>::1136 (port 2049)
> [ 5010.850346] RPC:       000000004f6bb8b0 connect status 115 connected 0 sock state 2
> [ 5010.851478] RPC:       xs_tcp_state_change client 000000004f6bb8b0...
> [ 5010.851480] RPC:       state 7 conn 0 dead 0 zapped 1 sk_shutdown 3
> [ 5010.851484] RPC:       xs_error_report client 000000004f6bb8b0, error=111...
> [ 5010.854398] RPC:       xs_close xprt 000000004f6bb8b0
> [ 5010.855067] RPC:       xs_tcp_state_change client 000000004f6bb8b0...
> [ 5010.855829] RPC:       state 7 conn 0 dead 0 zapped 1 sk_shutdown 3
> [ 5013.918740] RPC:       xs_connect scheduled xprt 000000004f6bb8b0
> [ 5013.919802] RPC:       xs_bind 0000:0000:0000:0000:0000:0000:0000:0000:691: ok (0)
> [ 5013.920747] RPC:       worker connecting xprt 000000004f6bb8b0 via tcp to <v6network>::1136 (port 2049)
> [ 5013.922364] RPC:       000000004f6bb8b0 connect status 115 connected 0 sock state 2
> 
> server> poweroff
> 
> (server powers off and gets restarted)
> 
> client kernel log>
> 
> [ 5082.014329] RPC:       xs_tcp_state_change client 000000004f6bb8b0...
> [ 5082.015431] RPC:       state 1 conn 0 dead 0 zapped 1 sk_shutdown 0
> [ 5082.016693] RPC:       xs_tcp_send_request(116) = 0
> [ 5082.032137] nfs41_sequence_process ERROR: -10052 Reset session
> [ 5082.033051] nfs4_free_slot: slotid 0 highest_used_slotid 4294967295
> [ 5082.033883] nfs41_sequence_process: Error -10052 free the slot
> [ 5082.034713] nfs41_sequence_call_done ERROR -10052
> [ 5082.035406] nfs4_schedule_lease_recovery: scheduling lease recovery for server <server>
> [ 5082.036710] nfs41_sequence_call_done rpc_cred 000000001946c07d
> [ 5082.037565] RPC:       xs_tcp_send_request(100) = 0
> [ 5082.038660] NFS: Got error -10052 from the server on DESTROY_SESSION. Session has been destroyed regardless...
> [ 5082.040040] --> nfs4_proc_create_session clp=00000000872dbfe2 session=00000000b29d3016
> [ 5082.041071] nfs4_init_channel_attrs: Fore Channel : max_rqst_sz=1049620 max_resp_sz=1049480 max_ops=8 max_reqs=64
> [ 5082.042467] nfs4_init_channel_attrs: Back Channel : max_rqst_sz=4096 max_resp_sz=4096 max_resp_sz_cached=0 max_ops=2 max_reqs=16
> [ 5082.044069] nfs4_schedule_state_renewal: requeueing work. Lease period = 5
> [ 5082.045066] RPC:       xs_tcp_send_request(196) = 0
> [ 5082.046099] nfs4_reset_session: session reset failed with status -10022 for server <server>!
> [ 5082.047361] nfs4_handle_reclaim_lease_error: handled error -10022 for server <server>
> [ 5082.048601] RPC:       xs_tcp_send_request(244) = 0
> [ 5082.049712] --> nfs4_proc_create_session clp=00000000872dbfe2 session=00000000b29d3016
> [ 5082.050852] nfs4_init_channel_attrs: Fore Channel : max_rqst_sz=1049620 max_resp_sz=1049480 max_ops=8 max_reqs=64
> [ 5082.052201] nfs4_init_channel_attrs: Back Channel : max_rqst_sz=4096 max_resp_sz=4096 max_resp_sz_cached=0 max_ops=2 max_reqs=16
> [ 5082.053763] RPC:       xs_tcp_send_request(196) = 0
> [ 5082.055159] --> nfs4_setup_session_slot_tables
> [ 5082.055770] --> nfs4_realloc_slot_table: max_reqs=30, tbl->max_slots 30
> [ 5082.056681] nfs4_realloc_slot_table: tbl=00000000cf1ff1fa slots=0000000002cd9cdc max_slots=30
> [ 5082.057752] <-- nfs4_realloc_slot_table: return 0
> [ 5082.058328] --> nfs4_realloc_slot_table: max_reqs=16, tbl->max_slots 16
> [ 5082.059207] nfs4_realloc_slot_table: tbl=000000005982b642 slots=0000000017686be5 max_slots=16
> [ 5082.060307] <-- nfs4_realloc_slot_table: return 0
> [ 5082.060912] slot table setup returned 0
> [ 5082.061403] nfs4_proc_create_session client>seqid 2 sessionid 1727263096:1126483273:1:0
> [ 5082.062737] --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=30
> [ 5082.063857] <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0
> [ 5082.064781] encode_sequence: sessionid=1727263096:1126483273:1:0 seqid=1 slotid=0 max_slotid=0 cache_this=0
> [ 5082.066063] RPC:       xs_tcp_send_request(132) = 0
> [ 5082.068110] decode_attr_lease_time: lease time=90
> [ 5082.068780] decode_attr_maxfilesize: maxfilesize=0
> [ 5082.069411] decode_attr_maxread: maxread=1024
> [ 5082.070006] decode_attr_maxwrite: maxwrite=1024
> [ 5082.070612] decode_attr_time_delta: time_delta=0 0
> [ 5082.071227] decode_attr_pnfstype: bitmap is 0
> [ 5082.071810] decode_attr_layout_blksize: bitmap is 0
> [ 5082.072452] decode_attr_clone_blksize: bitmap is 0
> [ 5082.073090] decode_attr_change_attr_type: bitmap is 0
> [ 5082.073735] decode_attr_xattrsupport: XATTR support=false
> [ 5082.074414] decode_fsinfo: xdr returned 0!
> [ 5082.075357] --> nfs4_alloc_slot used_slots=0001 highest_used=0 max_slots=30
> [ 5082.076299] <-- nfs4_alloc_slot used_slots=0003 highest_used=1 slotid=1
> [ 5082.077212] nfs4_free_slot: slotid 1 highest_used_slotid 0
> [ 5082.077909] nfs41_sequence_process: Error 0 free the slot
> [ 5082.078569] nfs4_free_slot: slotid 0 highest_used_slotid 4294967295
> [ 5082.079571] nfs4_schedule_state_renewal: requeueing work. Lease period = 60
> [ 5082.080520] --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=30
> [ 5082.081506] <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0
> [ 5082.082376] encode_sequence: sessionid=1727263096:1126483273:1:0 seqid=2 slotid=0 max_slotid=0 cache_this=0
> [ 5082.083612] RPC:       xs_tcp_send_request(124) = 0
> [ 5082.113634] --> nfs4_alloc_slot used_slots=0001 highest_used=0 max_slots=30
> [ 5082.114614] <-- nfs4_alloc_slot used_slots=0003 highest_used=1 slotid=1
> [ 5082.115482] nfs4_free_slot: slotid 1 highest_used_slotid 0
> [ 5082.116142] nfs41_sequence_process: Error 0 free the slot
> [ 5082.116772] nfs4_free_slot: slotid 0 highest_used_slotid 4294967295
> [ 5082.117766] <-- nfs41_proc_reclaim_complete status=0
> [ 5082.118379] --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=16
> [ 5082.119308] <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0
> [ 5082.120110] nfs4_free_slot: slotid 0 highest_used_slotid 4294967295
> [ 5082.120805] --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=30
> [ 5082.121690] <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0
> [ 5082.122438] nfs4_free_slot: slotid 0 highest_used_slotid 4294967295
> 
> The client now shows the following error when trying to access files on the
> NFS mountpoint (shared is a directory in /mnt/nfs that is the mountpoint):
> 
> client> ls /mnt/nfs/
> ls: cannot access '/mnt/nfs/shared': Stale file handle
> shared
> 
> While doing that, the client kernel log continues:
> 
> [ 5121.788081] NFS: nfs_weak_revalidate: inode 33681408 is valid
> [ 5121.788941] NFS: revalidating (0:50/33681408)
> [ 5121.789816] --> nfs41_call_sync_prepare data->seq_server 00000000862cf9f3
> [ 5121.790950] --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=30
> [ 5121.791958] <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0
> [ 5121.792904] encode_sequence: sessionid=1727263096:1126483273:1:0 seqid=3 slotid=0 max_slotid=0 cache_this=0
> [ 5121.794289] RPC:       xs_tcp_send_request(172) = 0
> [ 5121.799020] --> nfs4_alloc_slot used_slots=0001 highest_used=0 max_slots=30
> [ 5121.800014] <-- nfs4_alloc_slot used_slots=0003 highest_used=1 slotid=1
> [ 5121.800876] nfs4_free_slot: slotid 1 highest_used_slotid 0
> [ 5121.801562] nfs41_sequence_process: Error 0 free the slot
> [ 5121.802232] nfs4_free_slot: slotid 0 highest_used_slotid 4294967295
> [ 5121.803240] nfs_revalidate_inode: (0:50/33681408) getattr failed, error=-116
> [ 5121.804471] NFS: revalidating (0:50/33681408)
> [ 5121.805080] --> nfs41_call_sync_prepare data->seq_server 00000000862cf9f3
> [ 5121.805968] --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=30
> [ 5121.806955] <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0
> [ 5121.807885] encode_sequence: sessionid=1727263096:1126483273:1:0 seqid=4 slotid=0 max_slotid=0 cache_this=0
> [ 5121.809522] RPC:       xs_tcp_send_request(172) = 0
> [ 5121.810845] --> nfs4_alloc_slot used_slots=0001 highest_used=0 max_slots=30
> [ 5121.811864] <-- nfs4_alloc_slot used_slots=0003 highest_used=1 slotid=1
> [ 5121.812791] nfs4_free_slot: slotid 1 highest_used_slotid 0
> [ 5121.813501] nfs41_sequence_process: Error 0 free the slot
> [ 5121.814207] nfs4_free_slot: slotid 0 highest_used_slotid 4294967295
> [ 5121.815242] nfs_revalidate_inode: (0:50/33681408) getattr failed, error=-116
> [ 5121.816279] NFS: nfs_weak_revalidate: inode 33681408 is invalid
> [ 5121.817311] NFS: revalidating (0:50/33681408)
> [ 5121.828500] NFS: nfs_weak_revalidate: inode 33681408 is invalid
> [ 5146.013099] nfs4_renew_state: start
> [ 5146.013647] nfs4_renew_state: failed to call renewd. Reason: lease not expired
> [ 5146.014510] nfs4_schedule_state_renewal: requeueing work. Lease period = 36
> [ 5146.015356] nfs4_renew_state: done
> [ 5155.718411] NFS: nfs_weak_revalidate: inode 33681408 is valid
> [ 5155.719976] NFS: revalidating (0:50/33681408)
> [ 5155.720628] --> nfs41_call_sync_prepare data->seq_server 00000000862cf9f3
> [ 5155.721545] --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=30
> [ 5155.722519] <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0
> [ 5155.726973] encode_sequence: sessionid=1727263096:1126483273:1:0 seqid=6 slotid=0 max_slotid=0 cache_this=0
> [ 5155.728990] RPC:       xs_tcp_send_request(172) = 0
> [ 5155.731070] --> nfs4_alloc_slot used_slots=0001 highest_used=0 max_slots=30
> [ 5155.732292] <-- nfs4_alloc_slot used_slots=0003 highest_used=1 slotid=1
> [ 5155.733384] nfs4_free_slot: slotid 1 highest_used_slotid 0
> [ 5155.734088] nfs41_sequence_process: Error 0 free the slot
> [ 5155.734867] nfs4_free_slot: slotid 0 highest_used_slotid 4294967295
> [ 5155.741211] nfs_revalidate_inode: (0:50/33681408) getattr failed, error=-116
> [ 5155.742647] NFS: revalidating (0:50/33681408)
> [ 5155.748003] --> nfs41_call_sync_prepare data->seq_server 00000000862cf9f3
> [ 5155.748977] --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=30
> [ 5155.750066] <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0
> [ 5155.754045] encode_sequence: sessionid=1727263096:1126483273:1:0 seqid=7 slotid=0 max_slotid=0 cache_this=0
> [ 5155.757104] RPC:       xs_tcp_send_request(172) = 0
> [ 5155.759310] --> nfs4_alloc_slot used_slots=0001 highest_used=0 max_slots=30
> [ 5155.760383] <-- nfs4_alloc_slot used_slots=0003 highest_used=1 slotid=1
> [ 5155.761233] nfs4_free_slot: slotid 1 highest_used_slotid 0
> [ 5155.761870] nfs41_sequence_process: Error 0 free the slot
> [ 5155.762585] nfs4_free_slot: slotid 0 highest_used_slotid 4294967295
> [ 5155.768250] nfs_revalidate_inode: (0:50/33681408) getattr failed, error=-116
> [ 5155.769224] NFS: nfs_weak_revalidate: inode 33681408 is invalid
> [ 5155.771968] NFS: revalidating (0:50/33681408)
> [ 5155.772574] --> nfs41_call_sync_prepare data->seq_server 00000000862cf9f3
> [ 5155.773522] --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=30
> [ 5155.774379] <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0
> [ 5155.779036] encode_sequence: sessionid=1727263096:1126483273:1:0 seqid=8 slotid=0 max_slotid=0 cache_this=0
> [ 5155.780960] RPC:       xs_tcp_send_request(172) = 0
> [ 5155.781797] --> nfs4_alloc_slot used_slots=0001 highest_used=0 max_slots=30
> [ 5155.782721] <-- nfs4_alloc_slot used_slots=0003 highest_used=1 slotid=1
> [ 5155.783734] nfs4_free_slot: slotid 1 highest_used_slotid 0
> [ 5155.784485] nfs41_sequence_process: Error 0 free the slot
> [ 5155.785084] nfs4_free_slot: slotid 0 highest_used_slotid 4294967295
> [ 5155.789037] nfs_revalidate_inode: (0:50/33681408) getattr failed, error=-116
> [ 5155.789873] NFS: nfs_weak_revalidate: inode 33681408 is invalid
> 
> Unmounting and remounting the mountpoint puts the client into a working state again.
> 
> Case 2 (ends up in a working state)
> ===================================
> 
> Here, we also shutdown the NFS server, wait for a bit, but then perform a "warm" reboot.
> 
> server> systemctl stop nfs-server ; sleep 30 seconds; reboot system
> 
> The individual steps as seen from the client:
> 
> server> systemctl stop nfs-server
> 
> client kernel log>
> [ 5511.593152] RPC:       xs_tcp_state_change client 0000000005eb3fd3...
> [ 5511.593997] RPC:       state 8 conn 1 dead 0 zapped 1 sk_shutdown 1
> [ 5511.594763] RPC:       xs_tcp_state_change client 0000000005eb3fd3...
> [ 5511.595581] RPC:       state 9 conn 0 dead 0 zapped 1 sk_shutdown 3
> [ 5511.596371] RPC:       xs_tcp_state_change client 0000000005eb3fd3...
> [ 5511.597131] RPC:       state 7 conn 0 dead 0 zapped 1 sk_shutdown 3
> [ 5511.597957] RPC:       xs_close xprt 0000000005eb3fd3
> [ 5511.599868] RPC:       xs_tcp_state_change client 0000000005eb3fd3...
> [ 5511.600621] RPC:       state 7 conn 0 dead 0 zapped 1 sk_shutdown 3
> 
> server> sleep 30; reboot
> [ 5526.935766] nfs4_renew_state: start
> [ 5526.936259] nfs4_renew_state: failed to call renewd. Reason: lease not expired
> [ 5526.937164] nfs4_schedule_state_renewal: requeueing work. Lease period = 36
> [ 5526.938060] nfs4_renew_state: done
> [ 5563.799229] nfs4_renew_state: start
> [ 5563.799869] <-- nfs41_proc_async_sequence status=0
> [ 5563.800512] nfs4_renew_state: done
> [ 5563.800990] --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=30
> [ 5563.802030] <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0
> [ 5563.803244] encode_sequence: sessionid=1727263096:1126483275:2:0 seqid=43 slotid=0 max_slotid=0 cache_this=0
> [ 5563.804487] RPC:       xs_connect scheduled xprt 0000000005eb3fd3
> [ 5563.805269] RPC:       xs_bind 0000:0000:0000:0000:0000:0000:0000:0000:914: ok (0)
> [ 5563.806376] RPC:       worker connecting xprt 0000000005eb3fd3 via tcp to 2a02:238:f030:1c3::1136 (port 2049)
> [ 5563.807676] RPC:       0000000005eb3fd3 connect status 115 connected 0 sock state 2
> [ 5578.071084] RPC:       xs_tcp_state_change client 0000000005eb3fd3...
> [ 5578.072089] RPC:       state 7 conn 0 dead 0 zapped 1 sk_shutdown 3
> [ 5578.072898] RPC:       xs_error_report client 0000000005eb3fd3, error=113...
> [ 5578.074168] RPC:       xs_close xprt 0000000005eb3fd3
> [ 5578.074806] RPC:       xs_tcp_state_change client 0000000005eb3fd3...
> [ 5578.075629] RPC:       state 7 conn 0 dead 0 zapped 1 sk_shutdown 3
> [ 5578.076533] RPC:       xs_connect scheduled xprt 0000000005eb3fd3
> [ 5578.077353] RPC:       xs_bind 0000:0000:0000:0000:0000:0000:0000:0000:791: ok (0)
> [ 5578.078422] RPC:       worker connecting xprt 0000000005eb3fd3 via tcp to 2a02:238:f030:1c3::1136 (port 2049)
> [ 5578.079767] RPC:       0000000005eb3fd3 connect status 115 connected 0 sock state 2
> [ 5581.143007] RPC:       xs_tcp_state_change client 0000000005eb3fd3...
> [ 5581.144027] RPC:       state 7 conn 0 dead 0 zapped 1 sk_shutdown 3
> [ 5581.144804] RPC:       xs_error_report client 0000000005eb3fd3, error=113...
> [ 5581.146013] RPC:       xs_close xprt 0000000005eb3fd3
> [ 5581.146667] RPC:       xs_tcp_state_change client 0000000005eb3fd3...
> [ 5581.147475] RPC:       state 7 conn 0 dead 0 zapped 1 sk_shutdown 3
> [ 5584.150918] RPC:       xs_connect scheduled xprt 0000000005eb3fd3
> [ 5584.151952] RPC:       xs_bind 0000:0000:0000:0000:0000:0000:0000:0000:842: ok (0)
> [ 5584.153297] RPC:       worker connecting xprt 0000000005eb3fd3 via tcp to 2a02:238:f030:1c3::1136 (port 2049)
> [ 5584.154840] RPC:       0000000005eb3fd3 connect status 115 connected 0 sock state 2
> [ 5587.223096] RPC:       xs_tcp_state_change client 0000000005eb3fd3...
> [ 5587.224119] RPC:       state 7 conn 0 dead 0 zapped 1 sk_shutdown 3
> [ 5587.224971] RPC:       xs_error_report client 0000000005eb3fd3, error=113...
> [ 5587.226296] RPC:       xs_close xprt 0000000005eb3fd3
> [ 5587.227011] RPC:       xs_tcp_state_change client 0000000005eb3fd3...
> [ 5587.227878] RPC:       state 7 conn 0 dead 0 zapped 1 sk_shutdown 3
> [ 5590.230822] RPC:       xs_connect scheduled xprt 0000000005eb3fd3
> [ 5590.231841] RPC:       xs_bind 0000:0000:0000:0000:0000:0000:0000:0000:893: ok (0)
> [ 5590.232897] RPC:       worker connecting xprt 0000000005eb3fd3 via tcp to 2a02:238:f030:1c3::1136 (port 2049)
> [ 5590.234566] RPC:       0000000005eb3fd3 connect status 115 connected 0 sock state 2
> [ 5593.302883] RPC:       xs_tcp_state_change client 0000000005eb3fd3...
> [ 5593.303910] RPC:       state 7 conn 0 dead 0 zapped 1 sk_shutdown 3
> [ 5593.304850] RPC:       xs_error_report client 0000000005eb3fd3, error=113...
> [ 5593.306104] RPC:       xs_close xprt 0000000005eb3fd3
> [ 5593.306799] RPC:       xs_tcp_state_change client 0000000005eb3fd3...
> [ 5593.307585] RPC:       state 7 conn 0 dead 0 zapped 1 sk_shutdown 3
> [ 5596.310840] RPC:       xs_connect scheduled xprt 0000000005eb3fd3
> [ 5596.311711] RPC:       xs_bind 0000:0000:0000:0000:0000:0000:0000:0000:925: ok (0)
> [ 5596.312594] RPC:       worker connecting xprt 0000000005eb3fd3 via tcp to 2a02:238:f030:1c3::1136 (port 2049)
> [ 5596.314192] RPC:       0000000005eb3fd3 connect status 115 connected 0 sock state 2
> 
> 
> The server is booting and the client recovers while logging this:
> 
> 
> [ 5602.390691] RPC:       xs_connect scheduled xprt 0000000005eb3fd3
> [ 5602.391719] RPC:       xs_bind 0000:0000:0000:0000:0000:0000:0000:0000:980: ok (0)
> [ 5602.392668] RPC:       worker connecting xprt 0000000005eb3fd3 via tcp to 2a02:238:f030:1c3::1136 (port 2049)
> [ 5602.394223] RPC:       0000000005eb3fd3 connect status 115 connected 0 sock state 2
> [ 5605.462614] RPC:       xs_tcp_state_change client 0000000005eb3fd3...
> [ 5605.463527] RPC:       state 7 conn 0 dead 0 zapped 1 sk_shutdown 3
> [ 5605.464312] RPC:       xs_error_report client 0000000005eb3fd3, error=113...
> [ 5605.465430] RPC:       xs_close xprt 0000000005eb3fd3
> [ 5605.466074] RPC:       xs_tcp_state_change client 0000000005eb3fd3...
> [ 5605.466903] RPC:       state 7 conn 0 dead 0 zapped 1 sk_shutdown 3
> [ 5608.470590] RPC:       xs_connect scheduled xprt 0000000005eb3fd3
> [ 5608.471618] RPC:       xs_bind 0000:0000:0000:0000:0000:0000:0000:0000:1015: ok (0)
> [ 5608.472730] RPC:       worker connecting xprt 0000000005eb3fd3 via tcp to 2a02:238:f030:1c3::1136 (port 2049)
> [ 5608.474014] RPC:       0000000005eb3fd3 connect status 115 connected 0 sock state 2
> [ 5609.495200] RPC:       xs_tcp_state_change client 0000000005eb3fd3...
> [ 5609.496171] RPC:       state 1 conn 0 dead 0 zapped 1 sk_shutdown 0
> [ 5609.497356] RPC:       xs_tcp_send_request(116) = 0
> [ 5609.511115] nfs41_sequence_process ERROR: -10052 Reset session
> [ 5609.512033] nfs4_free_slot: slotid 0 highest_used_slotid 4294967295
> [ 5609.512809] nfs41_sequence_process: Error -10052 free the slot
> [ 5609.513589] nfs41_sequence_call_done ERROR -10052
> [ 5609.514246] nfs4_schedule_lease_recovery: scheduling lease recovery for server <server>
> [ 5609.515446] nfs41_sequence_call_done rpc_cred 000000001946c07d
> [ 5609.516272] RPC:       xs_tcp_send_request(100) = 0
> [ 5609.517198] NFS: Got error -10052 from the server on DESTROY_SESSION. Session has been destroyed regardless...
> [ 5609.518645] --> nfs4_proc_create_session clp=0000000002fc3705 session=000000009488146a
> [ 5609.519601] nfs4_init_channel_attrs: Fore Channel : max_rqst_sz=1049620 max_resp_sz=1049480 max_ops=8 max_reqs=64
> [ 5609.520801] nfs4_init_channel_attrs: Back Channel : max_rqst_sz=4096 max_resp_sz=4096 max_resp_sz_cached=0 max_ops=2 max_reqs=16
> [ 5609.523115] nfs4_schedule_state_renewal: requeueing work. Lease period = 5
> [ 5609.524103] RPC:       xs_tcp_send_request(196) = 0
> [ 5609.525090] nfs4_reset_session: session reset failed with status -10022 for server <server>!
> [ 5609.526350] nfs4_handle_reclaim_lease_error: handled error -10022 for server <server>
> [ 5609.527548] RPC:       xs_tcp_send_request(244) = 0
> [ 5609.528414] --> nfs4_proc_create_session clp=0000000002fc3705 session=000000009488146a
> [ 5609.529500] nfs4_init_channel_attrs: Fore Channel : max_rqst_sz=1049620 max_resp_sz=1049480 max_ops=8 max_reqs=64
> [ 5609.530905] nfs4_init_channel_attrs: Back Channel : max_rqst_sz=4096 max_resp_sz=4096 max_resp_sz_cached=0 max_ops=2 max_reqs=16
> [ 5609.532526] RPC:       xs_tcp_send_request(196) = 0
> [ 5609.533669] --> nfs4_setup_session_slot_tables
> [ 5609.534365] --> nfs4_realloc_slot_table: max_reqs=30, tbl->max_slots 30
> [ 5609.535199] nfs4_realloc_slot_table: tbl=000000009c22d728 slots=0000000025e05bce max_slots=30
> [ 5609.536186] <-- nfs4_realloc_slot_table: return 0
> [ 5609.536744] --> nfs4_realloc_slot_table: max_reqs=16, tbl->max_slots 16
> [ 5609.537734] nfs4_realloc_slot_table: tbl=0000000002a05f6b slots=000000009aa80389 max_slots=16
> [ 5609.538775] <-- nfs4_realloc_slot_table: return 0
> [ 5609.539446] slot table setup returned 0
> [ 5609.539994] nfs4_proc_create_session client>seqid 2 sessionid 1727263639:3068450331:1:0
> [ 5609.541129] --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=30
> [ 5609.542047] <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0
> [ 5609.543029] encode_sequence: sessionid=1727263639:3068450331:1:0 seqid=1 slotid=0 max_slotid=0 cache_this=0
> [ 5609.544295] RPC:       xs_tcp_send_request(132) = 0
> [ 5609.545479] decode_attr_lease_time: lease time=90
> [ 5609.546158] decode_attr_maxfilesize: maxfilesize=0
> [ 5609.546798] decode_attr_maxread: maxread=1024
> [ 5609.547357] decode_attr_maxwrite: maxwrite=1024
> [ 5609.547943] decode_attr_time_delta: time_delta=0 0
> [ 5609.548482] decode_attr_pnfstype: bitmap is 0
> [ 5609.549005] decode_attr_layout_blksize: bitmap is 0
> [ 5609.549534] decode_attr_clone_blksize: bitmap is 0
> [ 5609.550125] decode_attr_change_attr_type: bitmap is 0
> [ 5609.550766] decode_attr_xattrsupport: XATTR support=false
> [ 5609.551413] decode_fsinfo: xdr returned 0!
> [ 5609.552242] --> nfs4_alloc_slot used_slots=0001 highest_used=0 max_slots=30
> [ 5609.553039] <-- nfs4_alloc_slot used_slots=0003 highest_used=1 slotid=1
> [ 5609.553917] nfs4_free_slot: slotid 1 highest_used_slotid 0
> [ 5609.554541] nfs41_sequence_process: Error 0 free the slot
> [ 5609.555273] nfs4_free_slot: slotid 0 highest_used_slotid 4294967295
> [ 5609.560764] nfs4_schedule_state_renewal: requeueing work. Lease period = 60
> [ 5609.562070] --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=30
> [ 5609.563074] <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0
> [ 5609.563980] encode_sequence: sessionid=1727263639:3068450331:1:0 seqid=2 slotid=0 max_slotid=0 cache_this=0
> [ 5609.565352] RPC:       xs_tcp_send_request(124) = 0
> [ 5609.644135] --> nfs4_alloc_slot used_slots=0001 highest_used=0 max_slots=30
> [ 5609.645214] <-- nfs4_alloc_slot used_slots=0003 highest_used=1 slotid=1
> [ 5609.646049] nfs4_free_slot: slotid 1 highest_used_slotid 0
> [ 5609.646722] nfs41_sequence_process: Error 0 free the slot
> [ 5609.647369] nfs4_free_slot: slotid 0 highest_used_slotid 4294967295
> [ 5609.648352] <-- nfs41_proc_reclaim_complete status=0
> [ 5609.648964] --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=16
> [ 5609.649813] <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0
> [ 5609.650544] nfs4_free_slot: slotid 0 highest_used_slotid 4294967295
> [ 5609.651237] --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=30
> [ 5609.652134] <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0
> [ 5609.652874] nfs4_free_slot: slotid 0 highest_used_slotid 4294967295
> [ 5632.396766] NFS: permission(0:50/33681408), mask=0x81, res=-10
> [ 5632.397471] NFS: revalidating (0:50/33681408)
> [ 5632.397973] --> nfs41_call_sync_prepare data->seq_server 000000001a362fd8
> [ 5632.398705] --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=30
> [ 5632.399547] <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0
> [ 5632.400600] encode_sequence: sessionid=1727263639:3068450331:1:0 seqid=3 slotid=0 max_slotid=0 cache_this=0
> [ 5632.401933] RPC:       xs_tcp_send_request(172) = 0
> [ 5632.406551] decode_attr_type: type=040000
> [ 5632.407276] decode_attr_change: change attribute=7
> [ 5632.407922] decode_attr_size: file size=18
> [ 5632.408428] decode_attr_fsid: fsid=(0x7de545f823d44d4e/0xadabff1eec0b0f80)
> [ 5632.409210] decode_attr_fileid: fileid=33681408
> [ 5632.409752] decode_attr_fs_locations: fs_locations done, error = 0
> [ 5632.410441] decode_attr_mode: file mode=0775
> [ 5632.410944] decode_attr_nlink: nlink=3
> [ 5632.411464] decode_attr_owner: uid=0
> [ 5632.411953] decode_attr_group: gid=900
> [ 5632.412473] decode_attr_rdev: rdev=(0x0:0x0)
> [ 5632.412982] decode_attr_space_used: space used=0
> [ 5632.413558] decode_attr_time_access: atime=1727245731
> [ 5632.414115] decode_attr_time_metadata: ctime=1727245730
> [ 5632.414746] decode_attr_time_modify: mtime=1727245730
> [ 5632.415336] decode_attr_mounted_on_fileid: fileid=33681408
> [ 5632.416084] decode_getfattr_attrs: xdr returned 0
> [ 5632.416712] decode_getfattr_generic: xdr returned 0
> [ 5632.417833] --> nfs4_alloc_slot used_slots=0001 highest_used=0 max_slots=30
> [ 5632.418779] <-- nfs4_alloc_slot used_slots=0003 highest_used=1 slotid=1
> [ 5632.419659] nfs4_free_slot: slotid 1 highest_used_slotid 0
> [ 5632.420315] nfs41_sequence_process: Error 0 free the slot
> [ 5632.421024] nfs4_free_slot: slotid 0 highest_used_slotid 4294967295
> [ 5632.422037] NFS: nfs_update_inode(0:50/33681408 fh_crc=0xd3a763c6 ct=2 info=0x427e7f)
> [ 5632.423110] NFS: (0:50/33681408) revalidation complete
> [ 5632.423890] NFS: permission(0:50/33681408), mask=0x1, res=0
> [ 5632.424892] NFS: nfs_weak_revalidate: inode 33681408 is valid
> [ 5632.428895] NFS: permission(0:50/33681408), mask=0x81, res=0
> [ 5632.429693] NFS: revalidating (0:50/33681408)
> [ 5632.430402] --> nfs41_call_sync_prepare data->seq_server 000000001a362fd8
> [ 5632.431191] --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=30
> [ 5632.432112] <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0
> [ 5632.433025] encode_sequence: sessionid=1727263639:3068450331:1:0 seqid=4 slotid=0 max_slotid=0 cache_this=0
> [ 5632.434228] RPC:       xs_tcp_send_request(172) = 0
> [ 5632.435149] decode_attr_type: type=040000
> [ 5632.435692] decode_attr_change: change attribute=7
> [ 5632.436237] decode_attr_size: file size=18
> [ 5632.436738] decode_attr_fsid: fsid=(0x7de545f823d44d4e/0xadabff1eec0b0f80)
> [ 5632.437518] decode_attr_fileid: fileid=33681408
> [ 5632.438052] decode_attr_fs_locations: fs_locations done, error = 0
> [ 5632.438796] decode_attr_mode: file mode=0775
> [ 5632.439312] decode_attr_nlink: nlink=3
> [ 5632.439772] decode_attr_owner: uid=0
> [ 5632.440147] decode_attr_group: gid=900
> [ 5632.440582] decode_attr_rdev: rdev=(0x0:0x0)
> [ 5632.441272] decode_attr_space_used: space used=0
> [ 5632.441998] decode_attr_time_access: atime=1727245731
> [ 5632.442796] decode_attr_time_metadata: ctime=1727245730
> [ 5632.443613] decode_attr_time_modify: mtime=1727245730
> [ 5632.444374] decode_attr_mounted_on_fileid: fileid=33681408
> [ 5632.445216] decode_getfattr_attrs: xdr returned 0
> [ 5632.445948] decode_getfattr_generic: xdr returned 0
> [ 5632.447188] --> nfs4_alloc_slot used_slots=0001 highest_used=0 max_slots=30
> [ 5632.448043] <-- nfs4_alloc_slot used_slots=0003 highest_used=1 slotid=1
> [ 5632.448876] nfs4_free_slot: slotid 1 highest_used_slotid 0
> [ 5632.449551] nfs41_sequence_process: Error 0 free the slot
> [ 5632.450206] nfs4_free_slot: slotid 0 highest_used_slotid 4294967295
> [ 5632.451060] NFS: nfs_update_inode(0:50/33681408 fh_crc=0xd3a763c6 ct=2 info=0x427e7f)
> [ 5632.451966] NFS: (0:50/33681408) revalidation complete
> [ 5632.452663] NFS: nfs_weak_revalidate: inode 33681408 is valid
> [ 5632.453284] NFS: permission(0:50/33681408), mask=0x24, res=0
> [ 5632.454370] NFS: open dir(/)
> [ 5632.454787] NFS: readdir(/) starting at cookie 0
> [ 5632.455619] NFS: nfs_do_filldir() filling ended @ cookie 512
> [ 5632.456475] NFS: readdir(/) returns 0
> [ 5632.457038] NFS: permission(0:50/33681408), mask=0x81, res=0
> [ 5632.457862] NFS: revalidating (0:50/1677168)
> [ 5632.458536] --> nfs41_call_sync_prepare data->seq_server 000000001a362fd8
> [ 5632.459514] --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=30
> [ 5632.460527] <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0
> [ 5632.461465] encode_sequence: sessionid=1727263639:3068450331:1:0 seqid=5 slotid=0 max_slotid=0 cache_this=0
> [ 5632.462847] RPC:       xs_tcp_send_request(184) = 0
> 
> 
> I’m at the point where I’m considering that this might be a bug.

I'm not entirely certain what you mean by "cold restart" versus
"warm restart" but for the moment I will assume that "cold restart"
means you reboot the NFS server host, and "warm restart" means
you simply cycle the NFS service (eg systemctl restart nfs-server).

STALE means the file handle no longer exists on the server. This
can mean the file system was unexported and thus is no longer
accessible.

In your case, I'm guessing that what is happening on a cold
restart is the exported file system is replaced; for example
a tmpfs. Or, maybe reboot removes exported files.


--
Chuck Lever






[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux