On Tue, 17 Sep 2024, Steven Price wrote:
>
> Hi Neil,
>
> I'm seeing issues on a test board using an NFS root which I've bisected
> to this commit in linux-next. The kernel spits out many errors of the form:
>
> [ 7.478995] NFS: v4 server <ip> returned a bad sequence-id error!
> [ 7.599462] NFS: v4 server <ip> returned a bad sequence-id error!
> [ 7.600570] NFS: v4 server <ip> returned a bad sequence-id error!
> [ 7.615243] NFS: v4 server <ip> returned a bad sequence-id error!
> [ 7.636756] NFS: v4 server <ip> returned a bad sequence-id error!
> [ 7.644808] NFS: v4 server <ip> returned a bad sequence-id error!
> [ 7.653605] NFS: v4 server <ip> returned a bad sequence-id error!
> [ 7.692836] NFS: nfs4_reclaim_open_state: unhandled error -10026
> [ 7.699573] NFSv4: state recovery failed for open file
> arm-linux-gnueabihf/libgpg-error.so.0.29.0, error = -10026
> [ 7.711055] NFSv4: state recovery failed for open file
> arm-linux-gnueabihf/libgpg-error.so.0.29.0, error = -10026
>
> (with the filename obviously varying)
>
> The NFS server is a standard Debian 12 system.
>
> Any ideas?

Not immediately. It appears that when the client opens a file during
recovery, the server doesn't like the seqid that it uses...

Recovery happens when the server restarts and when the client and server
have been out of contact for an extended period of time (>90 seconds by
default). Was either of those the case here? Which one?

Are you able to capture a network packet trace leading up to and
including these errors? Something like:

   tcpdump -i any -s 0 -w /tmp/nfs.pcap port 2049

on the client (or server), then run the test which triggers the errors,
then interrupt the tcpdump. Hopefully the nfs.pcap won't be too big and
you can compress it and email it to me. Hopefully it will contain some
useful hints.

Thanks for the report,
NeilBrown