Re: [PATCH] nfs: simplify and guarantee owner uniqueness.

"NeilBrown" <neilb@xxxxxxx> · Tue, 17 Sep 2024 09:32:36 +1000

On Tue, 17 Sep 2024, Steven Price wrote:
> 
> Hi Neil,
> 
> I'm seeing issues on a test board using an NFS root which I've bisected
> to this commit in linux-next. The kernel spits out many errors of the form:
> 
> [    7.478995] NFS: v4 server <ip>  returned a bad sequence-id error!
> [    7.599462] NFS: v4 server <ip>  returned a bad sequence-id error!
> [    7.600570] NFS: v4 server <ip>  returned a bad sequence-id error!
> [    7.615243] NFS: v4 server <ip>  returned a bad sequence-id error!
> [    7.636756] NFS: v4 server <ip>  returned a bad sequence-id error!
> [    7.644808] NFS: v4 server <ip>  returned a bad sequence-id error!
> [    7.653605] NFS: v4 server <ip>  returned a bad sequence-id error!
> [    7.692836] NFS: nfs4_reclaim_open_state: unhandled error -10026
> [    7.699573] NFSv4: state recovery failed for open file
> arm-linux-gnueabihf/libgpg-error.so.0.29.0, error = -10026
> [    7.711055] NFSv4: state recovery failed for open file
> arm-linux-gnueabihf/libgpg-error.so.0.29.0, error = -10026
> 
> (with the filename obviously varying)
> 
> The NFS server is a standard Debian 12 system.
> 
> Any ideas?

Not immediately.  It appears that when the client opens a file during
recovery, the server doesn't like the seqid that the client uses
(-10026 is NFS4ERR_BAD_SEQID)...

Recovery happens when the server restarts, or when the client and server
have been out of contact for an extended period of time (>90 seconds by
default).
Was either of those the case here?  Which one?

Are you able to capture a network packet trace leading up to and
including these errors?  Something like:

   tcpdump -i any -s 0 -w /tmp/nfs.pcap port 2049

on the client (or server), then run the test which triggers the errors,
then interrupt the tcpdump.
With luck nfs.pcap won't be too big and you can compress it and
email it to me.  Hopefully it will contain some useful hints.
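
For the compress step, something along these lines should do.  This is
only a rough sketch: it assumes gzip is installed on the board, and
"compress_pcap" is just a name I made up for illustration, not an
existing tool.

```shell
# Illustrative helper (hypothetical name): gzip a capture file for mailing.
# The original capture is left in place; the result is written to <file>.gz.
compress_pcap() {
    pcap="$1"
    # Refuse to continue if the capture is missing or empty.
    [ -s "$pcap" ] || { echo "no capture data in $pcap" >&2; return 1; }
    # -9 asks for maximum compression, -c writes to stdout so the input is kept.
    gzip -9 -c "$pcap" > "$pcap.gz" && ls -l "$pcap" "$pcap.gz"
}
```

Run it as "compress_pcap /tmp/nfs.pcap" after interrupting tcpdump; the
ls output shows the size before and after.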

Thanks for the report,
NeilBrown