Re: questions about the linux NFS 4.1 client and persistent sessions

bfields@xxxxxxxxxxxx (J. Bruce Fields) · Wed, 14 Oct 2020 15:26:59 -0400

On Sat, Oct 10, 2020 at 11:39:30PM +0300, guy keren wrote:
> during the design, we encountered some issues with high-availability
> and persistent sessions handling by the linux NFS client, and i
> would like to understand a few things about the linux NFS client - i
> read all relevant material on www.linux-nfs.org, and spent a while
> reading the relevant recovery code in the nfs4.1 client kernel
> sources, but i am missing some things (a pointer to the relevant
> part in the recovery code will be appreciated as well):
> 
> 
> 1. suppose there is a persistent session that got disconnected
> (because of a server restart, for example). i see that the client is
> re-sending all the in-flight commands as part of the recovery.
> however, suppose that one of the commands was a
> compound command containing 2 requests, and the reply to the first
> of them was NFS4_OK, and to the 2nd it was NFS4ERR_DELAY - will the
> client's code know that after it finishes recovery of the session -
> then when it creates a new session, it needs to re-send the 2nd
> request in this compound command?

If the client received the reply, it shouldn't have to resend the
compound at all.

If the client didn't see the reply, it will resend the whole compound.
Its behavior won't be affected by how the compound failed, since it
can't know that.

> the broader question is about a
> compound with N commands, where the first X have an NFS4_OK reply
> and the last N-X have NFS4ERR_DELAY

The server always stops processing a compound at the first failure, so
N-X is always <=1.
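
To illustrate, here is a rough Python sketch of that stop-at-first-failure
rule.  It is not the Linux server code; process_compound and the execute
callback are names I made up for the example, and only the two status
values come from the protocol.

```python
NFS4_OK = 0
NFS4ERR_DELAY = 10008

def process_compound(ops, execute):
    """Run ops in order and stop after the first one that fails.

    `execute` is a hypothetical callback that returns a status per op.
    Returns the (op, status) pairs that would appear in the reply.
    """
    results = []
    for op in ops:
        status = execute(op)
        results.append((op, status))
        if status != NFS4_OK:
            break  # later ops are never executed and get no result
    return results

# Example: the third of four ops returns NFS4ERR_DELAY.
ops = ["PUTFH", "LOOKUP", "OPEN", "GETATTR"]
reply = process_compound(
    ops, lambda op: NFS4ERR_DELAY if op == "OPEN" else NFS4_OK)
```

So the reply holds X successful results followed by at most one failed
one; GETATTR in the example never runs at all.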

> - will the client re-send a new
> compound with the last N-X commands after establishing a new
> session?

A resend by definition is a resend of exactly the same compound.  The
client won't break it into pieces in that way.

(And typical compounds can't be broken up that way anyway--often earlier
ops in the compound are things like PUTFH's that supply required
information to later ops.)
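
As a rough sketch of why an identical resend is safe on a session, here is
a simplified model of a per-slot reply cache, loosely following the
NFSv4.1 slot and sequence-ID rules.  Slot, handle_request and execute are
hypothetical names, and a real server does more checking than this.

```python
class Slot:
    """One session slot: last sequence ID seen and the reply sent for it."""
    def __init__(self):
        self.seqid = 0
        self.cached_reply = None

def handle_request(slot, seqid, compound, execute):
    """Execute a new compound, or replay the cached reply for a resend."""
    if seqid == slot.seqid and slot.cached_reply is not None:
        # Resend of the last request: return the cached reply unchanged,
        # without executing any of the compound's ops again.
        return slot.cached_reply
    if seqid == slot.seqid + 1:
        reply = execute(compound)
        slot.seqid = seqid
        slot.cached_reply = reply
        return reply
    return "NFS4ERR_SEQ_MISORDERED"

# Example: the same compound sent twice on the same slot and sequence ID.
executed = []
def execute(compound):
    executed.append(compound)
    return ("reply-to", compound)

slot = Slot()
first = handle_request(slot, 1, "PUTFH; RENAME", execute)
resend = handle_request(slot, 1, "PUTFH; RENAME", execute)
```

The resend gets back the cached reply for the whole compound and the ops
run only once.  With a persistent session the idea is that this cache
survives a server restart; without one it is lost.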

> 2. if there is a non-persistent session, on which the client sent a
> non-idempotent request (e.g. rename of a file into a different
> directory), and the server restarted before the client received the
> response - will the client just blindly re-send the same request
> again after establishing a new session, or will it take some
> measures to attempt to understand whether the command was already
> executed? i.e. if the server already executed the rename, then
> re-sending it will return a failure to locate the source file handle
> (because it moved to a new directory).

In a rename of A/X to B/Y, the source filehandle refers to the directory
"A", so that filehandle will still work.  You might get a NFS4ERR_NOENT
if there's nothing at A/X any more, and you could guess that meant the
rename succeeded.  But it could equally well be that your rename was
never executed, and it's somebody else's rename or unlink that caused
A/X to no longer exist.  Similarly, your rename might have executed but
another operation might have immediately created something else at A/X.
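
A toy model of those cases, in case it helps.  The namespace is just a
Python dict and every name here is made up for illustration; there is no
reply cache, as after a restart without a persistent session.

```python
def rename(ns, src, dst):
    """Move the entry at src to dst; fail if nothing is at src."""
    if src not in ns:
        return "NFS4ERR_NOENT"
    ns[dst] = ns.pop(src)
    return "NFS4_OK"

def retry_after_rename_executed():
    ns = {"A/X": "file1"}
    rename(ns, "A/X", "B/Y")             # executed, but the reply was lost
    return rename(ns, "A/X", "B/Y"), ns  # the client's resend

def retry_after_other_unlink():
    ns = {"A/X": "file1"}
    del ns["A/X"]                        # our rename never ran; someone unlinked A/X
    return rename(ns, "A/X", "B/Y"), ns  # the client's resend

def retry_after_recreate():
    ns = {"A/X": "file1"}
    rename(ns, "A/X", "B/Y")             # executed, but the reply was lost
    ns["A/X"] = "file2"                  # someone else creates a new A/X
    return rename(ns, "A/X", "B/Y"), ns  # the client's resend

status_a, ns_a = retry_after_rename_executed()
status_b, ns_b = retry_after_other_unlink()
status_c, ns_c = retry_after_recreate()
```

The first two resends both return NFS4ERR_NOENT although the rename
happened in only one of them, and the third returns NFS4_OK while
replacing the file the first rename put at B/Y.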

> does the linux NFS client
> attempt to recover from this, or will it simply return an error to
> the application layer?

I suspect that's all any client does.  You can imagine all sorts of
complicated heuristics, but none of them will be 100% right.  Persistent
sessions are what you really need to fix this kind of bug.

> 3. what NFS server with persistent sessions is used (or was used)
> when testing the persistent sessions support in the linux NFS
> client? the linux NFS server, as far as i understood, cannot support
> persistent sessions (due to lack of assured persistent memory).

I don't think any special hardware is necessary.  Or if it is, we could
just disable the feature in the absence of that hardware.  Mainly what
we need is some cooperation from the filesystem--some way to ID
particular operations so the server can ask the filesystem if a
particular operation was committed to disk.  I talked to the XFS
developers about it informally and they seemed open to the idea, but
they need some sort of explanation of the requirements and I haven't
gotten around to it....

--b.