EREMOTE error on NFSv4 CREATE RPC with latest kernel.

Pradeep <pradeepthomas@xxxxxxxxx> · Mon, 12 Sep 2022 07:29:14 -0700

Hello,

We have created a referral-based “distributed” namespace spread across
multiple Linux nodes using the local file system. The server
implementation uses nfs-ganesha. Within an NFSv4 export, we place
top-level directories (under export root) on different nodes for “load
balancing”. These directories simulate separate filesystems (separate
fsids) and return fs_locations when queried by the client. Note that
the Linux NFS client currently queries the fs_locations attribute
(that is, handles NFS4ERR_MOVED) only on a “lookup” VFS operation.

To let the client create top-level directories, we must ensure that
mkdir succeeds (since the client cannot handle NFS4ERR_MOVED on
CREATE), even though the new directory may be placed on a different
node than the one the client is connected to. To do this, we
internally forward the mkdir request to the node where the directory
is actually created.
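
As a rough illustration of the forwarding step, here is a minimal
Python sketch. The names Node, pick_node and handle_mkdir are
hypothetical, and the hash-based placement is only an assumption made
for the example; it is not how nfs-ganesha or our setup necessarily
chooses a node.

```python
import zlib


class Node:
    """Hypothetical stand-in for one server node and its local directories."""

    def __init__(self, name):
        self.name = name
        self.dirs = set()


def pick_node(nodes, dirname):
    # Assumed placement policy: a stable hash of the directory name.
    return nodes[zlib.crc32(dirname.encode()) % len(nodes)]


def handle_mkdir(receiver, nodes, dirname):
    """Create dirname on its owning node; report whether it was forwarded."""
    owner = pick_node(nodes, dirname)
    # Stands in for either a local mkdir or a request forwarded to the owner.
    owner.dirs.add(dirname)
    forwarded = owner is not receiver
    return owner, forwarded
```

Whichever node receives the CREATE, the directory ends up on exactly
one node, and the receiver knows whether the result lives locally or
remotely.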

Until recently, we were able to return a zero-byte filehandle in the
GETFH response for CREATE (a compound of PUTFH, CREATE, GETFH,
GETATTR). This forced the Linux client to issue a LOOKUP on the
directory name, get NFS4ERR_MOVED in response, and subsequently get
redirected to the correct node.
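
The sequence described above can be modelled roughly as follows. This
is a simplified sketch of the behaviour we relied on, not the kernel
code; CreateReply, the server dictionary and the step strings are
made up for illustration. Only the NFS4ERR_MOVED value comes from the
protocol.

```python
from dataclasses import dataclass

NFS4_OK = 0
NFS4ERR_MOVED = 10019  # value defined by the NFSv4 protocol


@dataclass
class CreateReply:
    """Hypothetical summary of the CREATE compound result."""
    status: int
    filehandle: bytes


def steps_after_create(server, name, reply):
    """List the steps a client takes once CREATE has returned."""
    if len(reply.filehandle) > 0:
        return ["use filehandle from GETFH"]
    # Zero-byte filehandle: fall back to a LOOKUP on the new name.
    steps = ["LOOKUP " + name]
    status, payload = server[name]  # modelled LOOKUP result
    if status == NFS4ERR_MOVED:
        steps.append("GETATTR fs_locations")
        steps.append("follow referral to " + payload)
    else:
        steps.append("use filehandle from LOOKUP")
    return steps
```

With a zero-byte filehandle the client ends up at the referral target;
with a non-empty one it never performs the LOOKUP and so never sees
NFS4ERR_MOVED.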

For reference, this is the client code that forces a LOOKUP on a
zero-byte FH:
https://elixir.bootlin.com/linux/latest/source/fs/nfs/dir.c#L2237

Of course, this isn’t documented in the protocol, although the
protocol does not explicitly disallow “junctions” to be created by the
client.

Trond’s recent change to filehandle validation in the RPC decoding
layer now makes a zero-byte filehandle invalid; see
https://git.linux-nfs.org/?p=trondmy/linux-nfs.git;a=commitdiff;h=eb3d58c68e39fad68d8054e0324eb06d82dcedbb;hp=3d66bae156a652be18e278f3c88bc3e069ae824b.
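
In effect, the length check on the decoded filehandle changed roughly
as sketched below. This is my reading of the behaviour, written as a
small Python model and not copied from the kernel; the function name
and the reject_empty flag are hypothetical, and NFS4_FHSIZE is the
protocol's 128-byte maximum.

```python
NFS4_FHSIZE = 128  # maximum NFSv4 filehandle size in bytes


def filehandle_length_ok(length, reject_empty):
    """Return True if a decoded filehandle length would be accepted.

    reject_empty=False models the older behaviour we depended on;
    reject_empty=True models the behaviour after the change.
    """
    if length > NFS4_FHSIZE:
        return False
    if reject_empty and length == 0:
        return False
    return True
```

Under the stricter check, the CREATE reply fails during decoding, so
the client never reaches the point where it would fall back to a
LOOKUP.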

This was probably done (correctly) for other reasons. However, it
prevents the client from “recovering” the filehandle of a newly
created directory through a subsequent lookup. Other options, such as
returning a valid filehandle (which we could obtain from the remote
node), would not work, since the client would not be informed of
fs_locations. Similarly, returning NFS4ERR_BADHANDLE or NFS4ERR_STALE
would not trigger any “recovery”, since the client does not yet know
that the directory has a separate fsid.

Any suggestions on how (or whether) we can support junction creates
from the client?

Thanks,
Pradeep