Any thoughts or suggestions on this issue?

Thanks

On Mon, Sep 12, 2022 at 7:29 AM Pradeep <pradeepthomas@xxxxxxxxx> wrote:
>
> Hello,
>
> We have created a referral-based “distributed” namespace spread across
> multiple Linux nodes using the local file system. The server
> implementation uses nfs-ganesha. Within an NFSv4 export, we place
> top-level directories (under the export root) on different nodes for
> “load balancing”. These directories simulate separate filesystems
> (separate fsids) and return fs_locations when queried by the client.
> Note that currently, the Linux NFS client only queries for the
> fs_locations attribute (handling of NFS4ERR_MOVED) on a “lookup” VFS
> operation.
>
> To allow the client to create top-level directories, we must ensure
> that the mkdir succeeds (since the client cannot handle NFS4ERR_MOVED
> on CREATE), while the new directory may be placed on a different node
> than the one the client is connected to. To do this, we internally
> forward the mkdir request to another node, where the directory gets
> created.
>
> Until recently, we were able to return a zero-byte filehandle in the
> GETFH response for CREATE (a compound of PUTFH, CREATE, GETFH,
> GETATTR). This forces the Linux client to issue a LOOKUP on the
> directory name, get an NFS4ERR_MOVED in response and subsequently get
> redirected to the correct node.
>
> See code here for reference that forces a LOOKUP on a zero-byte FH:
> https://elixir.bootlin.com/linux/latest/source/fs/nfs/dir.c#L2237
>
> Of course, this isn’t documented in the protocol, although the
> protocol does not explicitly disallow “junctions” to be created by the
> client.
>
> Trond’s recent change on file handle validation in the RPC decoding
> layer now makes the zero-byte file handle invalid - see
> https://git.linux-nfs.org/?p=trondmy/linux-nfs.git;a=commitdiff;h=eb3d58c68e39fad68d8054e0324eb06d82dcedbb;hp=3d66bae156a652be18e278f3c88bc3e069ae824b.
>
> This is probably done (correctly) for other reasons. However, it does
> prevent the client from “recovering” the filehandle of a newly created
> directory through a subsequent lookup. Other options, like returning a
> correct filehandle (which we could obtain from the remote node), would
> not work, since the client is not informed of fs_locations. Similarly,
> NFS4ERR_BADHANDLE or NFS4ERR_STALE will not trigger any “recovery”,
> since the directory isn’t yet known to have a separate fsid.
>
> Suggestions on how/if we can support junction creates from the client?
>
> Thanks,
> Pradeep
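
For anyone skimming the thread, here is a rough toy model of the flow described above. It is only a sketch under stated assumptions, not the nfs-ganesha or Linux client code: `ToyServer`, `client_mkdir`, the `placement` map and the `accept_empty_fh` flag are all hypothetical names made up for illustration. The only value taken from the protocol is the NFS4ERR_MOVED status code (10019 in RFC 7530).

```python
from dataclasses import dataclass, field

NFS4_OK = 0
NFS4ERR_MOVED = 10019  # status code value from RFC 7530


@dataclass
class ToyServer:
    """Hypothetical stand-in for one node of the distributed namespace."""
    name: str
    # directory name -> node that actually holds it (illustrative only)
    placement: dict = field(default_factory=dict)

    def create(self, dirname, target_node):
        """Model CREATE + GETFH: succeed, but hand back a zero-byte
        filehandle when the directory was forwarded to another node."""
        self.placement[dirname] = target_node
        if target_node != self.name:
            return NFS4_OK, b""
        return NFS4_OK, f"{self.name}:{dirname}".encode()

    def lookup(self, dirname):
        """Model LOOKUP: a directory held elsewhere answers NFS4ERR_MOVED."""
        if self.placement[dirname] != self.name:
            return NFS4ERR_MOVED, None
        return NFS4_OK, f"{self.name}:{dirname}".encode()

    def fs_locations(self, dirname):
        """Model the fs_locations attribute: where the directory lives."""
        return self.placement[dirname]


def client_mkdir(server, dirname, target_node, accept_empty_fh):
    """Model the client side of mkdir.

    accept_empty_fh=True mimics the older behaviour (empty handle
    triggers a LOOKUP); False mimics a decoder that rejects it.
    """
    _status, fh = server.create(dirname, target_node)
    if fh:
        return "local", fh
    if not accept_empty_fh:
        return "error", None  # no LOOKUP is issued, so no referral
    status, fh = server.lookup(dirname)
    if status == NFS4ERR_MOVED:
        return "referral", server.fs_locations(dirname)
    return "local", fh


if __name__ == "__main__":
    srv = ToyServer("node-a")
    print(client_mkdir(srv, "d1", "node-a", True))
    print(client_mkdir(srv, "d2", "node-b", True))
    print(client_mkdir(srv, "d3", "node-b", False))
```

The third call is the case in question: once the empty handle is rejected during decoding, the client never reaches the LOOKUP that would have surfaced NFS4ERR_MOVED and the referral.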