This document gives an overview of the LOCALIO protocol extension added
to the Linux NFS client and server (both v3 and v4) to allow a client
and server to reliably handshake to determine if they are on the same
host. The LOCALIO protocol extension follows the well-worn pattern
established by the ACL protocol extension.

The robust handshake between local client and server is just the
beginning; the ultimate use case this locality makes possible is for the
client to issue reads, writes and commits directly to the server
without having to go over the network.

Signed-off-by: Mike Snitzer <snitzer@xxxxxxxxxx>
---
 Documentation/filesystems/nfs/localio.rst | 101 ++++++++++++++++++++++
 include/linux/nfslocalio.h                |   2 +
 2 files changed, 103 insertions(+)
 create mode 100644 Documentation/filesystems/nfs/localio.rst

diff --git a/Documentation/filesystems/nfs/localio.rst b/Documentation/filesystems/nfs/localio.rst
new file mode 100644
index 000000000000..4b4595037a7f
--- /dev/null
+++ b/Documentation/filesystems/nfs/localio.rst
@@ -0,0 +1,101 @@
+===========
+NFS localio
+===========
+
+This document gives an overview of the LOCALIO protocol extension added
+to the Linux NFS client and server (both v3 and v4) to allow a client
+and server to reliably handshake to determine if they are on the same
+host. The LOCALIO protocol extension follows the well-worn pattern
+established by the ACL protocol extension.
+
+The LOCALIO protocol extension is needed to allow robust discovery of
+clients local to their servers. Prior to this extension, a fragile
+sockaddr network address based match against all local network
+interfaces was attempted. But unlike the LOCALIO protocol extension,
+the sockaddr-based matching didn't handle use of iptables or containers.
+
+The robust handshake between local client and server is just the
+beginning; the ultimate use case this locality makes possible is for the
+client to issue reads, writes and commits directly to the server
+without having to go over the network. This is particularly useful for
+container use cases (e.g. Kubernetes) where it is possible to run an IO
+job local to the server.
+
+The performance advantage realized from localio's ability to bypass
+using XDR and RPC for reads, writes and commits can be extreme, e.g.
+fio for 20 secs with 24 libaio threads, 64k directio reads, qd of 8:
+- With localio:
+  read: IOPS=691k, BW=42.2GiB/s (45.3GB/s)(843GiB/20002msec)
+- Without localio:
+  read: IOPS=15.7k, BW=984MiB/s (1032MB/s)(19.2GiB/20013msec)
+
+RPC
+---
+
+The LOCALIO RPC protocol consists of a single "GETUUID" RPC that allows
+the client to retrieve a server's uuid. LOCALIOPROC_GETUUID encodes the
+server's uuid_t in terms of the fixed UUID_SIZE (16 bytes). The fixed
+size opaque encode and decode XDR methods are used instead of the less
+efficient variable sized methods.
+
+NFS Common and Server
+---------------------
+
+First use is in nfsd, to add access to a global nfsd_uuids list in
+nfs_common that is used to register and then identify local nfsd
+instances.
+
+nfsd_uuids is protected by the nfsd_mutex or RCU read lock and is
+composed of nfsd_uuid_t instances that are managed as nfsd creates them
+(per network namespace).
+
+nfsd_uuid_is_local() and nfsd_uuid_lookup() are used to search all local
+nfsd for the client specified nfsd uuid.
+
+The nfsd_uuids list is the basis for localio enablement; as such, it has
+members that point to nfsd memory for direct use by the client
+(e.g. 'net' is the server's network namespace; through it the client can
+access nn->nfsd_serv with proper RCU read access). It is this client
+and server synchronization that enables advanced usage and lifetime of
+objects to span from the host kernel's nfsd to per-container knfsd
+instances that are connected to nfs clients running on the same local
+host.
+
+NFS Client
+----------
+
+fs/nfs/localio.c:nfs_local_probe() will retrieve a server's uuid via
+LOCALIO protocol and check if the server with that uuid is known to be
+local. This ensures client and server 1) support localio and 2) are local
+to each other.
+
+See fs/nfs/localio.c:nfs_local_open_fh() and
+fs/nfsd/localio.c:nfsd_open_local_fh() for the interface that makes
+focused use of nfsd_uuid_t struct to allow a client local to a server to
+open a file pointer without needing to go over the network.
+
+The client's fs/nfs/localio.c:nfs_local_open_fh() will call into the
+server's fs/nfsd/localio.c:nfsd_open_local_fh() and carefully access
+both the nfsd network namespace and the associated nn->nfsd_serv in
+terms of RCU. If nfsd_open_local_fh() finds that the client no longer sees
+valid nfsd objects (be it struct net or nn->nfsd_serv) it returns ENXIO
+to nfs_local_open_fh() and the client will try to reestablish the
+LOCALIO resources needed by calling nfs_local_probe() again. This
+recovery is needed if/when an nfsd instance running in a container were
+to reboot while a localio client is connected to it.
+
+Testing
+-------
+
+The LOCALIO protocol extension and associated NFS localio read, write
+and commit access have proven stable against various test scenarios:
+
+- Client and server both on localhost (for both v3 and v4.2).
+
+- Various permutations of client and server support enablement for
+  both local and remote client and server. Testing against NFS storage
+  products that don't support the LOCALIO protocol was also performed.
+
+- Client on host, server within a container (for both v3 and v4.2).
+  The container testing was in terms of podman managed containers and
+  includes a container stop/restart scenario.
diff --git a/include/linux/nfslocalio.h b/include/linux/nfslocalio.h
index c9592ad0afe2..a9722e18b527 100644
--- a/include/linux/nfslocalio.h
+++ b/include/linux/nfslocalio.h
@@ -20,6 +20,8 @@ extern struct list_head nfsd_uuids;
  * Each nfsd instance has an nfsd_uuid_t that is accessible through the
  * global nfsd_uuids list. Useful to allow a client to negotiate if localio
  * possible with its server.
+ *
+ * See Documentation/filesystems/nfs/localio.rst for more detail.
  */
 typedef struct {
 	uuid_t uuid;
-- 
2.44.0
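
For readers following the handshake described in localio.rst, the GETUUID
exchange and the subsequent "is this uuid local?" check can be sketched in
plain userspace C. This is only an illustration under stated assumptions,
not the kernel code: every sketch_* name and struct below is hypothetical,
the flat array merely stands in for the global nfsd_uuids list, and no
locking (nfsd_mutex or RCU) is modelled. The only facts taken from the
document are the fixed UUID_SIZE of 16 bytes and the use of fixed-size
opaque XDR encoding.

```c
/* Userspace illustration only; none of this is the kernel implementation. */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define UUID_SIZE 16 /* fixed size stated in the document */

/* Hypothetical stand-in for the kernel's uuid_t. */
struct sketch_uuid {
	uint8_t b[UUID_SIZE];
};

/*
 * XDR fixed-length opaque data carries no length prefix and is only
 * padded up to a 4-byte boundary. UUID_SIZE is already a multiple of 4,
 * so the wire form is exactly the 16 raw bytes.
 * Returns the number of bytes written, or 0 if the buffer is too small.
 */
size_t sketch_encode_uuid(uint8_t *buf, size_t buflen,
			  const struct sketch_uuid *u)
{
	if (buflen < UUID_SIZE)
		return 0;
	memcpy(buf, u->b, UUID_SIZE);
	return UUID_SIZE;
}

/* Returns 0 on success, -1 if fewer than UUID_SIZE bytes are available. */
int sketch_decode_uuid(const uint8_t *buf, size_t buflen,
		       struct sketch_uuid *u)
{
	if (buflen < UUID_SIZE)
		return -1;
	memcpy(u->b, buf, UUID_SIZE);
	return 0;
}

/* Hypothetical registry standing in for the global nfsd_uuids list. */
struct sketch_registry {
	const struct sketch_uuid *entries;
	size_t count;
};

/* Returns 1 if the uuid fetched by the client matches a registered server. */
int sketch_uuid_is_local(const struct sketch_registry *reg,
			 const struct sketch_uuid *u)
{
	for (size_t i = 0; i < reg->count; i++)
		if (memcmp(reg->entries[i].b, u->b, UUID_SIZE) == 0)
			return 1;
	return 0;
}
```

In this sketch the "server" side would call sketch_encode_uuid() on its own
uuid, and the "client" side would call sketch_decode_uuid() on the reply and
pass the result to sketch_uuid_is_local(). A match means both ends support
the extension and share a host; a miss means the client keeps using the
normal network path.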