On Thu, Jun 20, 2024 at 10:33:15AM -0400, Mike Snitzer wrote:
> On Thu, Jun 20, 2024 at 09:52:21AM -0400, Chuck Lever wrote:
> > On Wed, Jun 19, 2024 at 04:40:31PM -0400, Mike Snitzer wrote:
> > > This document gives an overview of the LOCALIO auxiliary RPC protocol
> > > added to the Linux NFS client and server (both v3 and v4) to allow a
> > > client and server to reliably handshake to determine if they are on the
> > > same host. The LOCALIO auxiliary protocol's implementation, which uses
> > > the same connection as NFS traffic, follows the pattern established by
> > > the NFS ACL protocol extension.
> > >
> > > The robust handshake between local client and server is just the
> > > beginning, the ultimate usecase this locality makes possible is the
> > > client is able to issue reads, writes and commits directly to the server
> > > without having to go over the network. This is particularly useful for
> > > container usecases (e.g. kubernetes) where it is possible to run an IO
> > > job local to the server.
> > >
> > > Signed-off-by: Mike Snitzer <snitzer@xxxxxxxxxx>
> > > ---
> > >  Documentation/filesystems/nfs/localio.rst | 148 ++++++++++++++++++++++
> > >  include/linux/nfslocalio.h                |   2 +
> > >  2 files changed, 150 insertions(+)
> > >  create mode 100644 Documentation/filesystems/nfs/localio.rst
> > >
> > > diff --git a/Documentation/filesystems/nfs/localio.rst b/Documentation/filesystems/nfs/localio.rst
> > > new file mode 100644
> > > index 000000000000..a43c3dab2cab
> > > --- /dev/null
> > > +++ b/Documentation/filesystems/nfs/localio.rst
> > > @@ -0,0 +1,148 @@
> > > +===========
> > > +NFS localio
> > > +===========
> > > +
> > > +This document gives an overview of the LOCALIO auxiliary RPC protocol
> > > +added to the Linux NFS client and server (both v3 and v4) to allow a
> > > +client and server to reliably handshake to determine if they are on the
> > > +same host. The LOCALIO auxiliary protocol's implementation, which uses
> > > +the same connection as NFS traffic, follows the pattern established by
> > > +the NFS ACL protocol extension.
> > > +
> > > +The LOCALIO auxiliary protocol is needed to allow robust discovery of
> > > +clients local to their servers. Prior to this LOCALIO protocol a
> > > +fragile sockaddr network address based match against all local network
> > > +interfaces was attempted. But unlike the LOCALIO protocol, the
> > > +sockaddr-based matching didn't handle use of iptables or containers.
> > > +
> > > +The robust handshake between local client and server is just the
> > > +beginning, the ultimate usecase this locality makes possible is the
> > > +client is able to issue reads, writes and commits directly to the server
> > > +without having to go over the network. This is particularly useful for
> > > +container usecases (e.g. kubernetes) where it is possible to run an IO
> > > +job local to the server.
> > > +
> > > +The performance advantage realized from localio's ability to bypass
> > > +using XDR and RPC for reads, writes and commits can be extreme, e.g.:
> > > +fio for 20 secs with 24 libaio threads, 64k directio reads, qd of 8,
> > > +- With localio:
> > > +  read: IOPS=691k, BW=42.2GiB/s (45.3GB/s)(843GiB/20002msec)
> > > +- Without localio:
> > > +  read: IOPS=15.7k, BW=984MiB/s (1032MB/s)(19.2GiB/20013msec)
> > > +
> > > +RPC
> > > +---
> > > +
> > > +The LOCALIO auxiliary RPC protocol consists of a single "GETUUID" RPC
> > > +method that allows the Linux nfs client to retrieve a Linux nfs server's
> > > +uuid. This protocol isn't part of an IETF standard, nor does it need to
> > > +be considering it is Linux-to-Linux auxiliary RPC protocol that amounts
> > > +to an implementation detail.
> > > +
> > > +The GETUUID method encodes the server's uuid_t in terms of the fixed
> > > +UUID_SIZE (16 bytes). The fixed size opaque encode and decode XDR
> > > +methods are used instead of the less efficient variable sized methods.
> > > +
> > > +The RPC program number for the NFS_LOCALIO_PROGRAM is currently defined
> > > +as 0x20000002 (but a request for a unique RPC program number assignment
> > > +has been submitted to IANA.org).
> > > +
> > > +The following approximately describes the LOCALIO in a pseudo rpcgen .x
> > > +syntax:
> > > +
> > > +#define UUID_SIZE 16
> > > +typedef u8 uuid_t<UUID_SIZE>;
> > > +
> > > +program NFS_LOCALIO_PROGRAM {
> > > +    version NULLVERS {
> > > +        void NULL(void) = 0;
> > > +    } = 1;
> > > +    version GETUUIDVERS {
> > > +        uuid_t GETUUID(void) = 1;
> > > +    } = 1;
> > > +} = 0x20000002;
> > > +
> > > +The above is the skeleton for the LOCALIO protocol, it doesn't account
> > > +for NFS v3 and v4 RPC boilerplate (which also marshalls RPC status) that
> > > +is used to implement GETUUID.
> > > +
> > > +Here are the respective XDR results for nfsd and nfs:
> >
> > Hi Mike!
> >
> > A protocol spec describes the on-the-wire data formats, not the
> > in-memory structure layouts. The below C structures are not
> > relevant to this specification. This should be all you need here,
> > if I understand your protocol correctly:
> >
> > /* raw RFC 9562 UUID */
> > #define UUID_SIZE 16
> > typedef u8 uuid_t<UUID_SIZE>;
> >
> > union GETUUID1res switch (uint32 status) {
> > case 0:
> >     uuid_t uuid;
> > default:
> >     void;
> > };
> >
> > program NFS_LOCALIO_PROGRAM {
> >     version LOCALIO_V1 {
> >         void
> >             NULL(void) = 0;
> >
> >         GETUUID1res
> >             GETUUID(void) = 1;
> >     } = 1;
> > } = 0x20000002;
>
> Thanks for this, nice to see I wasn't too far off.
>
> > Then you need to discuss transport considerations:
> >
> > - Whether this protocol is registered with the server's rpcbind
> >   service,
>
> It isn't, should it be? Not familiar with what needs updating to do
> it, but happy to work through it.

Well, the issue is whether a client can assume that LOCALIO will
always be running on a fixed port. Which, IIUC, it will be. So I
don't think registration is needed. The protocol spec needs to
state that the LOCALIO server port is fixed, and that makes rpcbind
registration optional.

> > - Which TCP/UDP port number does it use? Assuming 2049, and that
> >   it will appear on the same transport connection as NFS traffic
> >   (just like NFSACL).
>
> Correct.
>
> > Should it be supported on port 20049 with RDMA as well?
>
> Unless there is some additional code needed, I don't see why it
> wouldn't. But I haven't tested it (will look at NFS's RDMA support
> and wrap my head around it).

Head-wrapping NFS/RDMA is a multi-year project :-)

You probably do want to have LOCALIO available for NFS/RDMA
connections. I'm not sure that requires extra code. I don't recall
clearly, but I think there isn't anything extra done for NFSACL,
for example.

> > > +Testing
> > > +-------
> > > +
> > > +The LOCALIO auxiliary protocol and associated NFS localio read, write
> > > +and commit access have proven stable against various test scenarios but
> > > +these have not yet been formalized in any testsuite:
> >
> > Is there anywhere that describes what is needed to set up clients
> > and a server to do local I/O? Then running the usual suite of NFS
> > tests on that set up and comparing the nfsstat output on the local
> > and remote clients should be a basic "smoke test" kind of thing
> > that maintainers can use as a check-in test.
>
> I just figured running nfsd and nfs client connecting to that
> localhost was obvious. But I can fill in more howto like info in this
> section.
>
> What is "the usual suite of NFS tests"? I should run them ;)

Start with the cthon04 suite. We all seem to use fstests too. There
are some others, but these should be sufficient for your purposes.

-- 
Chuck Lever
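
For reference, here is a rough userspace sketch of what the GETUUID
result body could look like on the wire. It is not taken from the
patch or from the kernel code. It assumes the reply is the
GETUUID1res union suggested above (a uint32 status, followed by the
UUID only when status is 0) and that the UUID is carried as a
fixed-length 16-byte opaque, as the patch text describes. The
function names are hypothetical. Note that in standard XDR notation
(RFC 4506) angle brackets denote a variable-length array, which
carries an extra 4-byte length word; the last helper shows that
form for comparison.

```python
import struct
import uuid
from typing import Optional, Tuple

UUID_SIZE = 16  # bytes, matching UUID_SIZE in the patch


def encode_getuuid_res(status: int, server_uuid: Optional[uuid.UUID] = None) -> bytes:
    """Encode a GETUUID1res-style union (hypothetical helper).

    XDR integers are 4-byte big-endian. A fixed-length opaque has no
    length prefix; 16 bytes is already a multiple of 4, so no padding.
    """
    out = struct.pack(">I", status)
    if status == 0:
        if server_uuid is None:
            raise ValueError("status 0 requires a UUID")
        out += server_uuid.bytes
    return out


def decode_getuuid_res(data: bytes) -> Tuple[int, Optional[uuid.UUID]]:
    """Decode the union; returns (status, uuid or None)."""
    if len(data) < 4:
        raise ValueError("short reply: missing status word")
    (status,) = struct.unpack_from(">I", data, 0)
    if status != 0:
        return status, None  # default arm of the union is void
    body = data[4:4 + UUID_SIZE]
    if len(body) != UUID_SIZE:
        raise ValueError("short reply: truncated UUID")
    return status, uuid.UUID(bytes=body)


def encode_variable_opaque(payload: bytes) -> bytes:
    """Variable-length XDR opaque, for comparison: length word + data + pad."""
    pad = (4 - len(payload) % 4) % 4
    return struct.pack(">I", len(payload)) + payload + b"\x00" * pad
```

With the fixed-length form a successful result body is 20 bytes (4
for status, 16 for the UUID); the variable-length form would add one
more 4-byte word in front of the UUID.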