Re: [PATCH v6 17/18] nfs: add Documentation/filesystems/nfs/localio.rst


On Thu, Jun 20, 2024 at 10:33:15AM -0400, Mike Snitzer wrote:
> On Thu, Jun 20, 2024 at 09:52:21AM -0400, Chuck Lever wrote:
> > On Wed, Jun 19, 2024 at 04:40:31PM -0400, Mike Snitzer wrote:
> > > This document gives an overview of the LOCALIO auxiliary RPC protocol
> > > added to the Linux NFS client and server (both v3 and v4) to allow a
> > > client and server to reliably handshake to determine if they are on the
> > > same host.  The LOCALIO auxiliary protocol's implementation, which uses
> > > the same connection as NFS traffic, follows the pattern established by
> > > the NFS ACL protocol extension.
> > > 
> > > The robust handshake between local client and server is just the
> > > beginning; the ultimate use case this locality makes possible is that
> > > the client can issue reads, writes and commits directly to the server
> > > without having to go over the network.  This is particularly useful for
> > > container use cases (e.g. Kubernetes) where it is possible to run an I/O
> > > job local to the server.
> > > 
> > > Signed-off-by: Mike Snitzer <snitzer@xxxxxxxxxx>
> > > ---
> > >  Documentation/filesystems/nfs/localio.rst | 148 ++++++++++++++++++++++
> > >  include/linux/nfslocalio.h                |   2 +
> > >  2 files changed, 150 insertions(+)
> > >  create mode 100644 Documentation/filesystems/nfs/localio.rst
> > > 
> > > diff --git a/Documentation/filesystems/nfs/localio.rst b/Documentation/filesystems/nfs/localio.rst
> > > new file mode 100644
> > > index 000000000000..a43c3dab2cab
> > > --- /dev/null
> > > +++ b/Documentation/filesystems/nfs/localio.rst
> > > @@ -0,0 +1,148 @@
> > > +===========
> > > +NFS localio
> > > +===========
> > > +
> > > +This document gives an overview of the LOCALIO auxiliary RPC protocol
> > > +added to the Linux NFS client and server (both v3 and v4) to allow a
> > > +client and server to reliably handshake to determine if they are on the
> > > +same host.  The LOCALIO auxiliary protocol's implementation, which uses
> > > +the same connection as NFS traffic, follows the pattern established by
> > > +the NFS ACL protocol extension.
> > > +
> > > +The LOCALIO auxiliary protocol is needed to allow robust discovery of
> > > +clients local to their servers.  Prior to the LOCALIO protocol, a
> > > +fragile match of the sockaddr network address against all local network
> > > +interfaces was attempted.  But unlike the LOCALIO protocol, the
> > > +sockaddr-based matching did not work when iptables or containers were
> > > +in use.
> > > +
> > > +The robust handshake between local client and server is just the
> > > +beginning; the ultimate use case this locality makes possible is that
> > > +the client can issue reads, writes and commits directly to the server
> > > +without having to go over the network.  This is particularly useful for
> > > +container use cases (e.g. Kubernetes) where it is possible to run an I/O
> > > +job local to the server.
> > > +
> > > +The performance advantage realized from localio's ability to bypass
> > > +XDR and RPC for reads, writes and commits can be extreme, e.g. running
> > > +fio for 20 seconds with 24 libaio threads, 64k direct I/O reads and a
> > > +queue depth of 8:
> > > +
> > > +- With localio:
> > > +  read: IOPS=691k, BW=42.2GiB/s (45.3GB/s)(843GiB/20002msec)
> > > +- Without localio:
> > > +  read: IOPS=15.7k, BW=984MiB/s (1032MB/s)(19.2GiB/20013msec)
> > > +
> > > +RPC
> > > +---
> > > +
> > > +The LOCALIO auxiliary RPC protocol consists of a single "GETUUID" RPC
> > > +method that allows the Linux NFS client to retrieve a Linux NFS server's
> > > +UUID.  This protocol isn't part of an IETF standard, nor does it need to
> > > +be, considering it is a Linux-to-Linux auxiliary RPC protocol that
> > > +amounts to an implementation detail.
> > > +
> > > +The GETUUID method encodes the server's uuid_t in terms of the fixed
> > > +UUID_SIZE (16 bytes).  The fixed-size opaque encode and decode XDR
> > > +methods are used instead of the less efficient variable-sized methods.
> > > +
> > > +The RPC program number for the NFS_LOCALIO_PROGRAM is currently defined
> > > +as 0x20000002 (but a request for a unique RPC program number assignment
> > > +has been submitted to IANA.org).
> > > +
> > > +The following approximately describes the LOCALIO protocol in pseudo
> > > +rpcgen .x syntax:
> > > +
> > > +#define UUID_SIZE 16
> > > +typedef u8 uuid_t<UUID_SIZE>;
> > > +
> > > +program NFS_LOCALIO_PROGRAM {
> > > +     version NULLVERS {
> > > +        void NULL(void) = 0;
> > > +	} = 1;
> > > +     version GETUUIDVERS {
> > > +        uuid_t GETUUID(void) = 1;
> > > +	} = 1;
> > > +} = 0x20000002;
> > > +
> > > +The above is the skeleton for the LOCALIO protocol; it doesn't account
> > > +for the NFS v3 and v4 RPC boilerplate (which also marshals RPC status)
> > > +that is used to implement GETUUID.
> > > +
> > > +Here are the respective XDR results for nfsd and nfs:
> > 
> > Hi Mike!
> > 
> > A protocol spec describes the on-the-wire data formats, not the
> > in-memory structure layouts. The below C structures are not
> > relevant to this specification. This should be all you need here,
> > if I understand your protocol correctly:
> > 
> > /* raw RFC 9562 UUID */
> > #define UUID_SIZE 16
> > typedef u8 uuid_t<UUID_SIZE>;
> > 
> > union GETUUID1res switch (uint32 status) {
> > case 0:
> >     uuid_t  uuid;
> > default:
> >     void;
> > };
> > 
> > program NFS_LOCALIO_PROGRAM {
> >     version LOCALIO_V1 {
> >         void
> >             NULL(void) = 0;
> > 
> >         GETUUID1res
> >             GETUUID(void) = 1;
> >     } = 1;
> > } = 0x20000002;
> 
> Thanks for this, nice to see I wasn't too far off.
> 
> > Then you need to discuss transport considerations:
> > 
> > - Whether this protocol is registered with the server's rpcbind
> >   service,
> 
> It isn't; should it be?  I'm not familiar with what needs updating to do
> it, but I'm happy to work through it.

Well, the issue is whether a client can assume that LOCALIO will
always be running on a fixed port, which, IIUC, it will be. So I
don't think registration is needed. The protocol spec needs to
state that the LOCALIO server port is fixed, and that makes
rpcbind registration optional.
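To make the wire format of the GETUUID1res union quoted above concrete,
here is a small decoder. It is only a sketch under stated assumptions,
not the in-kernel code: it assumes the reply body (the bytes after the
RPC reply header) starts with a 4-byte big-endian status word and that,
on success, the UUID follows as the fixed-size 16-byte opaque described
in the patch, i.e. with no length word, and with no XDR padding since 16
is already a multiple of 4. The names xdr_get_u32() and
decode_getuuid1res() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define UUID_SIZE 16	/* same value as UUID_SIZE in the spec above */

/* Hypothetical helper: read one XDR unsigned int (big-endian, 4 bytes). */
static uint32_t xdr_get_u32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Hypothetical decoder for a GETUUID1res body.  Returns -1 if buf is
 * too short.  Otherwise returns 0 and stores the status word in
 * *status; the UUID is copied out only when *status == 0 ("case 0").
 */
static int decode_getuuid1res(const unsigned char *buf, size_t len,
			      uint32_t *status, unsigned char uuid[UUID_SIZE])
{
	assert(buf != NULL && status != NULL && uuid != NULL);

	if (len < 4)
		return -1;
	*status = xdr_get_u32(buf);
	if (*status != 0)
		return 0;	/* "default" arm: void, nothing follows */

	/* Assumed fixed-size opaque: no length word, no padding needed. */
	if (len < 4 + UUID_SIZE)
		return -1;
	memcpy(uuid, buf + 4, UUID_SIZE);
	return 0;
}
```

For example, a 20-byte body of four zero bytes followed by 16 UUID bytes
decodes to status 0 plus that UUID, while the 4-byte body 00 00 00 05
decodes to status 5 with no UUID.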


> > - Which TCP/UDP port number does it use? Assuming 2049, and that
> >   it will appear on the same transport connection as NFS traffic
> >   (just like NFACL).
> 
> Correct.
>  
> > Should it be supported on port 20049 with RDMA as well?
> 
> Unless there is some additional code needed, I don't see why it
> wouldn't.  But I haven't tested it (will look at NFS's RDMA support
> and wrap my head around it).

Head-wrapping NFS/RDMA is a multi-year project :-) 

You probably do want to have LOCALIO available for NFS/RDMA
connections. I'm not sure that requires extra code. I don't recall
clearly, but I think there isn't anything extra done for NFSACL,
for example.


> > > +Testing
> > > +-------
> > > +
> > > +The LOCALIO auxiliary protocol and associated NFS localio read, write
> > > +and commit access have proven stable against various test scenarios, but
> > > +these have not yet been formalized in any testsuite:
> > 
> > Is there anywhere that describes what is needed to set up clients
> > and a server to do local I/O? Then running the usual suite of NFS
> > tests on that set up and comparing the nfsstat output on the local
> > and remote clients should be a basic "smoke test" kind of thing
> > that maintainers can use as a check-in test.
> 
> I just figured that running nfsd and an NFS client connecting to it via
> localhost was obvious.  But I can fill in more howto-like info in this
> section.
> 
> What is "the usual suite of NFS tests"?  I should run them ;)

Start with the cthon04 suite. We all seem to use fstests too. There
are some others, but these should be sufficient for your purposes.


-- 
Chuck Lever



