Re: [PATCH 0/5] nfs: Add mount option for forcing RPC requests for one file over one connection

On Wed, 2021-03-24 at 09:23 -0400, Tom Talpey wrote:
> On 3/23/2021 12:14 PM, Chuck Lever III wrote:
> > 
> > 
> > > On Mar 23, 2021, at 11:57 AM, Nagendra Tomar <Nagendra.Tomar@xxxxxxxxxxxxx> wrote:
> > > 
> > > > On Mar 23, 2021, at 1:46 AM, Nagendra Tomar <Nagendra.Tomar@xxxxxxxxxxxxx> wrote:
> > > > > 
> > > > > From: Nagendra S Tomar <natomar@xxxxxxxxxxxxx>
> > > > > 
> > > > > If a clustered NFS server is behind an L4 loadbalancer, the
> > > > > default nconnect round-robin policy may cause RPC requests to
> > > > > a file to be sent to different cluster nodes, because the
> > > > > source port is different for each of the nconnect connections.
> > > > > While this works functionally (since the cluster will usually
> > > > > have a consistent view irrespective of which node serves the
> > > > > request), it may not be desirable from a performance point of
> > > > > view. As an example, we have an NFSv3 frontend to our object
> > > > > store, where every NFSv3 file is an object. If writes to the
> > > > > same file are sent round-robin to different cluster nodes, the
> > > > > writes become very inefficient because the consistency
> > > > > requirement for the object update has to be satisfied from
> > > > > different nodes. Similarly, each node may maintain some kind
> > > > > of cache to serve file data/metadata requests faster, and in
> > > > > that case too it helps to have xprt affinity for a file/dir.
> > > > > In general we have seen such a scheme scale very well.
> > > > > 
> > > > > This patch introduces a new rpc_xprt_iter_ops that uses an
> > > > > additional u32 (a filehandle hash) to affine RPCs for the same
> > > > > file to one xprt. It adds a new mount option
> > > > > "ncpolicy=roundrobin|hash" which can be used to select the
> > > > > nconnect multipath policy for a given mount and pass the
> > > > > selected policy to the RPC client.
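
Conceptually, the "hash" policy maps a per-file u32 hash onto one of
the nconnect transports, so that every RPC for that file reuses the
same connection. A minimal sketch of the idea (illustrative names
only, not the actual patch code):

	/* assumes <linux/sunrpc/xprt.h> for struct rpc_xprt */
	static struct rpc_xprt *xprt_pick_by_fh_hash(struct rpc_xprt **xprts,
						     unsigned int nr_xprts,
						     u32 fh_hash)
	{
		/* same filehandle hash -> same index -> same connection */
		return xprts[fh_hash % nr_xprts];
	}
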
> > > > 
> > > > This sets off my "not another administrative knob that has
> > > > to be tested and maintained, and can be abused" allergy.
> > > > 
> > > > Also, my "because connections are shared by mounts of the same
> > > > server, all those mounts will adopt this behavior" rhinitis.
> > > 
> > > Yes, it's fair to call this out, but ncpolicy behaves like the
> > > nconnect parameter in this regard.
> > > 
> > > > And my "why add a new feature to a legacy NFS version" hives.
> > > > 
> > > > 
> > > > I agree that your scenario can and should be addressed somehow.
> > > > I'd really rather see this done with pNFS.
> > > > 
> > > > Since you are proposing patches against the upstream NFS
> > > > client,
> > > > I presume all your clients /can/ support NFSv4.1+. It's the NFS
> > > > servers that are stuck on NFSv3, correct?
> > > 
> > > Yes.
> > > 
> > > > 
> > > > The flexfiles layout can handle an NFSv4.1 client and NFSv3
> > > > data servers. In fact it was designed for exactly this kind of
> > > > mix of NFS versions.
> > > > 
> > > > No client code change will be necessary -- there are a lot more
> > > > clients than servers. The MDS can be made to work smartly in
> > > > concert with the load balancer, over time; or it can adopt
> > > > other clever strategies.
> > > > 
> > > > IMHO pNFS is the better long-term strategy here.
> > > 
> > > The fundamental difference here is that the clustered NFSv3
> > > server is available over a single virtual IP. So IIUC, even if we
> > > were to use NFSv4.1 with the flexfiles layout, all it can hand
> > > over to the client is that single (load-balanced) virtual IP, and
> > > when the clients then connect to the NFSv3 DS we still have the
> > > same issue. Am I understanding you right? Can you please
> > > elaborate on what you mean by "MDS can be made to work smartly in
> > > concert with the load balancer"?
> > 
> > I had thought there were multiple NFSv3 server targets in play.
> > 
> > If the load balancer is making them look like a single IP address,
> > then take it out of the equation: expose all the NFSv3 servers to
> > the clients and let the MDS direct operations to each data server.
> > 
> > AIUI this is the approach (without the use of NFSv3) taken by
> > NetApp next generation clusters.
> 
> It certainly sounds like the load balancer is actually performing a
> storage router function here, and roundrobin is going to thrash that
> badly. I'm not sure that exposing a magic "hash" knob is a very good
> solution though. Pushing decisions to the sysadmin is rarely a great
> approach.
> 
> Why not simply argue that "hash" is the better algorithm, and
> propose that it be the default? Is that not the case?
> 
> 

It's not, no. So we're not making that a default.

Pushing all the I/O for a single file through a single TCP connection
only makes sense if you have a setup like the one Nagendra is
describing. Otherwise, you're better off spreading it across multiple
connections (assuming that you have multiple NICs).
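
With the proposed option, the two behaviours would be selected at
mount time roughly like this (ncpolicy= is the option proposed in this
series, not an existing upstream option):

	# default: RPCs spread round-robin over 8 connections
	mount -t nfs -o vers=3,nconnect=8 server:/export /mnt

	# proposed: pin all RPCs for a given file/dir to one connection
	mount -t nfs -o vers=3,nconnect=8,ncpolicy=hash server:/export /mnt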

> Tom.
> 
> > > > > It adds a new rpc_procinfo member, p_fhhash, which can be
> > > > > supplied by the specific RPC programs to return a u32 hash of
> > > > > the file/dir the RPC is targeting, and lastly it provides
> > > > > p_fhhash implementations for the various NFS v3/v4/v41/v42
> > > > > RPCs to generate the hash correctly.
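
For illustration only (this is not the patch code): a p_fhhash-style
helper could be as simple as hashing the raw filehandle bytes down to
a u32, e.g. with the kernel's jhash():

	#include <linux/jhash.h>
	#include <linux/nfs.h>		/* struct nfs_fh */

	/* sketch: reduce an NFS filehandle to a u32 for xprt selection */
	static u32 nfs_fh_to_u32_hash(const struct nfs_fh *fh)
	{
		return jhash(fh->data, fh->size, 0);
	}

The per-procedure p_fhhash callbacks would then feed such a value to
the new multipath iterator.
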
> > > > > 
> > > > > Thoughts?
> > > > > 
> > > > > Thanks,
> > > > > Tomar
> > > > > 
> > > > > Nagendra S Tomar (5):
> > > > >   SUNRPC: Add a new multipath xprt policy for xprt selection
> > > > >     based on target filehandle hash
> > > > >   SUNRPC/NFSv3/NFSv4: Introduce "enum ncpolicy" to represent
> > > > >     the nconnect policy and pass it down from mount option to
> > > > >     rpc layer
> > > > >   SUNRPC/NFSv4: Rename RPC_TASK_NO_ROUND_ROBIN ->
> > > > >     RPC_TASK_USE_MAIN_XPRT
> > > > >   NFSv3: Add hash computation methods for NFSv3 RPCs
> > > > >   NFSv4: Add hash computation methods for NFSv4/NFSv42 RPCs
> > > > > 
> > > > > fs/nfs/client.c                      |   3 +
> > > > > fs/nfs/fs_context.c                  |  26 ++
> > > > > fs/nfs/internal.h                    |   2 +
> > > > > fs/nfs/nfs3client.c                  |   4 +-
> > > > > fs/nfs/nfs3xdr.c                     | 154 +++++++++++
> > > > > fs/nfs/nfs42xdr.c                    | 112 ++++++++
> > > > > fs/nfs/nfs4client.c                  |  14 +-
> > > > > fs/nfs/nfs4proc.c                    |  18 +-
> > > > > fs/nfs/nfs4xdr.c                     | 516 ++++++++++++++++++++++++++++++-----
> > > > > fs/nfs/super.c                       |   7 +-
> > > > > include/linux/nfs_fs_sb.h            |   1 +
> > > > > include/linux/sunrpc/clnt.h          |  15 +
> > > > > include/linux/sunrpc/sched.h         |   2 +-
> > > > > include/linux/sunrpc/xprtmultipath.h |   9 +-
> > > > > include/trace/events/sunrpc.h        |   4 +-
> > > > > net/sunrpc/clnt.c                    |  38 ++-
> > > > > net/sunrpc/xprtmultipath.c           |  91 +++++-
> > > > > 17 files changed, 913 insertions(+), 103 deletions(-)
> > > > 
> > > > --
> > > > Chuck Lever
> > 
> > --
> > Chuck Lever
> > 
> > 
> > 
> > 

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx





