On Wed, Dec 14, 2011 at 09:49:20AM -0500, Jeff Layton wrote:
> On Wed, 14 Dec 2011 09:35:57 -0500
> Chuck Lever <chuck.lever@xxxxxxxxxx> wrote:
>
> >
> > On Dec 14, 2011, at 8:54 AM, Jeff Layton wrote:
> >
> > > First, a little background: I've recently been tasked with a project
> > > to make active/active serving of NFSv4 from clustered filesystems work.
> > > This is a large-scale, long-term project, but there are pieces of the
> > > existing code that are clearly unsuitable in such a configuration...
> > >
> > > One of the things that Bruce has long had on his wishlist is to replace
> > > the client name tracking code that the kernel uses:
> > >
> > >     http://wiki.linux-nfs.org/wiki/index.php/Nfsd4_server_recovery
> > >
> > > The existing code manipulates the filesystem directly to track this
> > > info. Not only is that something that makes the VFS maintainers look
> > > askance at knfsd, but it also is unsuitable in a clustered
> > > configuration.
> > >
> > > Typically we think of the grace period as a property of the server, but
> > > with a clustered filesystem, we need to consider it as a property of the
> > > cluster as a whole. On a cold startup of the cluster, once any node
> > > grants a non-reclaim lock, then no more reclaim can be allowed on any
> > > node. Grace periods must be coordinated amongst all cluster nodes.
> >
> > Agreed, but as you go forward with this effort, you should consider that NFSv4 migration allows individual file systems to be in grace.

From the point of view of the protocol--I think all that means is that a
client should be prepared to handle GRACE errors at any time, and should
treat them more or less the same as they would a DELAY error?

> Yes. The eventual goal is to eliminate the grace period on failovers once
> the cluster fs is up and running, and out of its initial grace period.
>
> In order to do that, we'll need to push grace period handling into the
> VFS layer to some degree, probably by providing a standard set of grace
> period handling ops and allowing the filesystems to override them in
> some fashion (maybe a new set of export ops?).

That's what I've always imagined we'd do.

Long-term it would be nice if even local filesystems could respect the
grace period: local applications really shouldn't be grabbing new locks
then either, and currently the only way to prevent that is to delay
starting them until a grace period has passed.

--b.

> In any case, design of that is a later phase of this project once I get
> this part settled...
>
> > > In order to achieve that goal, we need to first allow the client name
> > > reclaim to be cluster aware as well. This patchset is a move toward that
> > > goal and covers the initial kernel part of such a change. A patchset to
> > > add a daemon to handle the upcalls will follow.
> > >
> > > Note that this patchset is still a little rough, so consider this an
> > > RFC for the overall design. We'll also need to consider a plan to
> > > deprecate the old client tracking code.
> > >
> > > The goal with this patchset is to replace the existing functionality,
> > > without disturbing the existing code too much. There's some room for
> > > more cleanup and reorganization once the old tracker is gone.
> > >
> > > Jeff Layton (5):
> > >   nfsd: add nfsd4_client_tracking_ops struct and a way to set it
> > >   sunrpc: create nfsd dir in rpc_pipefs
> > >   nfsd: add a header describing upcall for clname tracking daemon
> > >   nfsd: add a cl_daddr field and a generic flags field to nfs4_client
> > >   nfsd: add the infrastructure to handle the clstate upcall
> > >
> > >  fs/nfsd/nfs4recover.c        |  442 +++++++++++++++++++++++++++++++++++++++++-
> > >  fs/nfsd/nfs4state.c          |   49 ++---
> > >  fs/nfsd/state.h              |   16 +-
> > >  include/linux/nfsd/clstate.h |   59 ++++++
> > >  net/sunrpc/rpc_pipe.c        |    5 +
> > >  5 files changed, 526 insertions(+), 45 deletions(-)
> > >  create mode 100644 include/linux/nfsd/clstate.h
> > >
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> > > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > > More majordomo info at http://vger.kernel.org/majordomo-info.html
> >
>
> --
> Jeff Layton <jlayton@xxxxxxxxxx>