On Fri, Apr 09, 2010 at 04:25:09PM -0400, Chuck Lever wrote:
> >But, for all the kernel work on these nfs/gfs/dlm hooks, there's a larger
> >issue that no one is working on AFAIK: the mechanisms for recovering
> >client locks on remaining gfs nodes when one gfs node fails. That would
> >take a lot of work, and until it's done all the kernel apis will be a moot
> >point since clustered nfs locks on gfs will be unusable.
>
> To support IPv6, I've studied and modified the NFSv2/v3 lock
> recovery mechanisms quite a bit recently. What kernel APIs do you
> think would be needed to manage cluster lock recovery? Just
> something to release stale locks on a single node?

I only have a general idea of what needs to be done; I think Wendy Cheng
may have written a more detailed TODO list a few years ago.

The main problem is that when a gfs node fails, the other gfs nodes purge
all the posix locks that it held. In the case of nfs that's a problem, of
course, because the plocks being purged didn't really belong to that
node/server but to the clients connected to it. The clients are still
alive and either failing over to an alternate gfs/nfs server or waiting
for the failed server to return.

So, when a gfs/nfs node/server fails, the remaining gfs servers need to
reclaim locks from the nfs clients that were connected to it, and insert
these locks into the gfs/dlm posix lock table. That recovery of client
locks needs to happen more or less during the grace period, after the
purging of locks from the failed node and before any locks are granted.

Basically, nfs lock recovery needs to be integrated with gfs/dlm lock
recovery.

Dave
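
The recovery ordering described above (purge the failed node's plocks, let clients reclaim during a grace period, only then grant new locks) could be sketched roughly as follows. This is a toy userspace model, not kernel code and not an existing gfs/dlm API: `PosixLock`, `ClusterLockTable`, `node_failed`, `request` and `end_grace` are hypothetical names invented for illustration, and every lock is treated as exclusive to keep it short.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PosixLock:
    owner_node: str  # gfs node that inserted the lock into the cluster table
    client: str      # nfs client the lock really belongs to
    path: str
    start: int       # first byte of the locked range (inclusive)
    end: int         # last byte of the locked range (inclusive)


@dataclass
class ClusterLockTable:
    """Hypothetical stand-in for the gfs/dlm posix lock table."""
    locks: set = field(default_factory=set)
    in_grace: bool = False

    def node_failed(self, node: str) -> None:
        # Step 1: purge every plock the failed node held on behalf of clients.
        self.locks = {l for l in self.locks if l.owner_node != node}
        # Step 2: open a grace period so that no new locks are granted yet.
        self.in_grace = True

    def request(self, lock: PosixLock, reclaim: bool = False) -> bool:
        if self.in_grace and not reclaim:
            return False  # new locks must wait until recovery has finished
        if not self.in_grace and reclaim:
            return False  # reclaims are only accepted during the grace period
        if any(self._conflicts(lock, held) for held in self.locks):
            return False
        self.locks.add(lock)
        return True

    def end_grace(self) -> None:
        # Step 3: recovery is over, normal lock requests are allowed again.
        self.in_grace = False

    @staticmethod
    def _conflicts(a: PosixLock, b: PosixLock) -> bool:
        # Simplification: all locks are exclusive; overlapping ranges on the
        # same file held by different clients conflict.
        return (a.path == b.path and a.client != b.client
                and a.start <= b.end and b.start <= a.end)
```

In this model a client that was attached to the failed node re-inserts its lock through a surviving node with `reclaim=True`, and a competing client's request for the same range is refused both during the grace period and afterwards, which is the property that purging alone does not give.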