1. For every fd that is open in the old graph when the graph switch
   happens, fuse does getxattr (oldfd, LOCKINFO, dict).

2. posix-locks fills in the appropriate values for the LOCKINFO xattr.
   Each posix-locks translator in the volume fills dict with the
   following key/value pairs:

   key:   combination of its hostname and brick-name
   value: if locks are held on this fd, the fd_no field of the lock
          structure. Currently fd_no is (uint64_t)fdptr and hence it is
          guaranteed to be unique across all the connections. (In
          future, if this changes, we need to add a connection
          identifier to the value too.)

   Cluster translators send getxattr and setxattr calls with LOCKINFO
   as key to the same children to which setlk would have been sent. If
   getxattr is sent to more than one child, the results are aggregated
   in the cluster translators.

3. fuse does a setxattr (newfd, LOCKINFO, dict). dict is the result of
   the getxattr and newfd is opened in the new graph.

4. a. posix-locks looks into dict with the <hostname, brick-name>
      combination as key.
   b. If there is a value, the value is treated as oldfd_no. For all
      the locks held on oldfd_no, it changes the following fields of
      the lock structure:
      i)  lock->fd_no = fd_to_fdno (newfd)
      ii) lock->trans = connection identifier of the connection on
          which the setxattr came

regards,
Raghavendra.

----- Original Message -----
> From: "Raghavendra Gowdappa" <rgowdapp@xxxxxxxxxx>
> To: "Anand Avati" <aavati@xxxxxxxxxx>
> Cc: gluster-devel@xxxxxxxxxx
> Sent: Thursday, June 21, 2012 1:07:11 AM
> Subject: RFC on posix locks migration to new graph after a switch
>
> Avati,
>
> We had relied on posix lock-healing (hereafter "locks" refers to posix
> locks) done by protocol/client for lock migration to new graph. Lock
> healing is a feature implemented by protocol/client which simply
> reacquires all the granted locks stored in fd context after a
> reconnect to server.
> The way we leverage this lock healing feature
> of protocol/client to migrate posix locks to new graph is: we
> migrate fds to new graph by opening a new fd on the same file in new
> graph (with fd context copied from old graph) and protocol/client
> reacquires all the granted locks in fd context. But this solution
> has the following issues:
>
> 1. If we open fds in new graph even before cleaning up of the
> old-transport, lock requests sent by protocol/client as part of
> healing will conflict with locks held on old-transport and hence will
> fail (Note that with only client-side graph switch there is a single
> inode on server corresponding to two inodes - one corresponding to
> each of old and new graphs - on client). As a result locks are not
> migrated. The problem could've been solved if protocol/client had
> issued SETLKW requests instead of SETLK (the lock requests issued as
> part of healing would be granted when old-transport disconnects
> eventually). But that has a different set of issues. Even then, this
> is not a fool-proof solution, since there might already be other
> conflicting lock requests in the lock wait queue when
> protocol/client starts lock healing, resulting in failure of
> lock-heal.
>
> 2. If we open fds in new graph after cleaning up of old-transport, there
> is a window of time between old-transport cleanup and lock-heal in new
> graph where potentially conflicting lock requests could be granted,
> thereby causing lock requests sent as part of lock healing to fail.
>
> One solution I can think of is to bring in a SETLK_MIGRATE lock
> command. SETLK_MIGRATE takes a transport identifier as a parameter
> along with the usual arguments SETLK/SETLKW take (like lock range,
> lock-owner etc). The SETLK_MIGRATE command migrates a lock from the
> transport passed as a parameter to the transport on which the request
> came in, if two locks conflict only because they came from two
> different transports (all else - lock-range, lock-owner etc - being
> same).
> In the absence of any live locks, SETLK_MIGRATE behaves like
> the SETLK command.
>
> protocol/client can make use of this SETLK_MIGRATE command in lock
> requests it sends as part of lock heal during open fop to migrate
> locks to new graph. Assuming that old-transport is not cleaned up at
> the time of lock-heal, SETLK_MIGRATE atomically migrates locks from
> old-transport to new-transport (on server). Now, the difficulty is
> in getting the identifier of the old-transport on server from which
> locks are currently held. This can be solved if we store the peer
> transport identifier in lk-context on client (which can be easily
> obtained in an lk reply). We can pass the same transport identifier
> to server during healing.
>
> I haven't yet completely thought through some issues, like whether
> protocol/client can unconditionally use SETLK_MIGRATE in all lock
> requests it sends as part of healing or whether it should use
> SETLK_MIGRATE only during the first attempt of healing after a
> graph-switch. However, even if protocol/client wants to make such a
> distinction, it can be easily worked out (either by fuse setting a
> special "migrate" key in xdata of open calls it sends as part of
> fd-migration or by some different mechanism).
>
> Please let me know your thoughts on this.
>
> regards,
> Raghavendra.
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel@xxxxxxxxxx
> https://lists.nongnu.org/mailman/listinfo/gluster-devel
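
For illustration only, here is a small Python model of the LOCKINFO
steps (1-4) listed at the top of this mail. It is a sketch under
assumptions, not GlusterFS code: the class PosixLocksBrick, the
function migrate_locks, the "hostname:brick-name" key format and the
per-brick fd maps are all hypothetical names invented for the example.
Only the fields fd_no and trans and the overall getxattr/setxattr flow
come from the description above.

```python
from dataclasses import dataclass


@dataclass
class Lock:
    fd_no: int   # server-side fd number the lock is held on
    trans: str   # identifier of the connection that holds the lock
    owner: str
    start: int
    end: int


class PosixLocksBrick:
    """Hypothetical stand-in for one posix-locks translator on a brick."""

    def __init__(self, hostname, brick_name):
        # Assumed key format; the mail only says "combination of
        # hostname and brick-name".
        self.key = f"{hostname}:{brick_name}"
        self.locks = []

    def getxattr_lockinfo(self, old_fd_no):
        # Step 2: report fd_no only if locks are held on this fd.
        if any(lock.fd_no == old_fd_no for lock in self.locks):
            return {self.key: old_fd_no}
        return {}

    def setxattr_lockinfo(self, new_fd_no, new_trans, lockinfo):
        # Step 4a: look up our own <hostname, brick-name> key.
        old_fd_no = lockinfo.get(self.key)
        if old_fd_no is None:
            return 0
        # Step 4b: re-home every lock held on oldfd_no.
        migrated = 0
        for lock in self.locks:
            if lock.fd_no == old_fd_no:
                lock.fd_no = new_fd_no
                lock.trans = new_trans
                migrated += 1
        return migrated


def migrate_locks(bricks, old_fds, new_fds, new_trans):
    """old_fds / new_fds map a brick key to the fd number on that brick."""
    # Steps 1-2: getxattr on the old fd; results from all children are
    # aggregated into one dict, as a cluster translator would do.
    lockinfo = {}
    for brick in bricks:
        lockinfo.update(brick.getxattr_lockinfo(old_fds[brick.key]))
    # Steps 3-4: setxattr on the new fd with the aggregated dict.
    return sum(
        brick.setxattr_lockinfo(new_fds[brick.key], new_trans, lockinfo)
        for brick in bricks
    )
```

In this model the locks never leave the brick's lock table; only their
fd_no and trans fields are rewritten, so there is no window in which a
conflicting lock request from another client could be granted.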