Hi All,

We chose fcntl over sanlock, as self-healing of locks is being implemented.
The patch is up for review: http://review.gluster.org/#/c/9759/

Thanks and Regards,
Kotresh H R

----- Original Message -----
> From: "Kotresh Hiremath Ravishankar" <khiremat@xxxxxxxxxx>
> To: gluster-devel@xxxxxxxxxxx
> Sent: Monday, February 23, 2015 10:37:01 AM
> Subject: Re: A new approach to solve Geo-replication ACTIVE/PASSIVE
>          switching in distributed replicate setup!
>
> Hi All,
>
> The logic discussed in the previous mail thread is not feasible. So, in
> order to solve the Active/Passive switching in geo-replication, the
> following new idea is proposed:
>
> 1. Have a shared storage, a glusterfs management volume specific to
>    geo-replication.
>
> 2. Use an fcntl lock on a file stored on the above shared volume. There
>    will be one file per replica set.
>
> Each worker tries to lock the file on the shared storage; whoever wins will
> be ACTIVE. With this, we are able to solve the problem, but there is an
> issue when the shared storage goes down (if it is a replica, when all
> replicas go down). In that case, the lock state is lost.
>
> But if we use sanlock, as oVirt does, I think the above problem of the lock
> state being lost could be solved?
> https://fedorahosted.org/sanlock/
>
> If anybody has used sanlock, is it a good option in this respect?
> Please share your thoughts and suggestions on this.
>
>
> Thanks and Regards,
> Kotresh H R
>
> ----- Original Message -----
> > From: "Kotresh Hiremath Ravishankar" <khiremat@xxxxxxxxxx>
> > To: gluster-devel@xxxxxxxxxxx
> > Sent: Monday, December 22, 2014 10:53:34 AM
> > Subject: A new approach to solve Geo-replication ACTIVE/PASSIVE switching
> >          in distributed replicate setup!
> >
> > Hi All,
> >
> > Current Design and its limitations:
> >
> > Geo-replication syncs changes across geographies using changelogs captured
> > by the changelog translator. The changelog translator sits on the server
> > side, just above the posix translator. Hence, in a distributed replicated
> > setup, both bricks of a replica pair collect changelogs. Geo-replication
> > syncs the changes using only one brick of the replica pair at a time,
> > calling it "ACTIVE" and the other, non-syncing brick "PASSIVE".
> >
> > Let's consider the below example of a distributed replicated setup, where
> > NODE-1 has b1 and its replicated brick b1r is on NODE-2:
> >
> >          NODE-1          NODE-2
> >            b1              b1r
> >
> > At the beginning, geo-replication chooses to sync changes from NODE-1:b1,
> > and NODE-2:b1r will be "PASSIVE". The logic depends on the virtual
> > getxattr 'trusted.glusterfs.node-uuid', which always returns the first up
> > subvolume, i.e., NODE-1. When NODE-1 goes down, the above xattr returns
> > NODE-2, and that is made 'ACTIVE'. But when NODE-1 comes back again, the
> > above xattr returns NODE-1, and it is made 'ACTIVE' again. So, for a brief
> > interval of time, if NODE-2 had not finished processing the changelog,
> > both NODE-2 and NODE-1 will be ACTIVE, causing a rename race as below.
> >
> > https://bugzilla.redhat.com/show_bug.cgi?id=1140183
> >
> >
> > SOLUTION:
> >    Don't make NODE-2 'PASSIVE' when NODE-1 comes back again, until NODE-2
> >    goes down.
> >
> >
> > APPROACH TO SOLVE WHICH I CAN THINK OF:
> >
> > Have a distributed store, a file, which captures the bricks that are
> > active. When a NODE goes down, the file is updated with its replica
> > bricks, making sure that, at any point in time, the file has all the
> > bricks to be made active. The geo-replication worker process is made
> > 'ACTIVE' only if its brick is in the file.
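> >
> > For illustration, a rough sketch in Python (the language gsyncd is
> > written in) of how a worker might check such a file to decide its state.
> > The file format ('NodeUUID:brickpath' per line), the path and the
> > function names here are only assumptions for the example, not the actual
> > implementation:
> >
> >     # Hypothetical sketch: decide ACTIVE/PASSIVE from a shared file that
> >     # lists the bricks allowed to be active, one 'NodeUUID:brickpath'
> >     # entry per line. Format and paths are assumptions, not gsyncd code.
> >     def should_be_active(active_bricks_file, node_uuid, brickpath):
> >         entry = "%s:%s" % (node_uuid, brickpath)
> >         try:
> >             with open(active_bricks_file) as f:
> >                 active = set(line.strip() for line in f if line.strip())
> >         except IOError:
> >             # If the list cannot be read, stay PASSIVE so that two
> >             # workers never sync the same replica set at the same time.
> >             return False
> >         return entry in active
> >
> >     # A worker would call this periodically, for example:
> >     # if should_be_active('/var/lib/glusterd/geo-rep-active-bricks',
> >     #                     my_node_uuid, my_brickpath):
> >     #     become_active()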
> >
> > Implementation can be done in two ways:
> >
> > 1. Have a distributed store for the above. This needs to be thought
> >    through, as a distributed store is not in place in glusterd yet.
> >
> > 2. The other solution is to store it in a file similar to the existing
> >    glusterd global configuration file (/var/lib/glusterd/options). When
> >    this file is updated, its version number is incremented. When the node
> >    which had gone down comes back up, it gets this file from its peers if
> >    its version number is less than that of the peers.
> >
> > I did a POC with the second approach, storing the list of active bricks
> > as 'NodeUUID:brickpath' in the options file itself. It seems to work
> > fine, except for a bug in glusterd where the daemons get spawned before
> > the node receives the 'options' file from the other node during
> > handshake.
> >
> > CHANGES IN GLUSTERD:
> >    When a node goes down, all the other nodes are notified through
> >    glusterd_peer_rpc_notify, which needs to find the replicas of the node
> >    that went down and update the global file.
> >
> > PROBLEMS/LIMITATIONS WITH THIS APPROACH:
> >    1. If glusterd is killed but the node is still up, this makes the
> >       other replica 'ACTIVE'. So both replica bricks will be syncing at
> >       that point in time, which is not expected.
> >
> >    2. If a single brick process is killed, its replica brick is not made
> >       'ACTIVE'.
> >
> >
> > Glusterd/AFR folks,
> >
> >    1. Do you see a better approach than the above to solve this issue?
> >    2. Is this approach feasible? If yes, how can I handle the problems
> >       mentioned above?
> >    3. Is this approach feasible from a scalability point of view, since
> >       the complete list of active brick paths is stored and read by
> >       gsyncd?
> >    4. Does this approach fit three-way replication and erasure coding?
> >
> >
> > Thanks and Regards,
> > Kotresh H R
> > _______________________________________________
> > Gluster-devel mailing list
> > Gluster-devel@xxxxxxxxxxx
> > http://www.gluster.org/mailman/listinfo/gluster-devel
>
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel