Hi Dave,

Two things.

1. I see that gluster has been upgraded from 3.4.2 to 3.5.3. Between these releases, geo-rep has undergone design changes to make it distributed (https://github.com/gluster/glusterfs/blob/master/doc/admin-guide/en-US/markdown/admin_distributed_geo_rep.md). Have you followed all the upgrade steps w.r.t. geo-rep mentioned in the following link?
   http://www.gluster.org/community/documentation/index.php/Upgrade_to_3.5

2. Is the output of the command 'gluster vol info <vol-name> --xml' proper? Please paste the output.

Thanks and Regards,
Kotresh H R

----- Original Message -----
From: "David Gibbons" <david.c.gibbons@xxxxxxxxx>
To: "Kotresh Hiremath Ravishankar" <khiremat@xxxxxxxxxx>
Cc: "gluster-users" <Gluster-users@xxxxxxxxxxx>, vnosov@xxxxxxxxxxxx
Sent: Wednesday, December 10, 2014 6:12:00 PM
Subject: Re: Geo-Replication Issue

Symlinking gluster to /usr/bin/ seems to have resolved the path issue. Thanks for the tip there. Now there is a different error thrown in the geo-rep/ssh...log:

> [2014-12-10 07:32:42.609031] E [syncdutils(monitor):240:log_raise_exception] <top>: FAIL:
> Traceback (most recent call last):
>   File "/usr/local/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 150, in main
>     main_i()
>   File "/usr/local/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 530, in main_i
>     return monitor(*rscs)
>   File "/usr/local/libexec/glusterfs/python/syncdaemon/monitor.py", line 243, in monitor
>     return Monitor().multiplex(*distribute(*resources))
>   File "/usr/local/libexec/glusterfs/python/syncdaemon/monitor.py", line 205, in distribute
>     mvol = Volinfo(master.volume, master.host)
>   File "/usr/local/libexec/glusterfs/python/syncdaemon/monitor.py", line 22, in __init__
>     vi = XET.fromstring(vix)
>   File "/usr/lib64/python2.6/xml/etree/ElementTree.py", line 963, in XML
>     parser.feed(text)
>   File "/usr/lib64/python2.6/xml/etree/ElementTree.py", line 1245, in feed
>     self._parser.Parse(data, 0)
> ExpatError: syntax error: line 2, column 0
>
> [2014-12-10 07:32:42.610858] I [syncdutils(monitor):192:finalize] <top>: exiting.

I also get a bunch of these errors, but I have been assuming that they are being thrown because geo-replication hasn't started successfully yet. There is one for each brick:

> [2014-12-10 12:33:33.539737] E [glusterd-geo-rep.c:2685:glusterd_gsync_read_frm_status] 0-: Unable to read gsyncd status file
>
> [2014-12-10 12:33:33.539742] E [glusterd-geo-rep.c:2999:glusterd_read_status_file] 0-: Unable to read the statusfile for /mnt/a-3-shares-brick-4/brick brick for shares(master), gfs-a-bkp::bkpshares(slave) session

Do I have a config file error somewhere that I need to track down? This volume *was* upgraded from 3.4.2 a few weeks ago.

Cheers,
Dave

On Wed, Dec 10, 2014 at 7:29 AM, David Gibbons <david.c.gibbons@xxxxxxxxx> wrote:

> Hi Kotresh,
>
> Thanks for the tip. Unfortunately that does not seem to have any effect. The path to the gluster binaries was already in $PATH. I did try adding the path to the gsyncd binary, but got the same result. Contents of $PATH are:
>
>> /usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin:/usr/local/libexec/glusterfs/
>
> It seems like perhaps one of the remote gsyncd processes cannot find the gluster binary, because I see the following in the geo-replication/shares/ssh...log. Can you point me toward how I can find out what is throwing this log entry?
>
>> [2014-12-10 07:20:53.886676] E [syncdutils(monitor):218:log_raise_exception] <top>: execution of "gluster" failed with ENOENT (No such file or directory)
>>
>> [2014-12-10 07:20:53.886883] I [syncdutils(monitor):192:finalize] <top>: exiting.
>
> I think that whatever process is trying to use the gluster command has the incorrect path to access it. Do you know how I could modify *that* path?
>
> I've manually tested the ssh_command and ssh_command_tar variables in the relevant gsyncd.conf; both connect to the slave server successfully and appear to execute the command they're supposed to.
>
> gluster_command_dir in gsyncd.conf is also the correct directory (/usr/local/sbin).
>
> In summary: I think we're on to something with setting the path, but I think I need to set it somewhere other than my shell.
>
> Thanks,
> Dave
>
>
> On Tue, Dec 9, 2014 at 11:52 PM, Kotresh Hiremath Ravishankar <khiremat@xxxxxxxxxx> wrote:
>
>> If that is the case, as a workaround, try adding the 'gluster' path to the PATH environment variable, or create symlinks to the gluster and glusterd binaries.
>>
>> 1. export PATH=$PATH:<path where gluster binaries are installed>
>>
>> The above should work; let me know if it doesn't.
>>
>> Thanks and Regards,
>> Kotresh H R
>>
>> ----- Original Message -----
>> From: "David Gibbons" <david.c.gibbons@xxxxxxxxx>
>> To: "Kotresh Hiremath Ravishankar" <khiremat@xxxxxxxxxx>
>> Cc: "gluster-users" <Gluster-users@xxxxxxxxxxx>, vnosov@xxxxxxxxxxxx
>> Sent: Tuesday, December 9, 2014 6:16:03 PM
>> Subject: Re: Geo-Replication Issue
>>
>> Hi Kotresh,
>>
>> Yes, I believe that I am. Can you tell me which symlinks are missing and cause geo-replication to fail to start? I can create them manually.
>>
>> Thank you,
>> Dave
>>
>> On Tue, Dec 9, 2014 at 3:54 AM, Kotresh Hiremath Ravishankar <khiremat@xxxxxxxxxx> wrote:
>>
>>> Hi Dave,
>>>
>>> Are you hitting the below bug and so not able to sync symlinks?
>>> https://bugzilla.redhat.com/show_bug.cgi?id=1105283
>>>
>>> Does geo-rep status say "Not Started"?
>>>
>>> Thanks and Regards,
>>> Kotresh H R
>>>
>>> ----- Original Message -----
>>> From: "David Gibbons" <david.c.gibbons@xxxxxxxxx>
>>> To: "gluster-users" <Gluster-users@xxxxxxxxxxx>
>>> Cc: vnosov@xxxxxxxxxxxx
>>> Sent: Monday, December 8, 2014 7:03:31 PM
>>> Subject: Re: Geo-Replication Issue
>>>
>>> Apologies for sending so many messages about this! I think I may be running into this bug:
>>> https://bugzilla.redhat.com/show_bug.cgi?id=1105283
>>>
>>> Would someone be so kind as to let me know which symlinks are missing when this bug manifests, so that I can create them?
>>>
>>> Thank you,
>>> Dave
>>>
>>>
>>> On Sun, Dec 7, 2014 at 11:01 AM, David Gibbons <david.c.gibbons@xxxxxxxxx> wrote:
>>>
>>> Ok,
>>>
>>> I was able to get geo-replication configured by changing /usr/local/libexec/glusterfs/gverify.sh to use ssh to access the local machine, instead of invoking bash -c directly. I then found that the hook script was missing for geo-replication, so I copied that over manually.
>> I >> > now have what appears to be a "configured" geo-rep setup: >> > >> > >> > >> > >> > # gluster volume geo-replication shares gfs-a-bkp::bkpshares status >> > >> > >> > >> > >> > MASTER NODE MASTER VOL MASTER BRICK SLAVE STATUS CHECKPOINT STATUS CRAWL >> > STATUS >> > >> > >> > >> -------------------------------------------------------------------------------------------------------------------------------------------------------- >> > >> > gfs-a-3 shares /mnt/a-3-shares-brick-1/brick gfs-a-bkp::bkpshares Not >> > Started N/A N/A >> > >> > gfs-a-3 shares /mnt/a-3-shares-brick-2/brick gfs-a-bkp::bkpshares Not >> > Started N/A N/A >> > >> > gfs-a-3 shares /mnt/a-3-shares-brick-3/brick gfs-a-bkp::bkpshares Not >> > Started N/A N/A >> > >> > gfs-a-3 shares /mnt/a-3-shares-brick-4/brick gfs-a-bkp::bkpshares Not >> > Started N/A N/A >> > >> > gfs-a-2 shares /mnt/a-2-shares-brick-1/brick gfs-a-bkp::bkpshares Not >> > Started N/A N/A >> > >> > gfs-a-2 shares /mnt/a-2-shares-brick-2/brick gfs-a-bkp::bkpshares Not >> > Started N/A N/A >> > >> > gfs-a-2 shares /mnt/a-2-shares-brick-3/brick gfs-a-bkp::bkpshares Not >> > Started N/A N/A >> > >> > gfs-a-2 shares /mnt/a-2-shares-brick-4/brick gfs-a-bkp::bkpshares Not >> > Started N/A N/A >> > >> > gfs-a-4 shares /mnt/a-4-shares-brick-1/brick gfs-a-bkp::bkpshares Not >> > Started N/A N/A >> > >> > gfs-a-4 shares /mnt/a-4-shares-brick-2/brick gfs-a-bkp::bkpshares Not >> > Started N/A N/A >> > >> > gfs-a-4 shares /mnt/a-4-shares-brick-3/brick gfs-a-bkp::bkpshares Not >> > Started N/A N/A >> > >> > gfs-a-4 shares /mnt/a-4-shares-brick-4/brick gfs-a-bkp::bkpshares Not >> > Started N/A N/A >> > >> > gfs-a-1 shares /mnt/a-1-shares-brick-1/brick gfs-a-bkp::bkpshares Not >> > Started N/A N/A >> > >> > gfs-a-1 shares /mnt/a-1-shares-brick-2/brick gfs-a-bkp::bkpshares Not >> > Started N/A N/A >> > >> > gfs-a-1 shares /mnt/a-1-shares-brick-3/brick gfs-a-bkp::bkpshares Not >> > Started N/A N/A >> > >> > gfs-a-1 shares /mnt/a-1-shares-brick-4/brick gfs-a-bkp::bkpshares Not >> > Started N/A N/A >> > >> > So that's a step in the right direction (and I can upload a patch for >> > gverify to a bugzilla). However, gverify *should* have worked with >> bash-c, >> > and I was not able to figure out why it didn't work, other than it >> didn't >> > seem able to find some programs. I'm thinking that maybe the PATH >> variable >> > is wrong for Gluster, and that's why gverify didn't work out of the box. >> > >> > When I attempt to start geo-rep now, I get the following in the geo-rep >> > log: >> > >> > >> > [2014-12-07 10:52:40.893594] E >> > [syncdutils(monitor):218:log_raise_exception] <top>: execution of >> "gluster" >> > failed with ENOENT (No such file or directory) >> > >> > [2014-12-07 10:52:40.893886] I [syncdutils(monitor):192:finalize] <top>: >> > exiting. >> > >> > Which seems to agree that maybe gluster isn't running with the same path >> > variable that my console session is running with. Is this possible? I >> know >> > I'm grasping :). >> > >> > Any nudge in the right direction would be very much appreciated! >> > >> > Cheers, >> > Dave >> > >> > >> > On Sat, Dec 6, 2014 at 10:06 AM, David Gibbons < >> david.c.gibbons@xxxxxxxxx >> > > wrote: >> > >> > >> > >> > Good Morning, >> > >> > I am having some trouble getting geo-replication started on a 3.5.3 >> volume. >> > >> > I have verified that password-less SSH is functional in both directions >> > from the backup gluster server, and all nodes in the production >> gluster. 
>>> I have verified that all nodes in the production and backup clusters are running the same version of gluster, and that name resolution works in both directions.
>>>
>>> When I attempt to start geo-replication with this command:
>>>
>>> gluster volume geo-replication shares gfs-a-bkp::bkpshares create push-pem
>>>
>>> I end up with the following in the logs:
>>>
>>> [2014-12-06 15:02:50.284426] E [glusterd-geo-rep.c:1889:glusterd_verify_slave] 0-: Not a valid slave
>>>
>>> [2014-12-06 15:02:50.284495] E [glusterd-geo-rep.c:2106:glusterd_op_stage_gsync_create] 0-: gfs-a-bkp::bkpshares is not a valid slave volume. Error: Unable to fetch master volume details. Please check the master cluster and master volume.
>>>
>>> [2014-12-06 15:02:50.284509] E [glusterd-syncop.c:912:gd_stage_op_phase] 0-management: Staging of operation 'Volume Geo-replication Create' failed on localhost : Unable to fetch master volume details. Please check the master cluster and master volume.
>>>
>>> Would someone be so kind as to point me in the right direction?
>>>
>>> Cheers,
>>> Dave
>>>
>>>
>>> _______________________________________________
>>> Gluster-users mailing list
>>> Gluster-users@xxxxxxxxxxx
>>> http://supercolony.gluster.org/mailman/listinfo/gluster-users

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://supercolony.gluster.org/mailman/listinfo/gluster-users
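A note on Kotresh's question 2 above: the ExpatError in the December 10 traceback is raised while gsyncd's monitor feeds the volinfo XML to ElementTree (monitor.py, Volinfo.__init__), so the quickest check is to run 'gluster volume info <vol-name> --xml' on the master and confirm that nothing other than well-formed XML comes back; any non-XML text mixed into the output (a warning line, a banner printed by a non-interactive shell) would fail the parse with this kind of error. The snippet below is a minimal standalone sketch of that check, not part of gluster: the script name, the check_volinfo_xml function, and its gluster_bin default are illustrative only, and it assumes just that the gluster binary is reachable and that Python 2.6 or later is available (matching the python2.6 paths in the traceback).

#!/usr/bin/env python
# check_volinfo_xml.py -- hypothetical helper, not shipped with gluster.
# Repeats roughly the parse gsyncd's monitor performs on the volinfo XML,
# so any non-XML text in the CLI output shows up as the same ExpatError.
import subprocess
import sys
import xml.etree.ElementTree as XET


def check_volinfo_xml(volume, gluster_bin="gluster"):
    # Run the command Kotresh asked to be checked and capture both streams.
    p = subprocess.Popen([gluster_bin, "volume", "info", volume, "--xml"],
                         stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = p.communicate()
    try:
        # Same call as monitor.py: XET.fromstring(vix)
        XET.fromstring(out)
    except Exception as e:
        # Catch broadly: py2.6 raises ExpatError, newer ElementTree raises ParseError.
        sys.stderr.write("volinfo XML for %s did not parse: %s\n" % (volume, e))
        sys.stderr.write("first 200 bytes of output were:\n%r\n" % out[:200])
        if err:
            sys.stderr.write("stderr from the CLI was:\n%r\n" % err)
        return False
    print("volinfo XML for %s parsed cleanly" % volume)
    return True


if __name__ == "__main__":
    vol = sys.argv[1] if len(sys.argv) > 1 else "shares"
    sys.exit(0 if check_volinfo_xml(vol) else 1)

Running it as 'python check_volinfo_xml.py shares' on a master node (and against the slave volume on the backup node) should either report a clean parse or print whatever stray text the monitor is choking on, which is essentially the output Kotresh asked to have pasted.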