Hi Dave,

Two things.

1. I see that gluster has been upgraded from 3.4.2 to 3.5.3. Between these releases, geo-rep has undergone design changes to make it distributed (https://github.com/gluster/glusterfs/blob/master/doc/admin-guide/en-US/markdown/admin_distributed_geo_rep.md). Have you followed all the upgrade steps w.r.t. geo-rep mentioned in the following link?
   http://www.gluster.org/community/documentation/index.php/Upgrade_to_3.5

2. Is the output of the command 'gluster vol info <vol-name> --xml' proper? Please paste the output.

Thanks and Regards,
Kotresh H R

----- Original Message -----
From: "David Gibbons" <david.c.gibbons@xxxxxxxxx>
To: "Kotresh Hiremath Ravishankar" <khiremat@xxxxxxxxxx>
Cc: "gluster-users" <Gluster-users@xxxxxxxxxxx>, vnosov@xxxxxxxxxxxx
Sent: Wednesday, December 10, 2014 6:12:00 PM
Subject: Re: Geo-Replication Issue

Symlinking gluster to /usr/bin/ seems to have resolved the path issue. Thanks for the tip there. Now there is a different error thrown in the geo-rep/ssh...log:

> [2014-12-10 07:32:42.609031] E [syncdutils(monitor):240:log_raise_exception] <top>: FAIL:
> Traceback (most recent call last):
>   File "/usr/local/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 150, in main
>     main_i()
>   File "/usr/local/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 530, in main_i
>     return monitor(*rscs)
>   File "/usr/local/libexec/glusterfs/python/syncdaemon/monitor.py", line 243, in monitor
>     return Monitor().multiplex(*distribute(*resources))
>   File "/usr/local/libexec/glusterfs/python/syncdaemon/monitor.py", line 205, in distribute
>     mvol = Volinfo(master.volume, master.host)
>   File "/usr/local/libexec/glusterfs/python/syncdaemon/monitor.py", line 22, in __init__
>     vi = XET.fromstring(vix)
>   File "/usr/lib64/python2.6/xml/etree/ElementTree.py", line 963, in XML
>     parser.feed(text)
>   File "/usr/lib64/python2.6/xml/etree/ElementTree.py", line 1245, in feed
>     self._parser.Parse(data, 0)
> ExpatError: syntax error: line 2, column 0
>
> [2014-12-10 07:32:42.610858] I [syncdutils(monitor):192:finalize] <top>: exiting.

I also get a bunch of these errors, but I have been assuming that they are being thrown because geo-replication hasn't started successfully yet. There is one for each brick:

> [2014-12-10 12:33:33.539737] E [glusterd-geo-rep.c:2685:glusterd_gsync_read_frm_status] 0-: Unable to read gsyncd status file
>
> [2014-12-10 12:33:33.539742] E [glusterd-geo-rep.c:2999:glusterd_read_status_file] 0-: Unable to read the statusfile for /mnt/a-3-shares-brick-4/brick brick for shares(master), gfs-a-bkp::bkpshares(slave) session

Do I have a config file error somewhere that I need to track down? This volume *was* upgraded from 3.4.2 a few weeks ago.

Cheers,
Dave

On Wed, Dec 10, 2014 at 7:29 AM, David Gibbons <david.c.gibbons@xxxxxxxxx> wrote:

> Hi Kotresh,
>
> Thanks for the tip. Unfortunately that does not seem to have any effect. The path to the gluster binaries was already in $PATH. I did try adding the path to the gsyncd binary, but got the same result. Contents of $PATH are:
>
>> /usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin:/usr/local/libexec/glusterfs/
>
> It seems like perhaps one of the remote gsyncd processes cannot find the gluster binary, because I see the following in the geo-replication/shares/ssh...log. Can you point me toward how I can find out what is throwing this log entry?
>
>> [2014-12-10 07:20:53.886676] E [syncdutils(monitor):218:log_raise_exception] <top>: execution of "gluster" failed with ENOENT (No such file or directory)
>>
>> [2014-12-10 07:20:53.886883] I [syncdutils(monitor):192:finalize] <top>: exiting.
>
> I think that whatever process is trying to use the gluster command has the incorrect path to access it. Do you know how I could modify *that* path?
>
> I've manually tested the ssh_command and ssh_command_tar variables in the relevant gsyncd.conf; both connect to the slave server successfully and appear to execute the command they're supposed to.
>
> gluster_command_dir in gsyncd.conf is also the correct directory (/usr/local/sbin).
>
> In summary: I think we're on to something with setting the path, but I think I need to set it somewhere other than my shell.
>
> Thanks,
> Dave
>
>
> On Tue, Dec 9, 2014 at 11:52 PM, Kotresh Hiremath Ravishankar <khiremat@xxxxxxxxxx> wrote:
>
>> If that is the case, as a workaround, try adding the 'gluster' path to the PATH environment variable, or create symlinks to the gluster and glusterd binaries.
>>
>> 1. export PATH=$PATH:<path where gluster binaries are installed>
>>
>> The above should work; let me know if it doesn't.
>>
>> Thanks and Regards,
>> Kotresh H R
>>
>> ----- Original Message -----
>> From: "David Gibbons" <david.c.gibbons@xxxxxxxxx>
>> To: "Kotresh Hiremath Ravishankar" <khiremat@xxxxxxxxxx>
>> Cc: "gluster-users" <Gluster-users@xxxxxxxxxxx>, vnosov@xxxxxxxxxxxx
>> Sent: Tuesday, December 9, 2014 6:16:03 PM
>> Subject: Re: Geo-Replication Issue
>>
>> Hi Kotresh,
>>
>> Yes, I believe that I am. Can you tell me which symlinks are missing and cause geo-replication to fail to start? I can create them manually.
>>
>> Thank you,
>> Dave
>>
>> On Tue, Dec 9, 2014 at 3:54 AM, Kotresh Hiremath Ravishankar <khiremat@xxxxxxxxxx> wrote:
>>
>>> Hi Dave,
>>>
>>> Are you hitting the below bug and so not able to sync symlinks?
>>> https://bugzilla.redhat.com/show_bug.cgi?id=1105283
>>>
>>> Does geo-rep status say "Not Started"?
>>>
>>> Thanks and Regards,
>>> Kotresh H R
>>>
>>> ----- Original Message -----
>>> From: "David Gibbons" <david.c.gibbons@xxxxxxxxx>
>>> To: "gluster-users" <Gluster-users@xxxxxxxxxxx>
>>> Cc: vnosov@xxxxxxxxxxxx
>>> Sent: Monday, December 8, 2014 7:03:31 PM
>>> Subject: Re: Geo-Replication Issue
>>>
>>> Apologies for sending so many messages about this! I think I may be running into this bug:
>>> https://bugzilla.redhat.com/show_bug.cgi?id=1105283
>>>
>>> Would someone be so kind as to let me know which symlinks are missing when this bug manifests, so that I can create them?
>>>
>>> Thank you,
>>> Dave
>>>
>>>
>>> On Sun, Dec 7, 2014 at 11:01 AM, David Gibbons <david.c.gibbons@xxxxxxxxx> wrote:
>>>
>>> Ok,
>>>
>>> I was able to get geo-replication configured by changing /usr/local/libexec/glusterfs/gverify.sh to use ssh to access the local machine, instead of invoking bash -c directly. I then found that the hook script was missing for geo-replication, so I copied that over manually.
>> I >> > now have what appears to be a "configured" geo-rep setup: >> > >> > >> > >> > >> > # gluster volume geo-replication shares gfs-a-bkp::bkpshares status >> > >> > >> > >> > >> > MASTER NODE MASTER VOL MASTER BRICK SLAVE STATUS CHECKPOINT STATUS CRAWL >> > STATUS >> > >> > >> > >> -------------------------------------------------------------------------------------------------------------------------------------------------------- >> > >> > gfs-a-3 shares /mnt/a-3-shares-brick-1/brick gfs-a-bkp::bkpshares Not >> > Started N/A N/A >> > >> > gfs-a-3 shares /mnt/a-3-shares-brick-2/brick gfs-a-bkp::bkpshares Not >> > Started N/A N/A >> > >> > gfs-a-3 shares /mnt/a-3-shares-brick-3/brick gfs-a-bkp::bkpshares Not >> > Started N/A N/A >> > >> > gfs-a-3 shares /mnt/a-3-shares-brick-4/brick gfs-a-bkp::bkpshares Not >> > Started N/A N/A >> > >> > gfs-a-2 shares /mnt/a-2-shares-brick-1/brick gfs-a-bkp::bkpshares Not >> > Started N/A N/A >> > >> > gfs-a-2 shares /mnt/a-2-shares-brick-2/brick gfs-a-bkp::bkpshares Not >> > Started N/A N/A >> > >> > gfs-a-2 shares /mnt/a-2-shares-brick-3/brick gfs-a-bkp::bkpshares Not >> > Started N/A N/A >> > >> > gfs-a-2 shares /mnt/a-2-shares-brick-4/brick gfs-a-bkp::bkpshares Not >> > Started N/A N/A >> > >> > gfs-a-4 shares /mnt/a-4-shares-brick-1/brick gfs-a-bkp::bkpshares Not >> > Started N/A N/A >> > >> > gfs-a-4 shares /mnt/a-4-shares-brick-2/brick gfs-a-bkp::bkpshares Not >> > Started N/A N/A >> > >> > gfs-a-4 shares /mnt/a-4-shares-brick-3/brick gfs-a-bkp::bkpshares Not >> > Started N/A N/A >> > >> > gfs-a-4 shares /mnt/a-4-shares-brick-4/brick gfs-a-bkp::bkpshares Not >> > Started N/A N/A >> > >> > gfs-a-1 shares /mnt/a-1-shares-brick-1/brick gfs-a-bkp::bkpshares Not >> > Started N/A N/A >> > >> > gfs-a-1 shares /mnt/a-1-shares-brick-2/brick gfs-a-bkp::bkpshares Not >> > Started N/A N/A >> > >> > gfs-a-1 shares /mnt/a-1-shares-brick-3/brick gfs-a-bkp::bkpshares Not >> > Started N/A N/A >> > >> > gfs-a-1 shares /mnt/a-1-shares-brick-4/brick gfs-a-bkp::bkpshares Not >> > Started N/A N/A >> > >> > So that's a step in the right direction (and I can upload a patch for >> > gverify to a bugzilla). However, gverify *should* have worked with >> bash-c, >> > and I was not able to figure out why it didn't work, other than it >> didn't >> > seem able to find some programs. I'm thinking that maybe the PATH >> variable >> > is wrong for Gluster, and that's why gverify didn't work out of the box. >> > >> > When I attempt to start geo-rep now, I get the following in the geo-rep >> > log: >> > >> > >> > [2014-12-07 10:52:40.893594] E >> > [syncdutils(monitor):218:log_raise_exception] <top>: execution of >> "gluster" >> > failed with ENOENT (No such file or directory) >> > >> > [2014-12-07 10:52:40.893886] I [syncdutils(monitor):192:finalize] <top>: >> > exiting. >> > >> > Which seems to agree that maybe gluster isn't running with the same path >> > variable that my console session is running with. Is this possible? I >> know >> > I'm grasping :). >> > >> > Any nudge in the right direction would be very much appreciated! >> > >> > Cheers, >> > Dave >> > >> > >> > On Sat, Dec 6, 2014 at 10:06 AM, David Gibbons < >> david.c.gibbons@xxxxxxxxx >> > > wrote: >> > >> > >> > >> > Good Morning, >> > >> > I am having some trouble getting geo-replication started on a 3.5.3 >> volume. >> > >> > I have verified that password-less SSH is functional in both directions >> > from the backup gluster server, and all nodes in the production >> gluster. 
>>> I have verified that all nodes in the production and backup clusters are running the same version of gluster, and that name resolution works in both directions.
>>>
>>> When I attempt to start geo-replication with this command:
>>>
>>> gluster volume geo-replication shares gfs-a-bkp::bkpshares create push-pem
>>>
>>> I end up with the following in the logs:
>>>
>>> [2014-12-06 15:02:50.284426] E [glusterd-geo-rep.c:1889:glusterd_verify_slave] 0-: Not a valid slave
>>>
>>> [2014-12-06 15:02:50.284495] E [glusterd-geo-rep.c:2106:glusterd_op_stage_gsync_create] 0-: gfs-a-bkp::bkpshares is not a valid slave volume. Error: Unable to fetch master volume details. Please check the master cluster and master volume.
>>>
>>> [2014-12-06 15:02:50.284509] E [glusterd-syncop.c:912:gd_stage_op_phase] 0-management: Staging of operation 'Volume Geo-replication Create' failed on localhost : Unable to fetch master volume details. Please check the master cluster and master volume.
>>>
>>> Would someone be so kind as to point me in the right direction?
>>>
>>> Cheers,
>>> Dave
>>>
>>>
>>> _______________________________________________
>>> Gluster-users mailing list
>>> Gluster-users@xxxxxxxxxxx
>>> http://supercolony.gluster.org/mailman/listinfo/gluster-users

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://supercolony.gluster.org/mailman/listinfo/gluster-users
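A note on Kotresh's question 2 above: the ExpatError in the December 10 traceback is raised while gsyncd's monitor feeds the volinfo XML to ElementTree (monitor.py, Volinfo.__init__), so the quickest check is to run 'gluster volume info <vol-name> --xml' on the master and confirm that nothing other than well-formed XML comes back; any non-XML text mixed into the output (a warning line, a banner printed by a non-interactive shell) would fail the parse with this kind of error. The snippet below is a minimal standalone sketch of that check, not part of gluster: the script name, the check_volinfo_xml function, and its gluster_bin default are illustrative only, and it assumes just that the gluster binary is reachable and that Python 2.6 or later is available (matching the python2.6 paths in the traceback).

#!/usr/bin/env python
# check_volinfo_xml.py -- hypothetical helper, not shipped with gluster.
# Repeats roughly the parse gsyncd's monitor performs on the volinfo XML,
# so any non-XML text in the CLI output shows up as the same ExpatError.
import subprocess
import sys
import xml.etree.ElementTree as XET


def check_volinfo_xml(volume, gluster_bin="gluster"):
    # Run the command Kotresh asked to be checked and capture both streams.
    p = subprocess.Popen([gluster_bin, "volume", "info", volume, "--xml"],
                         stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = p.communicate()
    try:
        # Same call as monitor.py: XET.fromstring(vix)
        XET.fromstring(out)
    except Exception as e:
        # Catch broadly: py2.6 raises ExpatError, newer ElementTree raises ParseError.
        sys.stderr.write("volinfo XML for %s did not parse: %s\n" % (volume, e))
        sys.stderr.write("first 200 bytes of output were:\n%r\n" % out[:200])
        if err:
            sys.stderr.write("stderr from the CLI was:\n%r\n" % err)
        return False
    print("volinfo XML for %s parsed cleanly" % volume)
    return True


if __name__ == "__main__":
    vol = sys.argv[1] if len(sys.argv) > 1 else "shares"
    sys.exit(0 if check_volinfo_xml(vol) else 1)

Running it as 'python check_volinfo_xml.py shares' on a master node (and against the slave volume on the backup node) should either report a clean parse or print whatever stray text the monitor is choking on, which is essentially the output Kotresh asked to have pasted.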