Re: Upgrade to 4.1.1 geo-replication does not work

Marcus Pedersén <marcus.pedersen@xxxxxx> · Mon, 23 Jul 2018 12:55:32 +0000

Hi,
 #find /usr/ -name libglusterfs.so

Gives nothing.

#find /usr/ -name libglusterfs.so*
Gives:
/usr/lib64/libglusterfs.so.0
/usr/lib64/libglusterfs.so.0.0.1

Thanks!
Marcus

################

Marcus Pedersén

Systemadministrator 

Interbull Centre

################

Sent from my phone 

################
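
A note on the find results above: only versioned names (.so.0, .so.0.0.1) exist, and the traceback quoted further down fails inside ctypes while opening the exact, unversioned name libgfchangelog.so. Below is a minimal Python sketch of a loader that tries the unversioned name first and then falls back to whatever the linker cache reports. load_shared_library is a hypothetical helper for illustration, not gsyncd code, and whether gsyncd should load the library this way is an assumption.

```python
import ctypes
import ctypes.util


def load_shared_library(short_name):
    """Return (name_used, handle), or (None, None) if nothing could be loaded.

    short_name is the library name without the "lib" prefix and ".so"
    suffix, e.g. "gfchangelog". Hypothetical helper for illustration only.
    """
    # The unversioned name is what the traceback in this thread tried to open.
    candidates = ["lib%s.so" % short_name]

    # find_library() returns a versioned name such as "libfoo.so.0" on Linux
    # when the library is known to the system, or None otherwise.
    versioned = ctypes.util.find_library(short_name)
    if versioned and versioned not in candidates:
        candidates.append(versioned)

    for name in candidates:
        try:
            return name, ctypes.CDLL(name, use_errno=True)
        except OSError:
            # Same error class as in the gsyncd traceback; try the next name.
            continue
    return None, None


if __name__ == "__main__":
    name, handle = load_shared_library("gfchangelog")
    print("loaded:", name)
```

If this prints a versioned name while the plain libgfchangelog.so fails, the library is installed and only the unversioned file name is missing.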

On 23 July 2018 at 14:17, Sunny Kumar <sunkumar@xxxxxxxxxx> wrote:

Hi,

Can you confirm the location of libgfchangelog.so

by sharing the output of the following command:

# find /usr/ -name libglusterfs.so

- Sunny

On Mon, Jul 23, 2018 at 5:12 PM Marcus Pedersén <marcus.pedersen@xxxxxx> wrote:

>

> Hi Sunny,

> Here comes a part of gsyncd.log (The same info is repeated over and over again):

>

>   File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 207, in __call__

>     raise res

> OSError: libgfchangelog.so: cannot open shared object file: No such file or directory

> [2018-07-23 11:33:09.254915] I [repce(agent /urd-gds/gluster):89:service_loop] RepceServer: terminating on reaching EOF.

> [2018-07-23 11:33:10.225150] I [monitor(monitor):272:monitor] Monitor: worker died in startup phase     brick=/urd-gds/gluster

> [2018-07-23 11:33:20.250036] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker   brick=/urd-gds/gluster  slave_node=urd-gds-geo-000

> [2018-07-23 11:33:20.326205] I [gsyncd(agent /urd-gds/gluster):297:main] <top>: Using session config file       path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf

> [2018-07-23 11:33:20.326282] I [gsyncd(worker /urd-gds/gluster):297:main] <top>: Using session config file      path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf

> [2018-07-23 11:33:20.327152] I [changelogagent(agent /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...

> [2018-07-23 11:33:20.335777] I [resource(worker /urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between master and slave...

> [2018-07-23 11:33:22.11188] I [resource(worker /urd-gds/gluster):1395:connect_remote] SSH: SSH connection between master and slave established. duration=1.6752

> [2018-07-23 11:33:22.11744] I [resource(worker /urd-gds/gluster):1067:connect] GLUSTER: Mounting gluster volume locally...

> [2018-07-23 11:33:23.101602] I [resource(worker /urd-gds/gluster):1090:connect] GLUSTER: Mounted gluster volume duration=1.0894

> [2018-07-23 11:33:23.102168] I [subcmds(worker /urd-gds/gluster):70:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor

> [2018-07-23 11:33:23.119129] E [repce(agent /urd-gds/gluster):114:worker] <top>: call failed:

> Traceback (most recent call last):

>   File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 110, in worker

>     res = getattr(self.obj, rmeth)(*in_data[2:])

>   File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 37, in init

>     return Changes.cl_init()

>   File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 21, in __getattr__

>     from libgfchangelog import Changes as LChanges

>   File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 17, in <module>

>     class Changes(object):

>   File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 19, in Changes

>     use_errno=True)

>   File "/usr/lib64/python2.7/ctypes/__init__.py", line 360, in __init__

>     self._handle = _dlopen(self._name, mode)

> OSError: libgfchangelog.so: cannot open shared object file: No such file or directory

> [2018-07-23 11:33:23.119609] E [repce(worker /urd-gds/gluster):206:__call__] RepceClient: call failed   call=29589:140155686246208:1532345603.11        method=init     error=OSError

> [2018-07-23 11:33:23.119708] E [syncdutils(worker /urd-gds/gluster):330:log_raise_exception] <top>: FAIL:

> Traceback (most recent call last):

>   File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 311, in main

>     func(args)

>   File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line 72, in subcmd_worker

>     local.service_loop(remote)

>   File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1236, in service_loop

>     changelog_agent.init()

>   File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 225, in __call__

>     return self.ins(self.meth, *a)

>   File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 207, in __call__

>     raise res

> OSError: libgfchangelog.so: cannot open shared object file: No such file or directory

> [2018-07-23 11:33:23.130100] I [repce(agent /urd-gds/gluster):89:service_loop] RepceServer: terminating on reaching EOF.

> [2018-07-23 11:33:24.104176] I [monitor(monitor):272:monitor] Monitor: worker died in startup phase     brick=/urd-gds/gluster
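
Since the same block repeats over and over in gsyncd.log, it can help to collapse the log to its error entries. This is a minimal Python sketch, assuming the line layout seen in the excerpt above ([timestamp] LEVEL [module(role):line:function] message). summarize and the two regular expressions are hypothetical helpers, not part of gluster, and the sample lines are copied from this excerpt.

```python
import re
from collections import Counter

# "[timestamp] LEVEL [location] message", as seen in the quoted excerpt.
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] (?P<level>[A-Z]) \[(?P<where>[^\]]+)\] (?P<msg>.*)$"
)
# Final line of a Python traceback, e.g. "OSError: ...".
EXC_LINE = re.compile(r"^[A-Za-z_]\w*(?:Error|Exception): .*$")

SAMPLE = [
    "[2018-07-23 11:33:10.225150] I [monitor(monitor):272:monitor] Monitor: worker died in startup phase brick=/urd-gds/gluster",
    "[2018-07-23 11:33:23.119129] E [repce(agent /urd-gds/gluster):114:worker] <top>: call failed:",
    "OSError: libgfchangelog.so: cannot open shared object file: No such file or directory",
    "[2018-07-23 11:33:23.119708] E [syncdutils(worker /urd-gds/gluster):330:log_raise_exception] <top>: FAIL:",
    "OSError: libgfchangelog.so: cannot open shared object file: No such file or directory",
]


def summarize(lines):
    """Return (list of (timestamp, location) for E entries, Counter of exception lines)."""
    errors = []
    exceptions = Counter()
    for raw in lines:
        line = raw.rstrip("\n")
        entry = LOG_LINE.match(line)
        if entry:
            if entry.group("level") == "E":
                errors.append((entry.group("ts"), entry.group("where")))
            continue
        if EXC_LINE.match(line):
            exceptions[line] += 1
    return errors, exceptions


errors, exceptions = summarize(SAMPLE)
print(len(errors), "error entries")
for text, count in exceptions.most_common():
    print(count, "x", text)
```

To run it on a real log, pass an open file object instead of SAMPLE. The mail quote prefix ("> ") would need to be stripped first.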

>

> Thanks, Sunny!!

>

> Regards

> Marcus Pedersén

>

> ________________________________________

> From: Sunny Kumar <sunkumar@xxxxxxxxxx>

> Sent: 23 July 2018 12:53

> To: Marcus Pedersén

> Cc: Kotresh Hiremath Ravishankar; gluster-users@xxxxxxxxxxx

> Subject: Re: [Gluster-users] Upgrade to 4.1.1 geo-replication does not work

>

> Hi Marcus,

>

> On Mon, Jul 23, 2018 at 4:04 PM Marcus Pedersén <marcus.pedersen@xxxxxx> wrote:

> >

> > Hi Sunny,

> > ldconfig -p /usr/local/lib | grep libgf

> > Output:

> >     libgfxdr.so.0 (libc6,x86-64) => /lib64/libgfxdr.so.0

> >     libgfrpc.so.0 (libc6,x86-64) => /lib64/libgfrpc.so.0

> >     libgfdb.so.0 (libc6,x86-64) => /lib64/libgfdb.so.0

> >     libgfchangelog.so.0 (libc6,x86-64) => /lib64/libgfchangelog.so.0

> >     libgfapi.so.0 (libc6,x86-64) => /lib64/libgfapi.so.0

> >

> > So that seems to be alright,  right?

> >

> Yes, this seems right. Can you share the gsyncd.log again?

> > Best regards

> > Marcus

> >

> > ################

> > Marcus Pedersén

> > Systemadministrator

> > Interbull Centre

> > ################

> > Sent from my phone

> > ################
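
One detail in the ldconfig listing above is easy to miss: every entry ends in .so.0, while the error message asks for the exact name libgfchangelog.so. Below is a small Python sketch that parses such output and checks for an exact name. The sample text is the listing from this thread. parse_ldconfig and has_exact_name are hypothetical helper names, and the entry layout is assumed from the lines quoted here.

```python
import re

# Sample taken from the `ldconfig -p ... | grep libgf` output quoted in this thread.
SAMPLE = """\
    libgfxdr.so.0 (libc6,x86-64) => /lib64/libgfxdr.so.0
    libgfrpc.so.0 (libc6,x86-64) => /lib64/libgfrpc.so.0
    libgfdb.so.0 (libc6,x86-64) => /lib64/libgfdb.so.0
    libgfchangelog.so.0 (libc6,x86-64) => /lib64/libgfchangelog.so.0
    libgfapi.so.0 (libc6,x86-64) => /lib64/libgfapi.so.0
"""

# One cache entry: "<name> (<flags>) => <path>"
ENTRY = re.compile(r"^\s*(\S+)\s+\(([^)]*)\)\s+=>\s+(\S+)\s*$")


def parse_ldconfig(text):
    """Map each cached library name to the path it resolves to."""
    cache = {}
    for line in text.splitlines():
        match = ENTRY.match(line)
        if match:
            cache[match.group(1)] = match.group(3)
    return cache


def has_exact_name(cache, name):
    """True only if the cache lists exactly this file name."""
    return name in cache


cache = parse_ldconfig(SAMPLE)
print(has_exact_name(cache, "libgfchangelog.so.0"))  # True
print(has_exact_name(cache, "libgfchangelog.so"))    # False
```

So the cache knows the versioned library, but nothing in this listing provides the unversioned name that the traceback reports as missing.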

> >

> > On 23 July 2018 at 11:17, Sunny Kumar <sunkumar@xxxxxxxxxx> wrote:

> >

> > Hi Marcus,

> >

> > On Wed, Jul 18, 2018 at 4:08 PM Marcus Pedersén <marcus.pedersen@xxxxxx> wrote:

> > >

> > > Hi Kotresh,

> > >

> > > I ran:

> > >

> > > #ldconfig /usr/lib

> > Can you run:

> > ldconfig /usr/local/lib

> >

> >

> > Output:

> >

> > >

> > > on all nodes in both clusters but I still get the same error.

> > >

> > > What to do?

> > >

> > >

> > > Output for:

> > >

> > > # ldconfig -p /usr/lib | grep libgf

> > >

> > >     libgfxdr.so.0 (libc6,x86-64) => /lib64/libgfxdr.so.0

> > >     libgfrpc.so.0 (libc6,x86-64) => /lib64/libgfrpc.so.0

> > >     libgfdb.so.0 (libc6,x86-64) => /lib64/libgfdb.so.0

> > >     libgfchangelog.so.0 (libc6,x86-64) => /lib64/libgfchangelog.so.0

> > >     libgfapi.so.0 (libc6,x86-64) => /lib64/libgfapi.so.0

> > >

> > >

> > > I read somewhere that you can change some geo-replication settings to speed up the sync.

> > >

> > > I cannot remember where I saw that or which config parameters were involved.

> > >

> > > Once geo-replication works, I have 30TB on the master cluster that has to be synced to the slave nodes,

> > >

> > > and it will take a while before the slave nodes have caught up.

> > >

> > >

> > > Thanks and regards

> > >

> > > Marcus Pedersén

> > >

> > >

> > > Part of gsyncd.log:

> > >

> > >   File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 207, in __call__

> > >     raise res

> > > OSError: libgfchangelog.so: cannot open shared object file: No such file or directory

> > > [2018-07-18 10:23:52.305119] I [repce(agent /urd-gds/gluster):89:service_loop] RepceServer: terminating on reaching EOF.

> > > [2018-07-18 10:23:53.273298] I [monitor(monitor):272:monitor] Monitor: worker died in startup phase     brick=/urd-gds/gluster

> > > [2018-07-18 10:24:03.294312] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker   brick=/urd-gds/gluster  slave_node=urd-gds-geo-000

> > > [2018-07-18 10:24:03.334563] I [gsyncd(agent /urd-gds/gluster):297:main] <top>: Using session config file       path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf

> > > [2018-07-18 10:24:03.334702] I [gsyncd(worker /urd-gds/gluster):297:main] <top>: Using session config file      path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf

> > > [2018-07-18 10:24:03.335380] I [changelogagent(agent /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...

> > > [2018-07-18 10:24:03.343605] I [resource(worker /urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between master and slave...

> > > [2018-07-18 10:24:04.881148] I [resource(worker /urd-gds/gluster):1395:connect_remote] SSH: SSH connection between master and slave established.        duration=1.5373

> > > [2018-07-18 10:24:04.881707] I [resource(worker /urd-gds/gluster):1067:connect] GLUSTER: Mounting gluster volume locally...

> > > [2018-07-18 10:24:05.967451] I [resource(worker /urd-gds/gluster):1090:connect] GLUSTER: Mounted gluster volume duration=1.0853

> > > [2018-07-18 10:24:05.968028] I [subcmds(worker /urd-gds/gluster):70:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor

> > > [2018-07-18 10:24:05.984179] E [repce(agent /urd-gds/gluster):114:worker] <top>: call failed:

> > > Traceback (most recent call last):

> > >   File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 110, in worker

> > >     res = getattr(self.obj, rmeth)(*in_data[2:])

> > >   File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 37, in init

> > >     return Changes.cl_init()

> > >   File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 21, in __getattr__

> > >     from libgfchangelog import Changes as LChanges

> > >   File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 17, in <module>

> > >     class Changes(object):

> > >   File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 19, in Changes

> > >     use_errno=True)

> > >   File "/usr/lib64/python2.7/ctypes/__init__.py", line 360, in __init__

> > >     self._handle = _dlopen(self._name, mode)

> > > OSError: libgfchangelog.so: cannot open shared object file: No such file or directory

> > > [2018-07-18 10:24:05.984647] E [repce(worker /urd-gds/gluster):206:__call__] RepceClient: call failed   call=1146:139672481965888:1531909445.98 method=init     error=OSError

> > > [2018-07-18 10:24:05.984747] E [syncdutils(worker /urd-gds/gluster):330:log_raise_exception] <top>: FAIL:

> > > Traceback (most recent call last):

> > >   File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 311, in main

> > >     func(args)

> > >   File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line 72, in subcmd_worker

> > >     local.service_loop(remote)

> > >   File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1236, in service_loop

> > >     changelog_agent.init()

> > >   File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 225, in __call__

> > >     return self.ins(self.meth, *a)

> > >   File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 207, in __call__

> > >     raise res

> > > OSError: libgfchangelog.so: cannot open shared object file: No such file or directory

> > I think then you will not see this.

> > > [2018-07-18 10:24:05.994826] I [repce(agent /urd-gds/gluster):89:service_loop] RepceServer: terminating on reaching EOF.

> > > [2018-07-18 10:24:06.969984] I [monitor(monitor):272:monitor] Monitor: worker died in startup phase     brick=/urd-gds/gluster

> > >

> > >

> > > ________________________________

> > > From: Kotresh Hiremath Ravishankar <khiremat@xxxxxxxxxx>

> > > Sent: 18 July 2018 06:05

> > > To: Marcus Pedersén

> > > Cc: gluster-users@xxxxxxxxxxx

> > > Subject: Re: [Gluster-users] Upgrade to 4.1.1 geo-replication does not work

> > >

> > > Hi Marcus,

> > >

> > > Well, there is nothing wrong with setting up a symlink for the gluster binary location, but

> > > there is a geo-rep command to set it so that gsyncd will search there.

> > >

> > > To set on master

> > > #gluster vol geo-rep <mastervol> <slave-vol> config gluster-command-dir <gluster-binary-location>

> > >

> > > To set on slave

> > > #gluster vol geo-rep <mastervol> <slave-vol> config slave-gluster-command-dir <gluster-binary-location>

> > >

> > > Thanks,

> > > Kotresh HR
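
The two config commands above can be filled in without retyping them per session. This is a minimal Python sketch that only builds and prints the command lines and does not run anything. command_dir_settings is a hypothetical helper. The option names and word order are copied from the message above, and the volume, slave and directory values are assumptions based on names appearing elsewhere in this thread.

```python
def command_dir_settings(master_vol, slave, gluster_dir, slave_gluster_dir):
    """Build the two config command lines in the form quoted in the message.

    Returns a list of argv lists; nothing is executed here.
    """
    base = ["gluster", "vol", "geo-rep", master_vol, slave, "config"]
    return [
        base + ["gluster-command-dir", gluster_dir],              # set on master
        base + ["slave-gluster-command-dir", slave_gluster_dir],  # set for slave
    ]


# Assumed values: volume and slave names as they appear in the logs of this
# thread, and /usr/sbin/ as the place where the gluster binary was reported.
for argv in command_dir_settings(
    "urd-gds-volume",
    "geouser@urd-gds-geo-001::urd-gds-volume",
    "/usr/sbin/",
    "/usr/sbin/",
):
    print(" ".join(argv))
```

Printing first and running by hand keeps the change reviewable on each node.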

> > >

> > >

> > > On Wed, Jul 18, 2018 at 9:28 AM, Kotresh Hiremath Ravishankar <khiremat@xxxxxxxxxx> wrote:

> > >>

> > >> Hi Marcus,

> > >>

> > >> I am testing out 4.1 myself and I will have some update today.

> > >> For this particular traceback, gsyncd is not able to find the library.

> > >> Is it the rpm install? If so, gluster libraries would be in /usr/lib.

> > >> Please run the cmd below.

> > >>

> > >> #ldconfig /usr/lib

> > >> #ldconfig -p /usr/lib | grep libgf  (This should list libgfchangelog.so)

> > >>

> > >> Geo-rep should be fixed automatically.

> > >>

> > >> Thanks,

> > >> Kotresh HR

> > >>

> > >> On Wed, Jul 18, 2018 at 1:27 AM, Marcus Pedersén <marcus.pedersen@xxxxxx> wrote:

> > >>>

> > >>> Hi again,

> > >>>

> > >>> I continue to do some testing, but now I have come to a stage where I need help.

> > >>>

> > >>>

> > >>> gsyncd.log was complaining that /usr/local/sbin/gluster was missing, so I made a link.

> > >>>

> > >>> After that /usr/local/sbin/glusterfs was missing so I made a link there as well.

> > >>>

> > >>> Both links were done on all slave nodes.
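
The two missing paths mentioned above can be checked in one go on each node. Below is a small Python sketch. missing_binaries is a hypothetical helper, and the directory and binary names are simply the ones mentioned in this message.

```python
import os


def missing_binaries(command_dir, names=("gluster", "glusterfs")):
    """Return the names that are not executable files under command_dir.

    os.path.isfile() follows symlinks, so a working link counts as present
    and a dangling link counts as missing.
    """
    missing = []
    for name in names:
        path = os.path.join(command_dir, name)
        if not (os.path.isfile(path) and os.access(path, os.X_OK)):
            missing.append(name)
    return missing


# Directory taken from the gsyncd.log complaint described in this message.
print(missing_binaries("/usr/local/sbin"))
```

An empty list means both names resolve to executables in that directory.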

> > >>>

> > >>>

> > >>> Now I have a new error that I can not resolve myself.

> > >>>

> > >>> It can not open libgfchangelog.so

> > >>>

> > >>>

> > >>> Many thanks!

> > >>>

> > >>> Regards

> > >>>

> > >>> Marcus Pedersén

> > >>>

> > >>>

> > >>> Part of gsyncd.log:

> > >>>

> > >>> OSError: libgfchangelog.so: cannot open shared object file: No such file or directory

> > >>> [2018-07-17 19:32:06.517106] I [repce(agent /urd-gds/gluster):89:service_loop] RepceServer: terminating on reaching EOF.

> > >>> [2018-07-17 19:32:07.479553] I [monitor(monitor):272:monitor] Monitor: worker died in startup phase     brick=/urd-gds/gluster

> > >>> [2018-07-17 19:32:17.500709] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker   brick=/urd-gds/gluster  slave_node=urd-gds-geo-000

> > >>> [2018-07-17 19:32:17.541547] I [gsyncd(agent /urd-gds/gluster):297:main] <top>: Using session config file       path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf

> > >>> [2018-07-17 19:32:17.541959] I [gsyncd(worker /urd-gds/gluster):297:main] <top>: Using session config file      path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf

> > >>> [2018-07-17 19:32:17.542363] I [changelogagent(agent /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...

> > >>> [2018-07-17 19:32:17.550894] I [resource(worker /urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between master and slave...

> > >>> [2018-07-17 19:32:19.166246] I [resource(worker /urd-gds/gluster):1395:connect_remote] SSH: SSH connection between master and slave established.        duration=1.6151

> > >>> [2018-07-17 19:32:19.166806] I [resource(worker /urd-gds/gluster):1067:connect] GLUSTER: Mounting gluster volume locally...

> > >>> [2018-07-17 19:32:20.257344] I [resource(worker /urd-gds/gluster):1090:connect] GLUSTER: Mounted gluster volume duration=1.0901

> > >>> [2018-07-17 19:32:20.257921] I [subcmds(worker /urd-gds/gluster):70:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor

> > >>> [2018-07-17 19:32:20.274647] E [repce(agent /urd-gds/gluster):114:worker] <top>: call failed:

> > >>> Traceback (most recent call last):

> > >>>   File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 110, in worker

> > >>>     res = getattr(self.obj, rmeth)(*in_data[2:])

> > >>>   File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 37, in init

> > >>>     return Changes.cl_init()

> > >>>   File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 21, in __getattr__

> > >>>     from libgfchangelog import Changes as LChanges

> > >>>   File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 17, in <module>

> > >>>     class Changes(object):

> > >>>   File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 19, in Changes

> > >>>     use_errno=True)

> > >>>   File "/usr/lib64/python2.7/ctypes/__init__.py", line 360, in __init__

> > >>>     self._handle = _dlopen(self._name, mode)

> > >>> OSError: libgfchangelog.so: cannot open shared object file: No such file or directory

> > >>> [2018-07-17 19:32:20.275093] E [repce(worker /urd-gds/gluster):206:__call__] RepceClient: call failed   call=6078:139982918485824:1531855940.27 method=init     error=OSError

> > >>> [2018-07-17 19:32:20.275192] E [syncdutils(worker /urd-gds/gluster):330:log_raise_exception] <top>: FAIL:

> > >>> Traceback (most recent call last):

> > >>>   File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 311, in main

> > >>>     func(args)

> > >>>   File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line 72, in subcmd_worker

> > >>>     local.service_loop(remote)

> > >>>   File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1236, in service_loop

> > >>>     changelog_agent.init()

> > >>>   File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 225, in __call__

> > >>>     return self.ins(self.meth, *a)

> > >>>   File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 207, in __call__

> > >>>     raise res

> > >>> OSError: libgfchangelog.so: cannot open shared object file: No such file or directory

> > >>> [2018-07-17 19:32:20.286787] I [repce(agent /urd-gds/gluster):89:service_loop] RepceServer: terminating on reaching EOF.

> > >>> [2018-07-17 19:32:21.259891] I [monitor(monitor):272:monitor] Monitor: worker died in startup phase     brick=/urd-gds/gluster

> > >>>

> > >>>

> > >>>

> > >>> ________________________________

> > >>> From: gluster-users-bounces@xxxxxxxxxxx <gluster-users-bounces@xxxxxxxxxxx> on behalf of Marcus Pedersén <marcus.pedersen@xxxxxx>

> > >>> Sent: 16 July 2018 21:59

> > >>> To: khiremat@xxxxxxxxxx

> > >>>

> > >>> Cc: gluster-users@xxxxxxxxxxx

> > >>> Subject: Re: [Gluster-users] Upgrade to 4.1.1 geo-replication does not work

> > >>>

> > >>>

> > >>> Hi Kotresh,

> > >>>

> > >>> I have been testing for a bit and, as you can see from the logs I sent earlier, permission is denied for geouser on the slave node for the file:

> > >>>

> > >>> /var/log/glusterfs/cli.log

> > >>>

> > >>> I have turned selinux off and just for testing I changed permissions on /var/log/glusterfs/cli.log so geouser can access it.

> > >>>

> > >>> Starting geo-replication after that gives response successful but all nodes get status Faulty.

> > >>>

> > >>>

> > >>> If I run: gluster-mountbroker status

> > >>>

> > >>> I get:

> > >>>

> > >>> +-----------------------------+-------------+---------------------------+--------------+--------------------------+

> > >>> |             NODE            | NODE STATUS |         MOUNT ROOT        |    GROUP     |          USERS           |

> > >>> +-----------------------------+-------------+---------------------------+--------------+--------------------------+

> > >>> | urd-gds-geo-001.hgen.slu.se |          UP | /var/mountbroker-root(OK) | geogroup(OK) | geouser(urd-gds-volume)  |

> > >>> |       urd-gds-geo-002       |          UP | /var/mountbroker-root(OK) | geogroup(OK) | geouser(urd-gds-volume)  |

> > >>> |          localhost          |          UP | /var/mountbroker-root(OK) | geogroup(OK) | geouser(urd-gds-volume)  |

> > >>> +-----------------------------+-------------+---------------------------+--------------+--------------------------+

> > >>>

> > >>>

> > >>> and that is all nodes on slave cluster, so mountbroker seems ok.

> > >>>

> > >>>

> > >>> gsyncd.log logs an error saying that /usr/local/sbin/gluster is missing.

> > >>>

> > >>> That is correct, because gluster is in /sbin/gluster and /usr/sbin/gluster

> > >>>

> > >>> Another error is that SSH between master and slave is broken,

> > >>>

> > >>> but now when I have changed permission on /var/log/glusterfs/cli.log I can run:

> > >>>

> > >>> ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 geouser@urd-gds-geo-001 gluster --xml --remote-host=localhost volume info urd-gds-volume

> > >>>

> > >>> as geouser and that works, which means that the ssh connection works.

> > >>>

> > >>>

> > >>> Are the permissions on /var/log/glusterfs/cli.log changed when geo-replication is set up?

> > >>>

> > >>> Is gluster supposed to be in /usr/local/sbin/gluster?

> > >>>

> > >>>

> > >>> Do I have any options or should I remove the current geo-replication and create a new one?

> > >>>

> > >>> How much do I need to clean up before creating a new geo-replication?

> > >>>

> > >>> In that case can I pause geo-replication, mount slave cluster on master cluster and run rsync , just to speed up transfer of files?

> > >>>

> > >>>

> > >>> Many thanks in advance!

> > >>>

> > >>> Marcus Pedersén

> > >>>

> > >>>

> > >>> Part from the gsyncd.log:

> > >>>

> > >>> [2018-07-16 19:34:56.26287] E [syncdutils(worker /urd-gds/gluster):749:errlog] Popen: command returned error    cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-WrbZ22/bf60c68f1a195dad59573a8dbaa309f2.sock geouser@urd-gds-geo-001 /nonexistent/gsyncd slave urd-gds-volume geouser@urd-gds-geo-001::urd-gds-volume --master-node urd-gds-001 --master-node-id 912bebfd-1a7f-44dc-b0b7-f001a20d58cd --master-brick /urd-gds/gluster --local-node urd-gds-geo-000 --local-node-id 03075698-2bbf-43e4-a99a-65fe82f61794 --slave-timeout 120 --slave-log-level INFO --slave-gluster-log-level INFO --slave-gluster-command-dir /usr/local/sbin/ error=1

> > >>> [2018-07-16 19:34:56.26583] E [syncdutils(worker /urd-gds/gluster):753:logerr] Popen: ssh> failure: execution of "/usr/local/sbin/gluster" failed with ENOENT (No such file or directory)

> > >>> [2018-07-16 19:34:56.33901] I [repce(agent /urd-gds/gluster):89:service_loop] RepceServer: terminating on reaching EOF.

> > >>> [2018-07-16 19:34:56.34307] I [monitor(monitor):262:monitor] Monitor: worker died before establishing connection        brick=/urd-gds/gluster

> > >>> [2018-07-16 19:35:06.59412] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker    brick=/urd-gds/gluster  slave_node=urd-gds-geo-000

> > >>> [2018-07-16 19:35:06.99509] I [gsyncd(worker /urd-gds/gluster):297:main] <top>: Using session config file       path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf

> > >>> [2018-07-16 19:35:06.99561] I [gsyncd(agent /urd-gds/gluster):297:main] <top>: Using session config file        path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf

> > >>> [2018-07-16 19:35:06.100481] I [changelogagent(agent /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...

> > >>> [2018-07-16 19:35:06.108834] I [resource(worker /urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between master and slave...

> > >>> [2018-07-16 19:35:06.762320] E [syncdutils(worker /urd-gds/gluster):303:log_raise_exception] <top>: connection to peer is broken

> > >>> [2018-07-16 19:35:06.763103] E [syncdutils(worker /urd-gds/gluster):749:errlog] Popen: command returned error   cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-K9mB6Q/bf60c68f1a195dad59573a8dbaa309f2.sock geouser@urd-gds-geo-001 /nonexistent/gsyncd slave urd-gds-volume geouser@urd-gds-geo-001::urd-gds-volume --master-node urd-gds-001 --master-node-id 912bebfd-1a7f-44dc-b0b7-f001a20d58cd --master-brick /urd-gds/gluster --local-node urd-gds-geo-000 --local-node-id 03075698-2bbf-43e4-a99a-65fe82f61794 --slave-timeout 120 --slave-log-level INFO --slave-gluster-log-level INFO --slave-gluster-command-dir /usr/local/sbin/ error=1

> > >>> [2018-07-16 19:35:06.763398] E [syncdutils(worker /urd-gds/gluster):753:logerr] Popen: ssh> failure: execution of "/usr/local/sbin/gluster" failed with ENOENT (No such file or directory)

> > >>> [2018-07-16 19:35:06.771905] I [repce(agent /urd-gds/gluster):89:service_loop] RepceServer: terminating on reaching EOF.

> > >>> [2018-07-16 19:35:06.772272] I [monitor(monitor):262:monitor] Monitor: worker died before establishing connection       brick=/urd-gds/gluster

> > >>> [2018-07-16 19:35:16.786387] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker   brick=/urd-gds/gluster  slave_node=urd-gds-geo-000

> > >>> [2018-07-16 19:35:16.828056] I [gsyncd(worker /urd-gds/gluster):297:main] <top>: Using session config file      path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf

> > >>> [2018-07-16 19:35:16.828066] I [gsyncd(agent /urd-gds/gluster):297:main] <top>: Using session config file       path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf

> > >>> [2018-07-16 19:35:16.828912] I [changelogagent(agent /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...

> > >>> [2018-07-16 19:35:16.837100] I [resource(worker /urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between master and slave...

> > >>> [2018-07-16 19:35:17.260257] E [syncdutils(worker /urd-gds/gluster):303:log_raise_exception] <top>: connection to peer is broken

> > >>>

> > >>> ________________________________

> > >>> From: gluster-users-bounces@xxxxxxxxxxx <gluster-users-bounces@xxxxxxxxxxx> on behalf of Marcus Pedersén <marcus.pedersen@xxxxxx>

> > >>> Sent: 13 July 2018 14:50

> > >>> To: Kotresh Hiremath Ravishankar

> > >>> Cc: gluster-users@xxxxxxxxxxx

> > >>> Subject: Re: [Gluster-users] Upgrade to 4.1.1 geo-replication does not work

> > >>>

> > >>> Hi Kotresh,

> > >>> Yes, all nodes have the same version 4.1.1 both master and slave.

> > >>> All glusterd are crashing on the master side.

> > >>> Will send logs tonight.

> > >>>

> > >>> Thanks,

> > >>> Marcus

> > >>>

> > >>> ################

> > >>> Marcus Pedersén

> > >>> Systemadministrator

> > >>> Interbull Centre

> > >>> ################

> > >>> Sent from my phone

> > >>> ################

> > >>>

> > >>> On 13 July 2018 at 11:28, Kotresh Hiremath Ravishankar <khiremat@xxxxxxxxxx> wrote:

> > >>>

> > >>> Hi Marcus,

> > >>>

> > >>> Is the gluster geo-rep version the same on both master and slave?

> > >>>

> > >>> Thanks,

> > >>> Kotresh HR

> > >>>

> > >>> On Fri, Jul 13, 2018 at 1:26 AM, Marcus Pedersén <marcus.pedersen@xxxxxx> wrote:

> > >>>

> > >>> Hi Kotresh,

> > >>>

> > >>> I have replaced both files (gsyncdconfig.py and repce.py) on all nodes, both master and slave.

> > >>>

> > >>> I rebooted all servers but geo-replication status is still Stopped.

> > >>>

> > >>> I tried to start geo-replication and got the response Successful, but the status still shows Stopped on all nodes.

> > >>>

> > >>> Nothing has been written to the geo-replication logs since I sent the tail of the log.

> > >>>

> > >>> So I do not know what info to provide.

> > >>>

> > >>>

> > >>> Please, help me to find a way to solve this.

> > >>>

> > >>>

> > >>> Thanks!

> > >>>

> > >>>

> > >>> Regards

> > >>>

> > >>> Marcus

> > >>>

> > >>>

> > >>> ________________________________

> > >>> From: gluster-users-bounces@xxxxxxxxxxx <gluster-users-bounces@xxxxxxxxxxx> on behalf of Marcus Pedersén <marcus.pedersen@xxxxxx>

> > >>> Sent: 12 July 2018 08:51

> > >>> To: Kotresh Hiremath Ravishankar

> > >>> Cc: gluster-users@xxxxxxxxxxx

> > >>> Subject: Re: [Gluster-users] Upgrade to 4.1.1 geo-replication does not work

> > >>>

> > >>> Thanks Kotresh,

> > >>> I installed through the official centos channel, centos-release-gluster41.

> > >>> Isn't this fix included in centos install?

> > >>> I will have a look, test it tonight and come back to you!

> > >>>

> > >>> Thanks a lot!

> > >>>

> > >>> Regards

> > >>> Marcus

> > >>>

> > >>> ################

> > >>> Marcus Pedersén

> > >>> Systemadministrator

> > >>> Interbull Centre

> > >>> ################

> > >>> Sent from my phone

> > >>> ################

> > >>>

> > >>> On 12 July 2018 at 07:41, Kotresh Hiremath Ravishankar <khiremat@xxxxxxxxxx> wrote:

> > >>>

> > >>> Hi Marcus,

> > >>>

> > >>> I think the fix [1] is needed in 4.1

> > >>> Could you please try this out and let us know if that works for you?

> > >>>

> > >>> [1] https://review.gluster.org/#/c/20207/

> > >>>

> > >>> Thanks,

> > >>> Kotresh HR

> > >>>

> > >>> On Thu, Jul 12, 2018 at 1:49 AM, Marcus Pedersén <marcus.pedersen@xxxxxx> wrote:

> > >>>

> > >>> Hi all,

> > >>>

> > >>> I have upgraded from 3.12.9 to 4.1.1 and been following upgrade instructions for offline upgrade.

> > >>>

> > >>> I upgraded geo-replication side first 1 x (2+1) and the master side after that 2 x (2+1).

> > >>>

> > >>> Both clusters work the way they should on their own.

> > >>>

> > >>> After upgrade on master side status for all geo-replication nodes is Stopped.

> > >>>

> > >>> I tried to start the geo-replication from master node and response back was started successfully.

> > >>>

> > >>> Status again .... Stopped

> > >>>

> > >>> Tried to start again and get response started successfully, after that all glusterd crashed on all master nodes.

> > >>>

> > >>> After a restart of all glusterd the master cluster was up again.

> > >>>

> > >>> Status for geo-replication is still Stopped, and every attempt to start it after this gives the response successful but the status stays Stopped.

> > >>>

> > >>>

> > >>> Please help me get the geo-replication up and running again.

> > >>>

> > >>>

> > >>> Best regards

> > >>>

> > >>> Marcus Pedersén

> > >>>

> > >>>

> > >>> Part of geo-replication log from master node:

> > >>>

> > >>> [2018-07-11 18:42:48.941760] I [changelogagent(/urd-gds/gluster):73:__init__] ChangelogAgent: Agent listining...

> > >>> [2018-07-11 18:42:48.947567] I [resource(/urd-gds/gluster):1780:connect_remote] SSH: Initializing SSH connection between master and slave...

> > >>> [2018-07-11 18:42:49.363514] E [syncdutils(/urd-gds/gluster):304:log_raise_exception] <top>: connection to peer is broken

> > >>> [2018-07-11 18:42:49.364279] E [resource(/urd-gds/gluster):210:errlog] Popen: command returned error    cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-hjRhBo/7e5534547f3675a710a107722317484f.sock geouser@urd-gds-geo-000 /nonexistent/gsyncd --session-owner 5e94eb7d-219f-4741-a179-d4ae6b50c7ee --local-id .%2Furd-gds%2Fgluster --local-node urd-gds-001 -N --listen --timeout 120 gluster://localhost:urd-gds-volume   error=2

> > >>> [2018-07-11 18:42:49.364586] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> usage: gsyncd.py [-h]

> > >>> [2018-07-11 18:42:49.364799] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh>

> > >>> [2018-07-11 18:42:49.364989] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh>                  {monitor-status,monitor,worker,agent,slave,status,config-check,config-get,config-set,config-reset,voluuidget,delete}

> > >>> [2018-07-11 18:42:49.365210] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh>                  ...

> > >>> [2018-07-11 18:42:49.365408] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> gsyncd.py: error: argument subcmd: invalid choice: '5e94eb7d-219f-4741-a179-d4ae6b50c7ee' (choose from 'monitor-status', 'monitor', 'worker', 'agent', 'slave', 'status', 'config-check', 'config-get', 'config-set', 'config-reset', 'voluuidget', 'delete')

> > >>> [2018-07-11 18:42:49.365919] I [syncdutils(/urd-gds/gluster):271:finalize] <top>: exiting.

> > >>> [2018-07-11 18:42:49.369316] I [repce(/urd-gds/gluster):92:service_loop] RepceServer: terminating on reaching EOF.

> > >>> [2018-07-11 18:42:49.369921] I [syncdutils(/urd-gds/gluster):271:finalize] <top>: exiting.

> > >>> [2018-07-11 18:42:49.369694] I [monitor(monitor):353:monitor] Monitor: worker died before establishing connection       brick=/urd-gds/gluster

> > >>> [2018-07-11 18:42:59.492762] I [monitor(monitor):280:monitor] Monitor: starting gsyncd worker   brick=/urd-gds/gluster  slave_node=ssh://geouser@urd-gds-geo-000:gluster://localhost:urd-gds-volume

> > >>> [2018-07-11 18:42:59.558491] I [resource(/urd-gds/gluster):1780:connect_remote] SSH: Initializing SSH connection between master and slave...

> > >>> [2018-07-11 18:42:59.559056] I [changelogagent(/urd-gds/gluster):73:__init__] ChangelogAgent: Agent listining...

> > >>> [2018-07-11 18:42:59.945693] E [syncdutils(/urd-gds/gluster):304:log_raise_exception] <top>: connection to peer is broken

> > >>> [2018-07-11 18:42:59.946439] E [resource(/urd-gds/gluster):210:errlog] Popen: command returned error    cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-992bk7/7e5534547f3675a710a107722317484f.sock geouser@urd-gds-geo-000 /nonexistent/gsyncd --session-owner 5e94eb7d-219f-4741-a179-d4ae6b50c7ee --local-id .%2Furd-gds%2Fgluster --local-node urd-gds-001 -N --listen --timeout 120 gluster://localhost:urd-gds-volume   error=2

> > >>> [2018-07-11 18:42:59.946748] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> usage: gsyncd.py [-h]

> > >>> [2018-07-11 18:42:59.946962] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh>

> > >>> [2018-07-11 18:42:59.947150] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh>                  {monitor-status,monitor,worker,agent,slave,status,config-check,config-get,config-set,config-reset,voluuidget,delete}

> > >>> [2018-07-11 18:42:59.947369] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh>                  ...

> > >>> [2018-07-11 18:42:59.947552] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> gsyncd.py: error: argument subcmd: invalid choice: '5e94eb7d-219f-4741-a179-d4ae6b50c7ee' (choose from 'monitor-status', 'monitor', 'worker', 'agent', 'slave', 'status', 'config-check', 'config-get', 'config-set', 'config-reset', 'voluuidget', 'delete')

> > >>> [2018-07-11 18:42:59.948046] I [syncdutils(/urd-gds/gluster):271:finalize] <top>: exiting.

> > >>> [2018-07-11 18:42:59.951392] I [repce(/urd-gds/gluster):92:service_loop] RepceServer: terminating on reaching EOF.

> > >>> [2018-07-11 18:42:59.951760] I [syncdutils(/urd-gds/gluster):271:finalize] <top>: exiting.

> > >>> [2018-07-11 18:42:59.951817] I [monitor(monitor):353:monitor] Monitor: worker died before establishing connection       brick=/urd-gds/gluster

> > >>> [2018-07-11 18:43:10.54580] I [monitor(monitor):280:monitor] Monitor: starting gsyncd worker    brick=/urd-gds/gluster  slave_node=ssh://geouser@urd-gds-geo-000:gluster://localhost:urd-gds-volume

> > >>> [2018-07-11 18:43:10.88356] I [monitor(monitor):345:monitor] Monitor: Changelog Agent died, Aborting Worker     brick=/urd-gds/gluster

> > >>> [2018-07-11 18:43:10.88613] I [monitor(monitor):353:monitor] Monitor: worker died before establishing connection        brick=/urd-gds/gluster

> > >>> [2018-07-11 18:43:20.112435] I [gsyncdstatus(monitor):242:set_worker_status] GeorepStatus: Worker Status Change status=inconsistent

> > >>> [2018-07-11 18:43:20.112885] E [syncdutils(monitor):331:log_raise_exception] <top>: FAIL:

> > >>> Traceback (most recent call last):

> > >>>   File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 361, in twrap

> > >>>     except:

> > >>>   File "/usr/libexec/glusterfs/python/syncdaemon/monitor.py", line 428, in wmon

> > >>>     sys.exit()

> > >>> TypeError: 'int' object is not iterable

> > >>> [2018-07-11 18:43:20.114610] I [syncdutils(monitor):271:finalize] <top>: exiting.

> > >>>

> > >>> ---

> > >>> När du skickar e-post till SLU så innebär detta att SLU behandlar dina personuppgifter. För att läsa mer om hur detta går till, klicka här

> > >>> E-mailing SLU will result in SLU processing your personal data. For more information on how this is done, click here

> > >>>

> > >>>

> > >>> _______________________________________________

> > >>> Gluster-users mailing list

> > >>> Gluster-users@xxxxxxxxxxx

> > >>> https://lists.gluster.org/mailman/listinfo/gluster-users

> > >>>

> > >>>

> > >>>

> > >>>

> > >>> --

> > >>> Thanks and Regards,

> > >>> Kotresh H R

> > >>>

> > >>>

> > >>

> > >>

> > >>

> > >>

> > >> --

> > >> Thanks and Regards,

> > >> Kotresh H R

> > >

> > >

> > >

> > >

> > > --

> > > Thanks and Regards,

> > > Kotresh H R

> > >

> > >

> >

> >

>

> - Sunny

> ---

> När du skickar e-post till SLU så innebär detta att SLU behandlar dina personuppgifter. För att läsa mer om hur detta går till, klicka här <https://www.slu.se/om-slu/kontakta-slu/personuppgifter/>

> E-mailing SLU will result in SLU processing your personal data. For more information on how this is done, click here <https://www.slu.se/en/about-slu/contact-slu/personal-data/>

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users