Hi Felix,

It seems I missed your reply with the changelogs that Shwetha requested.

Best Regards,
Strahil Nikolov

On 3 July 2020 at 11:16:30 GMT+03:00, "Felix Kölzow" <felix.koelzow@xxxxxx> wrote:
>Dear Users,
>
>the geo-replication is still broken. This is not really a comfortable
>situation.
>
>Has any other user had the same experience and been able to share a
>possible workaround?
>
>We are currently running Gluster v6.0.
>
>Regards,
>
>Felix
>
>
>On 25/06/2020 10:04, Shwetha Acharya wrote:
>> Hi Rob and Felix,
>>
>> Please share the *-changes.log files and brick logs, which will help
>> in the analysis of the issue.
>>
>> Regards,
>> Shwetha
>>
>> On Thu, Jun 25, 2020 at 1:26 PM Felix Kölzow <felix.koelzow@xxxxxx> wrote:
>>
>> Hey Rob,
>>
>> same issue for our third volume. Have a look at the logs from just
>> now (below).
>>
>> Question: You removed the htime files and the old changelogs. Did you
>> just rm the files, or is there something to pay more attention to
>> before removing the changelog files and the htime file?
>>
>> Regards,
>>
>> Felix
>>
>> [2020-06-25 07:51:53.795430] I [resource(worker /gluster/vg00/dispersed_fuse1024/brick):1435:connect_remote] SSH: SSH connection between master and slave established. duration=1.2341
>> [2020-06-25 07:51:53.795639] I [resource(worker /gluster/vg00/dispersed_fuse1024/brick):1105:connect] GLUSTER: Mounting gluster volume locally...
>> [2020-06-25 07:51:54.520601] I [monitor(monitor):280:monitor] Monitor: worker died in startup phase brick=/gluster/vg01/dispersed_fuse1024/brick
>> [2020-06-25 07:51:54.535809] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change status=Faulty
>> [2020-06-25 07:51:54.882143] I [resource(worker /gluster/vg00/dispersed_fuse1024/brick):1128:connect] GLUSTER: Mounted gluster volume duration=1.0864
>> [2020-06-25 07:51:54.882388] I [subcmds(worker /gluster/vg00/dispersed_fuse1024/brick):84:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor
>> [2020-06-25 07:51:56.911412] E [repce(agent /gluster/vg00/dispersed_fuse1024/brick):121:worker] <top>: call failed:
>> Traceback (most recent call last):
>>   File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 117, in worker
>>     res = getattr(self.obj, rmeth)(*in_data[2:])
>>   File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 40, in register
>>     return Changes.cl_register(cl_brick, cl_dir, cl_log, cl_level, retries)
>>   File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 46, in cl_register
>>     cls.raise_changelog_err()
>>   File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 30, in raise_changelog_err
>>     raise ChangelogException(errn, os.strerror(errn))
>> ChangelogException: [Errno 2] No such file or directory
>> [2020-06-25 07:51:56.912056] E [repce(worker /gluster/vg00/dispersed_fuse1024/brick):213:__call__] RepceClient: call failed call=75086:140098349655872:1593071514.91 method=register error=ChangelogException
>> [2020-06-25 07:51:56.912396] E [resource(worker /gluster/vg00/dispersed_fuse1024/brick):1286:service_loop] GLUSTER: Changelog register failed error=[Errno 2] No such file or directory
>> [2020-06-25 07:51:56.928031] I [repce(agent /gluster/vg00/dispersed_fuse1024/brick):96:service_loop] RepceServer: terminating on reaching EOF.
>> [2020-06-25 07:51:57.886126] I [monitor(monitor):280:monitor] Monitor: worker died in startup phase brick=/gluster/vg00/dispersed_fuse1024/brick
>> [2020-06-25 07:51:57.895920] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change status=Faulty
>> [2020-06-25 07:51:58.607405] I [gsyncdstatus(worker /gluster/vg00/dispersed_fuse1024/brick):287:set_passive] GeorepStatus: Worker Status Change status=Passive
>> [2020-06-25 07:51:58.607768] I [gsyncdstatus(worker /gluster/vg01/dispersed_fuse1024/brick):287:set_passive] GeorepStatus: Worker Status Change status=Passive
>> [2020-06-25 07:51:58.608004] I [gsyncdstatus(worker /gluster/vg00/dispersed_fuse1024/brick):281:set_active] GeorepStatus: Worker Status Change status=Active
>>
>>
>> On 25/06/2020 09:15, Rob.Quagliozzi@xxxxxxxxxxxx wrote:
>>>
>>> Hi All,
>>>
>>> We've got two six-node RHEL 7.8 clusters, and geo-replication would
>>> appear to be completely broken between them. I've deleted the session,
>>> removed & recreated the pem files and the old changelogs/htime (after
>>> removing the relevant options from the volume) and completely set up
>>> geo-rep from scratch, but the new session comes up as Initializing,
>>> then goes Faulty, and starts looping. The volume (on both sides) is a
>>> 4 x 2 disperse, running Gluster v6 (RH latest). Gsyncd reports:
>>>
>>> [2020-06-25 07:07:14.701423] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change status=Initializing...
>>> [2020-06-25 07:07:14.701744] I [monitor(monitor):159:monitor] Monitor: starting gsyncd worker brick=/rhgs/brick20/brick slave_node=bxts470194.eu.rabonet.com
>>> [2020-06-25 07:07:14.707997] D [monitor(monitor):230:monitor] Monitor: Worker would mount volume privately
>>> [2020-06-25 07:07:14.757181] I [gsyncd(agent /rhgs/brick20/brick):318:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/prd_mx_intvol_bxts470190_prd_mx_intvol/gsyncd.conf
>>> [2020-06-25 07:07:14.758126] D [subcmds(agent /rhgs/brick20/brick):107:subcmd_agent] <top>: RPC FD rpc_fd='5,12,11,10'
>>> [2020-06-25 07:07:14.758627] I [changelogagent(agent /rhgs/brick20/brick):72:__init__] ChangelogAgent: Agent listining...
>>> [2020-06-25 07:07:14.764234] I [gsyncd(worker /rhgs/brick20/brick):318:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/prd_mx_intvol_bxts470190_prd_mx_intvol/gsyncd.conf
>>> [2020-06-25 07:07:14.779409] I [resource(worker /rhgs/brick20/brick):1386:connect_remote] SSH: Initializing SSH connection between master and slave...
>>> [2020-06-25 07:07:14.841793] D [repce(worker /rhgs/brick20/brick):195:push] RepceClient: call 6799:140380783982400:1593068834.84 __repce_version__() ...
>>> [2020-06-25 07:07:16.148725] D [repce(worker /rhgs/brick20/brick):215:__call__] RepceClient: call 6799:140380783982400:1593068834.84 __repce_version__ -> 1.0
>>> [2020-06-25 07:07:16.148911] D [repce(worker /rhgs/brick20/brick):195:push] RepceClient: call 6799:140380783982400:1593068836.15 version() ...
>>> [2020-06-25 07:07:16.149574] D [repce(worker /rhgs/brick20/brick):215:__call__] RepceClient: call 6799:140380783982400:1593068836.15 version -> 1.0
>>> [2020-06-25 07:07:16.149735] D [repce(worker /rhgs/brick20/brick):195:push] RepceClient: call 6799:140380783982400:1593068836.15 pid() ...
>>> [2020-06-25 07:07:16.150588] D [repce(worker /rhgs/brick20/brick):215:__call__] RepceClient: call 6799:140380783982400:1593068836.15 pid -> 30703
>>> [2020-06-25 07:07:16.150747] I [resource(worker /rhgs/brick20/brick):1435:connect_remote] SSH: SSH connection between master and slave established. duration=1.3712
>>> [2020-06-25 07:07:16.150819] I [resource(worker /rhgs/brick20/brick):1105:connect] GLUSTER: Mounting gluster volume locally...
>>> [2020-06-25 07:07:16.265860] D [resource(worker /rhgs/brick20/brick):879:inhibit] DirectMounter: auxiliary glusterfs mount in place
>>> [2020-06-25 07:07:17.272511] D [resource(worker /rhgs/brick20/brick):953:inhibit] DirectMounter: auxiliary glusterfs mount prepared
>>> [2020-06-25 07:07:17.272708] I [resource(worker /rhgs/brick20/brick):1128:connect] GLUSTER: Mounted gluster volume duration=1.1218
>>> [2020-06-25 07:07:17.272794] I [subcmds(worker /rhgs/brick20/brick):84:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor
>>> [2020-06-25 07:07:17.272973] D [master(worker /rhgs/brick20/brick):104:gmaster_builder] <top>: setting up change detection mode mode=xsync
>>> [2020-06-25 07:07:17.273063] D [monitor(monitor):273:monitor] Monitor: worker(/rhgs/brick20/brick) connected
>>> [2020-06-25 07:07:17.273678] D [master(worker /rhgs/brick20/brick):104:gmaster_builder] <top>: setting up change detection mode mode=changelog
>>> [2020-06-25 07:07:17.274224] D [master(worker /rhgs/brick20/brick):104:gmaster_builder] <top>: setting up change detection mode mode=changeloghistory
>>> [2020-06-25 07:07:17.276484] D [repce(worker /rhgs/brick20/brick):195:push] RepceClient: call 6799:140380783982400:1593068837.28 version() ...
>>> [2020-06-25 07:07:17.276916] D [repce(worker /rhgs/brick20/brick):215:__call__] RepceClient: call 6799:140380783982400:1593068837.28 version -> 1.0
>>> [2020-06-25 07:07:17.277009] D [master(worker /rhgs/brick20/brick):777:setup_working_dir] _GMaster: changelog working dir /var/lib/misc/gluster/gsyncd/prd_mx_intvol_bxts470190_prd_mx_intvol/rhgs-brick20-brick
>>> [2020-06-25 07:07:17.277098] D [repce(worker /rhgs/brick20/brick):195:push] RepceClient: call 6799:140380783982400:1593068837.28 init() ...
>>> [2020-06-25 07:07:17.292944] D [repce(worker /rhgs/brick20/brick):215:__call__] RepceClient: call 6799:140380783982400:1593068837.28 init -> None
>>> [2020-06-25 07:07:17.293097] D [repce(worker /rhgs/brick20/brick):195:push] RepceClient: call 6799:140380783982400:1593068837.29 register('/rhgs/brick20/brick', '/var/lib/misc/gluster/gsyncd/prd_mx_intvol_bxts470190_prd_mx_intvol/rhgs-brick20-brick', '/var/log/glusterfs/geo-replication/prd_mx_intvol_bxts470190_prd_mx_intvol/changes-rhgs-brick20-brick.log', 8, 5) ...
>>> [2020-06-25 07:07:19.296294] E [repce(agent /rhgs/brick20/brick):121:worker] <top>: call failed:
>>> Traceback (most recent call last):
>>>   File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 117, in worker
>>>     res = getattr(self.obj, rmeth)(*in_data[2:])
>>>   File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 40, in register
>>>     return Changes.cl_register(cl_brick, cl_dir, cl_log, cl_level, retries)
>>>   File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 46, in cl_register
>>>     cls.raise_changelog_err()
>>>   File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 30, in raise_changelog_err
>>>     raise ChangelogException(errn, os.strerror(errn))
>>> ChangelogException: [Errno 2] No such file or directory
>>> [2020-06-25 07:07:19.297161] E [repce(worker /rhgs/brick20/brick):213:__call__] RepceClient: call failed call=6799:140380783982400:1593068837.29 method=register error=ChangelogException
>>> [2020-06-25 07:07:19.297338] E [resource(worker /rhgs/brick20/brick):1286:service_loop] GLUSTER: Changelog register failed error=[Errno 2] No such file or directory
>>> [2020-06-25 07:07:19.315074] I [repce(agent /rhgs/brick20/brick):96:service_loop] RepceServer: terminating on reaching EOF.
>>> [2020-06-25 07:07:20.275701] I [monitor(monitor):280:monitor] Monitor: worker died in startup phase brick=/rhgs/brick20/brick
>>> [2020-06-25 07:07:20.277383] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change status=Faulty
>>>
>>> We've done everything we can think of, including an "strace -f" on the
>>> pid, and we can't really find anything. I'm about to lose the last of
>>> my hair over this, so does anyone have any ideas at all? We've even
>>> removed the entire slave volume and rebuilt it.
>>>
>>> Thanks
>>>
>>> Rob
>>>
>>> *Rob Quagliozzi*
>>>
>>> *Specialised Application Support*
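For anyone hitting the same ChangelogException: the register() call is failing with ENOENT, which points at the brick-side changelog machinery rather than at the SSH or mount steps (both complete successfully in the logs above). Two things are worth checking on each master node. This is a rough diagnostic sketch only, not taken from the thread; the volume name and brick path are placeholders:

    # Changelogs are only produced while this option is "on"; geo-replication
    # cannot register against the brick's changelog if it is off.
    gluster volume get <mastervol> changelog.changelog

    # The changelog files and the HTIME index live inside the brick itself.
    # If they were deleted while the option stayed "on", the htime file the
    # changelog agent expects may be missing, which can surface as ENOENT.
    ls -l <brick-path>/.glusterfs/changelogs/
    ls -l <brick-path>/.glusterfs/changelogs/htime/

The teardown and re-create sequence Rob describes roughly corresponds to the sketch below. It is an assumption-laden outline: the session names are placeholders, and whether a plain rm of the old changelog/htime files is enough (which is exactly Felix's question) is not confirmed anywhere in this thread, so move the files aside rather than deleting them:

    # Stop and delete the existing session.
    gluster volume geo-replication <mastervol> <slavehost>::<slavevol> stop force
    gluster volume geo-replication <mastervol> <slavehost>::<slavevol> delete

    # Turn the changelog option off before touching the old changelogs,
    # so no new changelog/htime entries are written in the meantime.
    gluster volume set <mastervol> changelog.changelog off

    # Old changelogs and the htime index sit under each brick; back them up
    # instead of removing them outright.
    mv <brick-path>/.glusterfs/changelogs <brick-path>/.glusterfs/changelogs.bak

    # Recreate the pem files and the session, then re-enable the changelog
    # and start geo-replication again.
    gluster system:: execute gsec_create
    gluster volume geo-replication <mastervol> <slavehost>::<slavevol> create push-pem force
    gluster volume set <mastervol> changelog.changelog on
    gluster volume geo-replication <mastervol> <slavehost>::<slavevol> start
    gluster volume geo-replication <mastervol> <slavehost>::<slavevol> status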
________

Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users