Re: 4.1.x geo-replication "changelogs could not be processed completely" issue

Answer inline.

On Tue, Sep 11, 2018 at 4:19 PM, Kotte, Christian (Ext) <christian.kotte@xxxxxxxxxxxx> wrote:

Hi all,

 

I use glusterfs 4.1.3 non-root user geo-replication in a cascading setup. The gsyncd.log on the master is fine, but I see some strange changelog warnings and errors on the interim master:

 

gsyncd.log

[2018-09-11 10:38:35.575464] I [master(worker /bricks/brick1/brick):1460:crawl] _GMaster: slave's time  stime=(1536662250, 0)

[2018-09-11 10:38:37.126749] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken       duration=1.4698 num_files=1     job=1   return_code=23

[2018-09-11 10:38:37.128668] W [master(worker /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying changelogs        files=['CHANGELOG.1536662311']

[2018-09-11 10:38:39.353209] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken       duration=1.4057 num_files=1     job=2   return_code=23

[2018-09-11 10:38:39.354737] W [master(worker /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying changelogs        files=['CHANGELOG.1536662311']

[2018-09-11 10:38:41.501187] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken       duration=1.4781 num_files=1     job=3   return_code=23

[2018-09-11 10:38:41.503048] W [master(worker /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying changelogs        files=['CHANGELOG.1536662311']

[2018-09-11 10:38:43.575047] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken       duration=1.4385 num_files=1     job=1   return_code=23

[2018-09-11 10:38:43.576597] W [master(worker /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying changelogs        files=['CHANGELOG.1536662311']

[2018-09-11 10:38:45.838089] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken       duration=1.4765 num_files=1     job=2   return_code=23

[2018-09-11 10:38:45.840205] W [master(worker /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying changelogs        files=['CHANGELOG.1536662311']

[2018-09-11 10:38:47.969033] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken       duration=1.4602 num_files=1     job=3   return_code=23

[2018-09-11 10:38:47.970118] W [master(worker /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying changelogs        files=['CHANGELOG.1536662311']

[2018-09-11 10:38:50.54420] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken        duration=1.4717 num_files=1     job=1   return_code=23

[2018-09-11 10:38:50.56072] W [master(worker /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying changelogs files=['CHANGELOG.1536662311']

[2018-09-11 10:38:52.317955] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken       duration=1.4711 num_files=1     job=2   return_code=23

[2018-09-11 10:38:52.319642] W [master(worker /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying changelogs        files=['CHANGELOG.1536662311']

[2018-09-11 10:38:54.448926] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken       duration=1.4715 num_files=1     job=3   return_code=23

[2018-09-11 10:38:54.451127] W [master(worker /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying changelogs        files=['CHANGELOG.1536662311']

[2018-09-11 10:38:56.538007] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken       duration=1.4759 num_files=1     job=1   return_code=23

[2018-09-11 10:38:56.538914] E [master(worker /bricks/brick1/brick):1325:process] _GMaster: changelogs could not be processed completely - moving on... files=['CHANGELOG.1536662311']

[2018-09-11 10:38:56.544816] I [master(worker /bricks/brick1/brick):1374:process] _GMaster: Entry Time Taken    MKD=0   MKN=0   LIN=0   SYM=0   REN=0   RMD=0   CRE=0   duration=0.0000 UNL=0

[2018-09-11 10:38:56.545031] I [master(worker /bricks/brick1/brick):1384:process] _GMaster: Data/Metadata Time Taken    SETA=0  SETX=0  meta_duration=0.0000    data_duration=1536662336.5450   DATA="" XATT=0

[2018-09-11 10:38:56.545356] I [master(worker /bricks/brick1/brick):1394:process] _GMaster: Batch Completed     changelog_end=1536662311        entry_stime=None        changelog_start=1536662311      stime=(1536662310, 0)   duration=20.9674        num_changelogs=1        mode=live_changelog



This looks like a bug; please file a bug report. For now, as a workaround, add the following line at the end of the geo-replication configuration file on all master nodes with any editor. After adding it on all master nodes, stop and start geo-rep.

rsync-options = --ignore-missing-args

configuration file: /var/lib/glusterd/geo-replication/<mastervol>_<slavehost>_<slavevol>/gsyncd.conf
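
For illustration, a minimal sketch of applying the workaround, assuming a hypothetical non-root session mastervol -> geoaccount@slavehost::slavevol (substitute your own volume names, geo-rep user, and config path):

    # Hypothetical names; adjust to your setup. Append on every master node.
    echo 'rsync-options = --ignore-missing-args' >> /var/lib/glusterd/geo-replication/mastervol_slavehost_slavevol/gsyncd.conf

    # Restart the session so the workers pick up the new rsync option
    gluster volume geo-replication mastervol geoaccount@slavehost::slavevol stop
    gluster volume geo-replication mastervol geoaccount@slavehost::slavevol start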






 

I had these issues in the past with 4.1.2 as well. I could only fix them by deleting the geo-replication and the gluster volume and re-creating everything.

 

If I delete the geo-replication and delete the changelogs directory or the CHANGELOG files, I get this error:

 

gsyncd.log

[2018-09-11 10:26:44.928277] E [repce(agent /bricks/brick1/brick):105:worker] <top>: call failed:

Traceback (most recent call last):

  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 101, in worker

    res = getattr(self.obj, rmeth)(*in_data[2:])

  File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 53, in history

    num_parallel)

  File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 101, in cl_history_changelog

    cls.raise_changelog_err()

  File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 28, in raise_changelog_err

    raise ChangelogException(errn, os.strerror(errn))

ChangelogException: [Errno 61] No data available

 

Or

 

Traceback (most recent call last):

  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 101, in worker

    res = getattr(self.obj, rmeth)(*in_data[2:])

  File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 53, in history

    num_parallel)

  File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 101, in cl_history_changelog

    cls.raise_changelog_err()

  File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 28, in raise_changelog_err

    raise ChangelogException(errn, os.strerror(errn))

ChangelogException: [Errno 2] No such file or directory


Please share the changelog log file so this can be debugged. On the same node where you got this traceback, in the same location, share the following log file: "changes-bricks-brick1-brick.log"

 

I read somewhere that if I delete the geo-replication with “reset-sync-time”, the changelogs are cleared, but this doesn’t happen.


Changelogs are not cleared, but in the new geo-rep session the old changelogs are not used for syncing.
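
For reference, a minimal sketch of removing a session with reset-sync-time and re-creating it, assuming the same hypothetical session names as above; the new session ignores the old changelogs and starts syncing from a fresh initial crawl:

    # Stop and delete the session; reset-sync-time resets the stored sync time
    # so the re-created session syncs from scratch
    gluster volume geo-replication mastervol geoaccount@slavehost::slavevol stop
    gluster volume geo-replication mastervol geoaccount@slavehost::slavevol delete reset-sync-time

    # Re-create and start a fresh session
    gluster volume geo-replication mastervol geoaccount@slavehost::slavevol create push-pem
    gluster volume geo-replication mastervol geoaccount@slavehost::slavevol start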

 

How can I reset the changelog without deleting all data?

 

I didn't clearly understand what the requirement is here. Could you elaborate?

Regards,

 

Christian Kotte





--
Thanks and Regards,
Kotresh H R
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users
