Dear Matthew,

Can you provide more information regarding the geo-replication brick logs? These files are also located in:

/var/log/glusterfs/geo-replication/storage_10.0.231.81_pcic-backup/

Usually these log files are more helpful for pinpointing the root cause of the error. Additionally, it is also worth looking at the log files on the slave side.

Regards,
Felix
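P.S. For example, something like the following on the master side (adjust the session directory name to your setup; the mnt-*.log files in the same directory, if present, are the logs of the per-brick auxiliary mounts):

# grep ' E ' /var/log/glusterfs/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.log
# tail -n 50 /var/log/glusterfs/geo-replication/storage_10.0.231.81_pcic-backup/mnt-*.log

On the slave nodes, the corresponding session logs are normally under /var/log/glusterfs/geo-replication-slaves/.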
On 01/10/2020 23:08, Matthew Benstead wrote:

Hello,

I'm looking for some help with a geo-replication error in my Gluster 7 / CentOS 7 setup. Replication progress has basically stopped, and the status of the replication keeps switching. The gsyncd log has errors like "Operation not permitted", "incomplete sync", etc. Can anyone help? I'm not sure how to proceed with troubleshooting this.

The log is below; it basically just repeats. From /var/log/glusterfs/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.log:

[2020-10-01 20:52:15.291923] I [master(worker /data/storage_a/storage):1991:syncjob] Syncer: Sync Time Taken duration=32.8466 num_files=1749 job=3 return_code=23
[2020-10-01 20:52:18.700062] I [master(worker /data/storage_c/storage):1991:syncjob] Syncer: Sync Time Taken duration=43.1210 num_files=3167 job=6 return_code=23
[2020-10-01 20:52:23.383234] W [master(worker /data/storage_c/storage):1393:process] _GMaster: incomplete sync, retrying changelogs files=['XSYNC-CHANGELOG.1601585397']
[2020-10-01 20:52:28.537657] E [repce(worker /data/storage_b/storage):213:__call__] RepceClient: call failed call=258187:140538843596608:1601585515.63 method=entry_ops error=OSError
[2020-10-01 20:52:28.538064] E [syncdutils(worker /data/storage_b/storage):339:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 332, in main
    func(args)
  File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line 86, in subcmd_worker
    local.service_loop(remote)
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1308, in service_loop
    g1.crawlwrap(oneshot=True, register_time=register_time)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 602, in crawlwrap
    self.crawl()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1682, in crawl
    self.process([item[1]], 0)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1327, in process
    self.process_change(change, done, retry)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1221, in process_change
    failures = self.slave.server.entry_ops(entries)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 232, in __call__
    return self.ins(self.meth, *a)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 214, in __call__
    raise res
OSError: [Errno 1] Operation not permitted
[2020-10-01 20:52:28.570316] I [repce(agent /data/storage_b/storage):96:service_loop] RepceServer: terminating on reaching EOF.
[2020-10-01 20:52:28.613603] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change status=Faulty
[2020-10-01 20:52:29.619797] I [master(worker /data/storage_c/storage):1991:syncjob] Syncer: Sync Time Taken duration=5.6458 num_files=455 job=3 return_code=23
[2020-10-01 20:52:38.286245] I [master(worker /data/storage_c/storage):1991:syncjob] Syncer: Sync Time Taken duration=14.1824 num_files=1333 job=2 return_code=23
[2020-10-01 20:52:38.628156] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change status=Initializing...
[2020-10-01 20:52:38.628325] I [monitor(monitor):159:monitor] Monitor: starting gsyncd worker brick=/data/storage_b/storage slave_node=10.0.231.82
[2020-10-01 20:52:38.684736] I [gsyncd(agent /data/storage_b/storage):318:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf
[2020-10-01 20:52:38.687213] I [gsyncd(worker /data/storage_b/storage):318:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf
[2020-10-01 20:52:38.687401] I [changelogagent(agent /data/storage_b/storage):72:__init__] ChangelogAgent: Agent listining...
[2020-10-01 20:52:38.703295] I [resource(worker /data/storage_b/storage):1386:connect_remote] SSH: Initializing SSH connection between master and slave...
[2020-10-01 20:52:40.388372] I [resource(worker /data/storage_b/storage):1435:connect_remote] SSH: SSH connection between master and slave established. duration=1.6849
[2020-10-01 20:52:40.388582] I [resource(worker /data/storage_b/storage):1105:connect] GLUSTER: Mounting gluster volume locally...
[2020-10-01 20:52:41.501105] I [resource(worker /data/storage_b/storage):1128:connect] GLUSTER: Mounted gluster volume duration=1.1123
[2020-10-01 20:52:41.501405] I [subcmds(worker /data/storage_b/storage):84:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor
[2020-10-01 20:52:43.531146] I [master(worker /data/storage_b/storage):1640:register] _GMaster: Working dir path=/var/lib/misc/gluster/gsyncd/storage_10.0.231.81_pcic-backup/data-storage_b-storage
[2020-10-01 20:52:43.533953] I [resource(worker /data/storage_b/storage):1291:service_loop] GLUSTER: Register time time=1601585563
[2020-10-01 20:52:43.547092] I [gsyncdstatus(worker /data/storage_b/storage):281:set_active] GeorepStatus: Worker Status Change status=Active
[2020-10-01 20:52:43.561920] I [gsyncdstatus(worker /data/storage_b/storage):253:set_worker_crawl_status] GeorepStatus: Crawl Status Change status=History Crawl
[2020-10-01 20:52:43.562184] I [master(worker /data/storage_b/storage):1554:crawl] _GMaster: starting history crawl turns=1 stime=None entry_stime=None etime=1601585563
[2020-10-01 20:52:43.562269] I [resource(worker /data/storage_b/storage):1307:service_loop] GLUSTER: No stime available, using xsync crawl
[2020-10-01 20:52:43.569799] I [master(worker /data/storage_b/storage):1670:crawl] _GMaster: starting hybrid crawl stime=None
[2020-10-01 20:52:43.573528] I [gsyncdstatus(worker /data/storage_b/storage):253:set_worker_crawl_status] GeorepStatus: Crawl Status Change status=Hybrid Crawl
[2020-10-01 20:52:44.370985] I [master(worker /data/storage_c/storage):1991:syncjob] Syncer: Sync Time Taken duration=20.4307 num_files=2609 job=5 return_code=23
[2020-10-01 20:52:49.431854] W [master(worker /data/storage_c/storage):1393:process] _GMaster: incomplete sync, retrying changelogs files=['XSYNC-CHANGELOG.1601585397']
[2020-10-01 20:52:54.801500] I [master(worker /data/storage_a/storage):1991:syncjob] Syncer: Sync Time Taken duration=72.7492 num_files=4227 job=2 return_code=23
[2020-10-01 20:52:56.766547] I [master(worker /data/storage_a/storage):1991:syncjob] Syncer: Sync Time Taken duration=74.3569 num_files=4674 job=5 return_code=23
[2020-10-01 20:53:18.853333] I [master(worker /data/storage_c/storage):1991:syncjob] Syncer: Sync Time Taken duration=28.7125 num_files=4397 job=3 return_code=23
[2020-10-01 20:53:21.224921] W [master(worker /data/storage_a/storage):1393:process] _GMaster: incomplete sync, retrying changelogs files=['CHANGELOG.1601044033', 'CHANGELOG.1601044048', 'CHANGELOG.1601044063', 'CHANGELOG.1601044078', 'CHANGELOG.1601044093', 'CHANGELOG.1601044108', 'CHANGELOG.1601044123']
[2020-10-01 20:53:22.134536] I [master(worker /data/storage_a/storage):1991:syncjob] Syncer: Sync Time Taken duration=0.2159 num_files=3 job=3 return_code=23
[2020-10-01 20:53:25.615712] I [master(worker /data/storage_b/storage):1681:crawl] _GMaster: processing xsync changelog path=/var/lib/misc/gluster/gsyncd/storage_10.0.231.81_pcic-backup/data-storage_b-storage/xsync/XSYNC-CHANGELOG.1601585563
[2020-10-01 20:53:25.634970] W [master(worker /data/storage_c/storage):1393:process] _GMaster: incomplete sync, retrying changelogs files=['XSYNC-CHANGELOG.1601585397']
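For what it's worth, the return_code=23 on the Syncer lines looks like rsync's exit status (23 = "partial transfer due to error"), and judging by the traceback the OSError is raised by entry_ops on the slave end of the RPC and re-raised here. To catch the workers flipping between Active and Faulty I've just been re-running the status command, e.g.:

[root@storage01 ~]# watch -n 10 'gluster volume geo-replication status'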
Geo-replication status - see it change from Active to Faulty:

[root@storage01 ~]# gluster volume geo-replication status

MASTER NODE    MASTER VOL    MASTER BRICK               SLAVE USER    SLAVE                                        SLAVE NODE     STATUS    CRAWL STATUS       LAST_SYNCED
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
10.0.231.91    storage       /data/storage_a/storage    geoaccount    ssh://geoaccount@10.0.231.81::pcic-backup    10.0.231.81    Active    Changelog Crawl    2020-09-25 07:26:57
10.0.231.91    storage       /data/storage_c/storage    geoaccount    ssh://geoaccount@10.0.231.81::pcic-backup    10.0.231.82    Active    Hybrid Crawl       N/A
10.0.231.91    storage       /data/storage_b/storage    geoaccount    ssh://geoaccount@10.0.231.81::pcic-backup    10.0.231.82    Active    Hybrid Crawl       N/A
10.0.231.92    storage       /data/storage_b/storage    geoaccount    ssh://geoaccount@10.0.231.81::pcic-backup    10.0.231.82    Active    History Crawl      2020-09-23 01:56:05
10.0.231.92    storage       /data/storage_a/storage    geoaccount    ssh://geoaccount@10.0.231.81::pcic-backup    10.0.231.82    Active    Hybrid Crawl       N/A
10.0.231.92    storage       /data/storage_c/storage    geoaccount    ssh://geoaccount@10.0.231.81::pcic-backup    10.0.231.81    Active    Hybrid Crawl       N/A
10.0.231.93    storage       /data/storage_c/storage    geoaccount    ssh://geoaccount@10.0.231.81::pcic-backup    10.0.231.81    Active    Changelog Crawl    2020-09-25 06:55:57
10.0.231.93    storage       /data/storage_b/storage    geoaccount    ssh://geoaccount@10.0.231.81::pcic-backup    10.0.231.81    Active    Hybrid Crawl       N/A
10.0.231.93    storage       /data/storage_a/storage    geoaccount    ssh://geoaccount@10.0.231.81::pcic-backup    10.0.231.81    Active    Hybrid Crawl       N/A

[root@storage01 ~]# gluster volume geo-replication status

MASTER NODE    MASTER VOL    MASTER BRICK               SLAVE USER    SLAVE                                        SLAVE NODE     STATUS    CRAWL STATUS       LAST_SYNCED
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
10.0.231.91    storage       /data/storage_a/storage    geoaccount    ssh://geoaccount@10.0.231.81::pcic-backup    10.0.231.81    Active    Changelog Crawl    2020-09-25 07:26:57
10.0.231.91    storage       /data/storage_c/storage    geoaccount    ssh://geoaccount@10.0.231.81::pcic-backup    10.0.231.82    Active    Hybrid Crawl       N/A
10.0.231.91    storage       /data/storage_b/storage    geoaccount    ssh://geoaccount@10.0.231.81::pcic-backup    N/A            Faulty    N/A                N/A
10.0.231.92    storage       /data/storage_b/storage    geoaccount    ssh://geoaccount@10.0.231.81::pcic-backup    10.0.231.82    Active    History Crawl      2020-09-23 01:58:05
10.0.231.92    storage       /data/storage_a/storage    geoaccount    ssh://geoaccount@10.0.231.81::pcic-backup    10.0.231.82    Active    Hybrid Crawl       N/A
10.0.231.92    storage       /data/storage_c/storage    geoaccount    ssh://geoaccount@10.0.231.81::pcic-backup    N/A            Faulty    N/A                N/A
10.0.231.93    storage       /data/storage_c/storage    geoaccount    ssh://geoaccount@10.0.231.81::pcic-backup    10.0.231.81    Active    Changelog Crawl    2020-09-25 06:58:56
10.0.231.93    storage       /data/storage_b/storage    geoaccount    ssh://geoaccount@10.0.231.81::pcic-backup    10.0.231.81    Active    Hybrid Crawl       N/A
10.0.231.93    storage       /data/storage_a/storage    geoaccount    ssh://geoaccount@10.0.231.81::pcic-backup    N/A            Faulty    N/A                N/A

Cluster information (note: performance.open-behind is disabled to work around https://github.com/gluster/glusterfs/issues/1440):

[root@storage01 ~]# gluster --version | head -1; cat /etc/centos-release; uname -r
glusterfs 7.7
CentOS Linux release 7.8.2003 (Core)
3.10.0-1127.10.1.el7.x86_64

[root@storage01 ~]# df -h /storage2/
Filesystem            Size  Used  Avail  Use%  Mounted on
10.0.231.91:/storage  328T  228T  100T   70%   /storage2

[root@storage01 ~]# gluster volume info

Volume Name: storage
Type: Distributed-Replicate
Volume ID: cf94a8f2-324b-40b3-bf72-c3766100ea99
Status: Started
Snapshot Count: 0
Number of Bricks: 3 x (2 + 1) = 9
Transport-type: tcp
Bricks:
Brick1: 10.0.231.91:/data/storage_a/storage
Brick2: 10.0.231.92:/data/storage_b/storage
Brick3: 10.0.231.93:/data/storage_c/storage (arbiter)
Brick4: 10.0.231.92:/data/storage_a/storage
Brick5: 10.0.231.93:/data/storage_b/storage
Brick6: 10.0.231.91:/data/storage_c/storage (arbiter)
Brick7: 10.0.231.93:/data/storage_a/storage
Brick8: 10.0.231.91:/data/storage_b/storage
Brick9: 10.0.231.92:/data/storage_c/storage (arbiter)
Options Reconfigured:
changelog.changelog: on
geo-replication.ignore-pid-check: on
geo-replication.indexing: on
network.ping-timeout: 10
features.inode-quota: on
features.quota: on
nfs.disable: on
features.quota-deem-statfs: on
storage.fips-mode-rchecksum: on
performance.readdir-ahead: on
performance.parallel-readdir: on
cluster.lookup-optimize: on
client.event-threads: 4
server.event-threads: 4
performance.cache-size: 256MB
performance.open-behind: off

Thanks,
 -Matthew
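P.S. If more detail would help, I think I can raise the session log level to DEBUG to capture the failing entry operations. From memory the command is something like the following, but I'd verify the exact option name against the list that the bare "config" command prints:

[root@storage01 ~]# gluster volume geo-replication storage geoaccount@10.0.231.81::pcic-backup config
[root@storage01 ~]# gluster volume geo-replication storage geoaccount@10.0.231.81::pcic-backup config log-level DEBUG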
________

Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users