Hi,

Any idea how to troubleshoot this? New folders and files were created on the master (via Samba) and the replication went faulty.
Version: GlusterFS 4.1.3

[root@master]# gluster volume geo-replication status

MASTER NODE    MASTER VOL     MASTER BRICK            SLAVE USER    SLAVE                                          SLAVE NODE    STATUS    CRAWL STATUS    LAST_SYNCED
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
master         glustervol1    /bricks/brick1/brick    geoaccount    ssh://geoaccount@slave_1::glustervol1          N/A           Faulty    N/A             N/A
master         glustervol1    /bricks/brick1/brick    geoaccount    ssh://geoaccount@slave_2::glustervol1          N/A           Faulty    N/A             N/A
master         glustervol1    /bricks/brick1/brick    geoaccount    ssh://geoaccount@interimmaster::glustervol1    N/A           Faulty    N/A             N/A
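If it helps, I can also pull the per-brick detail view for one of the sessions. I'm writing the command from memory, so the exact syntax and output columns may differ on 4.1:

[root@master]# gluster volume geo-replication glustervol1 geoaccount@slave_1::glustervol1 status detail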
The following error is repeatedly logged in the gsyncd.logs:

[2018-09-21 14:26:38.611479] I [repce(agent /bricks/brick1/brick):80:service_loop] RepceServer: terminating on reaching EOF.
[2018-09-21 14:26:39.211527] I [monitor(monitor):279:monitor] Monitor: worker died in startup phase  brick=/bricks/brick1/brick
[2018-09-21 14:26:39.214322] I [gsyncdstatus(monitor):244:set_worker_status] GeorepStatus: Worker Status Change  status=Faulty
[2018-09-21 14:26:49.318953] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker  brick=/bricks/brick1/brick  slave_node=nrchbs-slp2020.nibr.novartis.net
[2018-09-21 14:26:49.471532] I [gsyncd(agent /bricks/brick1/brick):297:main] <top>: Using session config file  path=/var/lib/glusterd/geo-replication/glustervol1_nrchbs-slp2020.nibr.novartis.net_glustervol1/gsyncd.conf
[2018-09-21 14:26:49.473917] I [changelogagent(agent /bricks/brick1/brick):72:__init__] ChangelogAgent: Agent listining...
[2018-09-21 14:26:49.491359] I [gsyncd(worker /bricks/brick1/brick):297:main] <top>: Using session config file  path=/var/lib/glusterd/geo-replication/glustervol1_nrchbs-slp2020.nibr.novartis.net_glustervol1/gsyncd.conf
[2018-09-21 14:26:49.538049] I [resource(worker /bricks/brick1/brick):1377:connect_remote] SSH: Initializing SSH connection between master and slave...
[2018-09-21 14:26:53.5017] I [resource(worker /bricks/brick1/brick):1424:connect_remote] SSH: SSH connection between master and slave established.  duration=3.4665
[2018-09-21 14:26:53.5419] I [resource(worker /bricks/brick1/brick):1096:connect] GLUSTER: Mounting gluster volume locally...
[2018-09-21 14:26:54.120374] I [resource(worker /bricks/brick1/brick):1119:connect] GLUSTER: Mounted gluster volume  duration=1.1146
[2018-09-21 14:26:54.121012] I [subcmds(worker /bricks/brick1/brick):70:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor
[2018-09-21 14:26:56.144460] I [master(worker /bricks/brick1/brick):1593:register] _GMaster: Working dir  path=/var/lib/misc/gluster/gsyncd/glustervol1_nrchbs-slp2020.nibr.novartis.net_glustervol1/bricks-brick1-brick
[2018-09-21 14:26:56.145145] I [resource(worker /bricks/brick1/brick):1282:service_loop] GLUSTER: Register time  time=1537540016
[2018-09-21 14:26:56.160064] I [gsyncdstatus(worker /bricks/brick1/brick):277:set_active] GeorepStatus: Worker Status Change  status=Active
[2018-09-21 14:26:56.161175] I [gsyncdstatus(worker /bricks/brick1/brick):249:set_worker_crawl_status] GeorepStatus: Crawl Status Change  status=History Crawl
[2018-09-21 14:26:56.161536] I [master(worker /bricks/brick1/brick):1507:crawl] _GMaster: starting history crawl  turns=1  stime=(1537522637, 0)  entry_stime=(1537537141, 0)  etime=1537540016
[2018-09-21 14:26:56.164277] I [master(worker /bricks/brick1/brick):1536:crawl] _GMaster: slave's time  stime=(1537522637, 0)
[2018-09-21 14:26:56.197065] I [master(worker /bricks/brick1/brick):1360:process] _GMaster: Skipping already processed entry ops  to_changelog=1537522638  num_changelogs=1  from_changelog=1537522638
[2018-09-21 14:26:56.197402] I [master(worker /bricks/brick1/brick):1374:process] _GMaster: Entry Time Taken  MKD=0  MKN=0  LIN=0  SYM=0  REN=0  RMD=0  CRE=0  duration=0.0000  UNL=1
[2018-09-21 14:26:56.197623] I [master(worker /bricks/brick1/brick):1384:process] _GMaster: Data/Metadata Time Taken  SETA=0  SETX=0  meta_duration=0.0000  data_duration=0.0284  DATA=""  XATT=0
[2018-09-21 14:26:56.198230] I [master(worker /bricks/brick1/brick):1394:process] _GMaster: Batch Completed  changelog_end=1537522638  entry_stime=(1537537141, 0)  changelog_start=1537522638  stime=(1537522637, 0)  duration=0.0333  num_changelogs=1  mode=history_changelog
[2018-09-21 14:26:57.200436] I [master(worker /bricks/brick1/brick):1536:crawl] _GMaster: slave's time  stime=(1537522637, 0)
[2018-09-21 14:26:57.528625] E [repce(worker /bricks/brick1/brick):197:__call__] RepceClient: call failed  call=17209:140650361157440:1537540017.21  method=entry_ops  error=OSError
[2018-09-21 14:26:57.529371] E [syncdutils(worker /bricks/brick1/brick):332:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 311, in main
    func(args)
  File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line 72, in subcmd_worker
    local.service_loop(remote)
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1288, in service_loop
    g3.crawlwrap(oneshot=True)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 615, in crawlwrap
    self.crawl()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1545, in crawl
    self.changelogs_batch_process(changes)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1445, in changelogs_batch_process
    self.process(batch)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1280, in process
    self.process_change(change, done, retry)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1179, in process_change
    failures = self.slave.server.entry_ops(entries)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 216, in __call__
    return self.ins(self.meth, *a)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 198, in __call__
    raise res
OSError: [Errno 13] Permission denied: '/bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e'

The permissions look fine. The replication is done via the geo user instead of root, so it should be able to read, but I'm not sure whether the syncdaemon actually runs as geoaccount!?
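To verify that, I guess I can simply check which user owns the gsyncd processes on the master and on the slaves; something along these lines (the grep pattern is just an assumption, the actual process listing may look different):

[root@master]# ps -ef | grep '[g]syncd'
[root@slave_1]# ps -ef | grep '[g]syncd'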
[root@master vRealize Operation Manager]# ll /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e
lrwxrwxrwx. 1 root root 75 Sep 21 09:39 /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e -> ../../6b/97/6b97b987-8aef-46c3-af27-20d3aa883016/vRealize Operation Manager

[root@master vRealize Operation Manager]# ll /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e/
total 4
drwxrwxr-x. 2 AD+user AD+group  131 Sep 21 10:14 6.7
drwxrwxr-x. 2 AD+user AD+group 4096 Sep 21 09:43 7.0
drwxrwxr-x. 2 AD+user AD+group   57 Sep 21 10:28 7.5
[root@master vRealize Operation Manager]#
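I'm also not sure whether the entry_ops failure is raised on the master or on the slave side; if it comes from the slave, I suppose I should check the same gfid path on the slave bricks as well, something like this (assuming the slave brick uses the same /bricks/brick1/brick path and the geoaccount SSH key isn't command-restricted):

[root@master]# ssh geoaccount@slave_1 'ls -ld /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e'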
It could be possible that the folder was renamed. I've had 3 similar issues since I migrated to GlusterFS 4.x, but I couldn't investigate much; I needed to completely wipe GlusterFS and the geo-replication to get rid of this error…

Any help is appreciated.

Regards,
Christian Kotte