[geo-rep] Replication faulty - gsyncd.log OSError: [Errno 13] Permission denied

Hi,

 

Any idea how to troubleshoot this?

 

New folders and files were created on the master via Samba, and the replication went faulty.

 

Version: GlusterFS 4.1.3

 

[root@master]# gluster volume geo-replication status

 

MASTER NODE                         MASTER VOL     MASTER BRICK            SLAVE USER    SLAVE                                                             SLAVE NODE    STATUS    CRAWL STATUS    LAST_SYNCED
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
master    glustervol1    /bricks/brick1/brick    geoaccount    ssh://geoaccount@slave_1::glustervol1       N/A           Faulty    N/A             N/A
master    glustervol1    /bricks/brick1/brick    geoaccount    ssh://geoaccount@slave_2::glustervol1       N/A           Faulty    N/A             N/A
master    glustervol1    /bricks/brick1/brick    geoaccount    ssh://geoaccount@interimmaster::glustervol1   N/A           Faulty    N/A             N/A

 

The following error is logged repeatedly in gsyncd.log:

[2018-09-21 14:26:38.611479] I [repce(agent /bricks/brick1/brick):80:service_loop] RepceServer: terminating on reaching EOF.
[2018-09-21 14:26:39.211527] I [monitor(monitor):279:monitor] Monitor: worker died in startup phase     brick=/bricks/brick1/brick
[2018-09-21 14:26:39.214322] I [gsyncdstatus(monitor):244:set_worker_status] GeorepStatus: Worker Status Change status=Faulty
[2018-09-21 14:26:49.318953] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker   brick=/bricks/brick1/brick      slave_node=nrchbs-slp2020.nibr.novartis.net
[2018-09-21 14:26:49.471532] I [gsyncd(agent /bricks/brick1/brick):297:main] <top>: Using session config file   path=/var/lib/glusterd/geo-replication/glustervol1_nrchbs-slp2020.nibr.novartis.net_glustervol1/gsyncd.conf
[2018-09-21 14:26:49.473917] I [changelogagent(agent /bricks/brick1/brick):72:__init__] ChangelogAgent: Agent listining...
[2018-09-21 14:26:49.491359] I [gsyncd(worker /bricks/brick1/brick):297:main] <top>: Using session config file  path=/var/lib/glusterd/geo-replication/glustervol1_nrchbs-slp2020.nibr.novartis.net_glustervol1/gsyncd.conf
[2018-09-21 14:26:49.538049] I [resource(worker /bricks/brick1/brick):1377:connect_remote] SSH: Initializing SSH connection between master and slave...
[2018-09-21 14:26:53.5017] I [resource(worker /bricks/brick1/brick):1424:connect_remote] SSH: SSH connection between master and slave established.      duration=3.4665
[2018-09-21 14:26:53.5419] I [resource(worker /bricks/brick1/brick):1096:connect] GLUSTER: Mounting gluster volume locally...
[2018-09-21 14:26:54.120374] I [resource(worker /bricks/brick1/brick):1119:connect] GLUSTER: Mounted gluster volume     duration=1.1146
[2018-09-21 14:26:54.121012] I [subcmds(worker /bricks/brick1/brick):70:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor
[2018-09-21 14:26:56.144460] I [master(worker /bricks/brick1/brick):1593:register] _GMaster: Working dir        path=/var/lib/misc/gluster/gsyncd/glustervol1_nrchbs-slp2020.nibr.novartis.net_glustervol1/bricks-brick1-brick
[2018-09-21 14:26:56.145145] I [resource(worker /bricks/brick1/brick):1282:service_loop] GLUSTER: Register time time=1537540016
[2018-09-21 14:26:56.160064] I [gsyncdstatus(worker /bricks/brick1/brick):277:set_active] GeorepStatus: Worker Status Change    status=Active
[2018-09-21 14:26:56.161175] I [gsyncdstatus(worker /bricks/brick1/brick):249:set_worker_crawl_status] GeorepStatus: Crawl Status Change        status=History Crawl
[2018-09-21 14:26:56.161536] I [master(worker /bricks/brick1/brick):1507:crawl] _GMaster: starting history crawl        turns=1 stime=(1537522637, 0)   entry_stime=(1537537141, 0)     etime=1537540016
[2018-09-21 14:26:56.164277] I [master(worker /bricks/brick1/brick):1536:crawl] _GMaster: slave's time  stime=(1537522637, 0)
[2018-09-21 14:26:56.197065] I [master(worker /bricks/brick1/brick):1360:process] _GMaster: Skipping already processed entry ops        to_changelog=1537522638 num_changelogs=1        from_changelog=1537522638
[2018-09-21 14:26:56.197402] I [master(worker /bricks/brick1/brick):1374:process] _GMaster: Entry Time Taken    MKD=0   MKN=0   LIN=0   SYM=0   REN=0   RMD=0   CRE=0   duration=0.0000 UNL=1
[2018-09-21 14:26:56.197623] I [master(worker /bricks/brick1/brick):1384:process] _GMaster: Data/Metadata Time Taken    SETA=0  SETX=0  meta_duration=0.0000    data_duration=0.0284    DATA="" XATT=0
[2018-09-21 14:26:56.198230] I [master(worker /bricks/brick1/brick):1394:process] _GMaster: Batch Completed     changelog_end=1537522638        entry_stime=(1537537141, 0)     changelog_start=1537522638      stime=(1537522637, 0)   duration=0.0333 num_changelogs=1        mode=history_changelog
[2018-09-21 14:26:57.200436] I [master(worker /bricks/brick1/brick):1536:crawl] _GMaster: slave's time  stime=(1537522637, 0)
[2018-09-21 14:26:57.528625] E [repce(worker /bricks/brick1/brick):197:__call__] RepceClient: call failed       call=17209:140650361157440:1537540017.21        method=entry_ops        error=OSError
[2018-09-21 14:26:57.529371] E [syncdutils(worker /bricks/brick1/brick):332:log_raise_exception] <top>: FAIL:

Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 311, in main
    func(args)
  File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line 72, in subcmd_worker
    local.service_loop(remote)
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1288, in service_loop
    g3.crawlwrap(oneshot=True)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 615, in crawlwrap
    self.crawl()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1545, in crawl
    self.changelogs_batch_process(changes)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1445, in changelogs_batch_process
    self.process(batch)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1280, in process
    self.process_change(change, done, retry)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1179, in process_change
    failures = self.slave.server.entry_ops(entries)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 216, in __call__
    return self.ins(self.meth, *a)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 198, in __call__
    raise res
OSError: [Errno 13] Permission denied: '/bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e'

 

The permissions look fine. Replication is done via the geo user instead of root. That user should be able to read the path, but I'm not sure whether the syncdaemon actually runs as geoaccount?
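To see which user the gsyncd processes actually run as on each node, something like this should work (a sketch; the bracketed pattern just keeps the pipeline from matching itself, and the output format may vary slightly between ps versions):

```shell
# List the gsyncd monitor/worker/agent processes together with the
# user that owns them; run on the master and on each slave node.
ps -eo user:20,pid,args | awk 'NR == 1 || /[g]syncd/'
```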

 

[root@master vRealize Operation Manager]# ll /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e
lrwxrwxrwx. 1 root root 75 Sep 21 09:39 /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e -> ../../6b/97/6b97b987-8aef-46c3-af27-20d3aa883016/vRealize Operation Manager

[root@master vRealize Operation Manager]# ll /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e/
total 4
drwxrwxr-x. 2 AD+user AD+group  131 Sep 21 10:14 6.7
drwxrwxr-x. 2 AD+user AD+group 4096 Sep 21 09:43 7.0
drwxrwxr-x. 2 AD+user AD+group   57 Sep 21 10:28 7.5
[root@master vRealize Operation Manager]#
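For what it's worth, EACCES on a lookup is often caused by a missing execute (search) bit on some parent directory rather than on the final entry itself. A sketch to walk each component of the failing path as the replication user (check_path_access is a hypothetical helper; geoaccount and the gfid path are taken from above, and sudo must be permitted for the caller):

```shell
# check_path_access USER PATH - report which component of PATH denies
# search permission to USER. Hypothetical helper, POSIX sh.
check_path_access() {
    user=$1; target=$2; path=
    if ! id "$user" >/dev/null 2>&1; then
        echo "user $user not present on this node"
        return 0
    fi
    # Split the path on '/' and test the execute (search) bit per component.
    oldifs=$IFS; IFS=/
    for part in $target; do
        [ -n "$part" ] || continue
        path="$path/$part"
        if sudo -n -u "$user" test -x "$path" 2>/dev/null; then
            echo "ok      $path"
        else
            echo "DENIED  $path"
        fi
    done
    IFS=$oldifs
}

check_path_access geoaccount \
    '/bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e'
```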

 

It's possible that the folder was renamed. I've had three similar issues since migrating to GlusterFS 4.x but couldn't investigate much. I had to completely wipe GlusterFS and geo-replication to get rid of this error…
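Since a rename is suspected: a directory's .glusterfs entry is a symlink derived from its gfid, so a rename has to rewrite the target. A quick sketch to spot gfid symlinks that no longer resolve under the brick (GNU find; the brick path is the one from the status output above):

```shell
# Look for dangling symlinks under the brick's .glusterfs tree.
# GNU find's -xtype l matches symlinks whose target does not resolve.
BRICK=/bricks/brick1/brick
if [ -d "$BRICK/.glusterfs" ]; then
    find "$BRICK/.glusterfs" -xtype l | head -n 20
else
    echo "brick path $BRICK not present on this node"
fi
```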

 

Any help is appreciated.

 

Regards,

 

Christian Kotte

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users
