Re: [geo-rep] Replication faulty - gsyncd.log OSError: [Errno 13] Permission denied

I'll raise a bug as soon as I get a chance.

I'm trying to think back to what has changed, though. We started on 4.1.3, if I recall, and geo-rep was working. We ended up on 4.1.6, and we noticed the issue after blowing away a slave environment, deleting the existing sessions, and then starting geo-rep from scratch.

It's very peculiar: the mountbroker seems to be mounting happily, the geosync user can access that mount point and write to it, and the source directory structure under the mount point appears to be set up, but then something triggers the exception when it tries to read the .glusterfs folder.

We can't swap to root-user geo-rep due to security policies, so we're stuck with broken replication.
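
For what it's worth, this is roughly how we check whether the sync user can actually traverse the brick on the slave side (the brick path and the "geosync" user name below are just examples from our setup):

# run on a slave node as root; geosync is the non-root sync user
sudo -u geosync ls /bricks/brick1/brick/.glusterfs > /dev/null && echo readable || echo denied
ls -ld /bricks/brick1/brick/.glusterfs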



On Wed, 20 Mar 2019 at 17:42, Kotresh Hiremath Ravishankar <khiremat@xxxxxxxxxx> wrote:
Hi Andy,

This is an issue with non-root geo-rep sessions and is not fixed yet. Could you please raise a bug for it?

Thanks,
Kotresh HR

On Wed, Mar 20, 2019 at 11:53 AM Andy Coates <andy.coates@xxxxxxxxx> wrote:
We're seeing the same permission denied errors when running as a non-root geosync user.

Does anyone know what the underlying issue is?

On Wed, 26 Sep 2018 at 00:40, Kotte, Christian (Ext) <christian.kotte@xxxxxxxxxxxx> wrote:

I changed the replication to use the root user and re-created the session with “create force”. Now files and folders are replicated, and the permission denied and “New folder” errors disappeared, but old files are not deleted.
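
(For reference, re-creating the root session was roughly along these lines; the volume and slave names match my setup, and I'm not certain push-pem needs to be repeated when forcing:)

gluster volume geo-replication glustervol1 slave_1::glustervol1 create push-pem force
gluster volume geo-replication glustervol1 slave_1::glustervol1 start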

 

Looks like the history crawl is in some kind of a loop:

 

[root@master ~]# gluster volume geo-replication status

 

MASTER NODE                         MASTER VOL     MASTER BRICK            SLAVE USER    SLAVE                                                  SLAVE NODE                          STATUS    CRAWL STATUS    LAST_SYNCED

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

master                              glustervol1    /bricks/brick1/brick    root          ssh://slave_1::glustervol1                                  slave_1                             Active    Hybrid Crawl    N/A

master                              glustervol1    /bricks/brick1/brick    root          ssh://slave_2::glustervol1                                  slave_2                             Active    Hybrid Crawl    N/A

master                              glustervol1    /bricks/brick1/brick    root          ssh://slave_3::glustervol1                                  slave_3                             Active    Hybrid Crawl    N/A

 

tail -f /var/log/glusterfs/geo-replication/glustervol1_slave_3_glustervol1/gsyncd.log

  File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 104, in cl_history_changelog

    raise ChangelogHistoryNotAvailable()

ChangelogHistoryNotAvailable

[2018-09-25 14:10:44.196011] E [repce(worker /bricks/brick1/brick):197:__call__] RepceClient: call failed      call=29945:139700517484352:1537884644.19  method=history      error=ChangelogHistoryNotAvailable

[2018-09-25 14:10:44.196405] I [resource(worker /bricks/brick1/brick):1295:service_loop] GLUSTER: Changelog history not available, using xsync

[2018-09-25 14:10:44.221385] I [master(worker /bricks/brick1/brick):1623:crawl] _GMaster: starting hybrid crawl     stime=(0, 0)

[2018-09-25 14:10:44.223382] I [gsyncdstatus(worker /bricks/brick1/brick):249:set_worker_crawl_status] GeorepStatus: Crawl Status Change      status=Hybrid Crawl

[2018-09-25 14:10:46.225296] I [master(worker /bricks/brick1/brick):1634:crawl] _GMaster: processing xsync changelog      path=/var/lib/misc/gluster/gsyncd/glustervol1_slave_3_glustervol1/bricks-brick1-brick/xsync/XSYNC-CHANGELOG.1537884644

[2018-09-25 14:13:36.157408] I [gsyncd(config-get):297:main] <top>: Using session config file  path=/var/lib/glusterd/geo-replication/glustervol1_slave_3_glustervol1/gsyncd.conf

[2018-09-25 14:13:36.377880] I [gsyncd(status):297:main] <top>: Using session config file  path=/var/lib/glusterd/geo-replication/glustervol1_slave_3_glustervol1/gsyncd.conf

[2018-09-25 14:31:10.145035] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken duration=1212.5316      num_files=1474    job=2 return_code=11

[2018-09-25 14:31:10.152637] E [syncdutils(worker /bricks/brick1/brick):801:errlog] Popen: command returned error cmd=rsync -aR0 --inplace --files-from=- --super --stats --numeric-ids --no-implied-dirs --xattrs --acls . -e ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-gg758Z/caec4d1d03cc28bc1853f692e291164f.sock slave_3:/proc/15919/cwd error=11

[2018-09-25 14:31:10.237371] I [repce(agent /bricks/brick1/brick):80:service_loop] RepceServer: terminating on reaching EOF.

[2018-09-25 14:31:10.430820] I [gsyncdstatus(monitor):244:set_worker_status] GeorepStatus: Worker Status Change  status=Faulty

[2018-09-25 14:31:20.541475] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker      brick=/bricks/brick1/brick    slave_node=slave_3

[2018-09-25 14:31:20.806518] I [gsyncd(agent /bricks/brick1/brick):297:main] <top>: Using session config file     path=/var/lib/glusterd/geo-replication/glustervol1_slave_3_glustervol1/gsyncd.conf

[2018-09-25 14:31:20.816536] I [changelogagent(agent /bricks/brick1/brick):72:__init__] ChangelogAgent: Agent listining...

[2018-09-25 14:31:20.821574] I [gsyncd(worker /bricks/brick1/brick):297:main] <top>: Using session config file     path=/var/lib/glusterd/geo-replication/glustervol1_slave_3_glustervol1/gsyncd.conf

[2018-09-25 14:31:20.882128] I [resource(worker /bricks/brick1/brick):1377:connect_remote] SSH: Initializing SSH connection between master and slave...

[2018-09-25 14:31:24.169857] I [resource(worker /bricks/brick1/brick):1424:connect_remote] SSH: SSH connection between master and slave established.      duration=3.2873

[2018-09-25 14:31:24.170401] I [resource(worker /bricks/brick1/brick):1096:connect] GLUSTER: Mounting gluster volume locally...

[2018-09-25 14:31:25.354633] I [resource(worker /bricks/brick1/brick):1119:connect] GLUSTER: Mounted gluster volume     duration=1.1839

[2018-09-25 14:31:25.355073] I [subcmds(worker /bricks/brick1/brick):70:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor

[2018-09-25 14:31:27.439034] I [master(worker /bricks/brick1/brick):1593:register] _GMaster: Working dir      path=/var/lib/misc/gluster/gsyncd/glustervol1_slave_3_glustervol1/bricks-brick1-brick

[2018-09-25 14:31:27.441847] I [resource(worker /bricks/brick1/brick):1282:service_loop] GLUSTER: Register time  time=1537885887

[2018-09-25 14:31:27.465053] I [gsyncdstatus(worker /bricks/brick1/brick):277:set_active] GeorepStatus: Worker Status Change      status=Active

[2018-09-25 14:31:27.471021] I [gsyncdstatus(worker /bricks/brick1/brick):249:set_worker_crawl_status] GeorepStatus: Crawl Status Change      status=History Crawl

[2018-09-25 14:31:27.471484] I [master(worker /bricks/brick1/brick):1507:crawl] _GMaster: starting history crawl    turns=1     stime=(0, 0)      entry_stime=None      etime=1537885887

[2018-09-25 14:31:27.472564] E [repce(agent /bricks/brick1/brick):105:worker] <top>: call failed:

Traceback (most recent call last):

  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 101, in worker

    res = getattr(self.obj, rmeth)(*in_data[2:])

  File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 53, in history

    num_parallel)

  File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 104, in cl_history_changelog

    raise ChangelogHistoryNotAvailable()

ChangelogHistoryNotAvailable

[2018-09-25 14:31:27.480632] E [repce(worker /bricks/brick1/brick):197:__call__] RepceClient: call failed      call=31250:140272364406592:1537885887.47  method=history      error=ChangelogHistoryNotAvailable

[2018-09-25 14:31:27.480958] I [resource(worker /bricks/brick1/brick):1295:service_loop] GLUSTER: Changelog history not available, using xsync

[2018-09-25 14:31:27.495117] I [master(worker /bricks/brick1/brick):1623:crawl] _GMaster: starting hybrid crawl     stime=(0, 0)

[2018-09-25 14:31:27.502083] I [gsyncdstatus(worker /bricks/brick1/brick):249:set_worker_crawl_status] GeorepStatus: Crawl Status Change      status=Hybrid Crawl

[2018-09-25 14:31:29.505284] I [master(worker /bricks/brick1/brick):1634:crawl] _GMaster: processing xsync changelog      path=/var/lib/misc/gluster/gsyncd/glustervol1_slave_3_glustervol1/bricks-brick1-brick/xsync/XSYNC-CHANGELOG.1537885887

 

tail -f /var/log/glusterfs/geo-replication-slaves/glustervol1_slave_3_glustervol1/gsyncd.log

[2018-09-25 13:49:24.141303] I [repce(slave master/bricks/brick1/brick):80:service_loop] RepceServer: terminating on reaching EOF.

[2018-09-25 13:49:36.602051] W [gsyncd(slave master/bricks/brick1/brick):293:main] <top>: Session config file not exists, using the default config      path=/var/lib/glusterd/geo-replication/glustervol1_slave_3_glustervol1/gsyncd.conf

[2018-09-25 13:49:36.629415] I [resource(slave master/bricks/brick1/brick):1096:connect] GLUSTER: Mounting gluster volume locally...

[2018-09-25 13:49:37.701642] I [resource(slave master/bricks/brick1/brick):1119:connect] GLUSTER: Mounted gluster volume     duration=1.0718

[2018-09-25 13:49:37.704282] I [resource(slave master/bricks/brick1/brick):1146:service_loop] GLUSTER: slave listening

[2018-09-25 14:10:27.70952] I [repce(slave master/bricks/brick1/brick):80:service_loop] RepceServer: terminating on reaching EOF.

[2018-09-25 14:10:39.632124] W [gsyncd(slave master/bricks/brick1/brick):293:main] <top>: Session config file not exists, using the default config      path=/var/lib/glusterd/geo-replication/glustervol1_slave_3_glustervol1/gsyncd.conf

[2018-09-25 14:10:39.650958] I [resource(slave master/bricks/brick1/brick):1096:connect] GLUSTER: Mounting gluster volume locally...

[2018-09-25 14:10:40.729355] I [resource(slave master/bricks/brick1/brick):1119:connect] GLUSTER: Mounted gluster volume     duration=1.0781

[2018-09-25 14:10:40.730650] I [resource(slave master/bricks/brick1/brick):1146:service_loop] GLUSTER: slave listening

[2018-09-25 14:31:10.291064] I [repce(slave master/bricks/brick1/brick):80:service_loop] RepceServer: terminating on reaching EOF.

[2018-09-25 14:31:22.802237] W [gsyncd(slave master/bricks/brick1/brick):293:main] <top>: Session config file not exists, using the default config      path=/var/lib/glusterd/geo-replication/glustervol1_slave_3_glustervol1/gsyncd.conf

[2018-09-25 14:31:22.828418] I [resource(slave master/bricks/brick1/brick):1096:connect] GLUSTER: Mounting gluster volume locally...

[2018-09-25 14:31:23.910206] I [resource(slave master/bricks/brick1/brick):1119:connect] GLUSTER: Mounted gluster volume     duration=1.0813

[2018-09-25 14:31:23.913369] I [resource(slave master/bricks/brick1/brick):1146:service_loop] GLUSTER: slave listening

 

Any ideas on how to resolve this without re-creating everything again? Can I reset the changelog history?
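
The only related knob I'm aware of is the reset-sync-time option on session delete, something like the sketch below (slave_3 as an example), but I'm not sure whether that is what's needed here or whether it's safe:

gluster volume geo-replication glustervol1 slave_3::glustervol1 stop
gluster volume geo-replication glustervol1 slave_3::glustervol1 delete reset-sync-time
gluster volume geo-replication glustervol1 slave_3::glustervol1 create push-pem force
gluster volume geo-replication glustervol1 slave_3::glustervol1 start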

 

Regards,

Christian

 

From: <gluster-users-bounces@xxxxxxxxxxx> on behalf of "Kotte, Christian (Ext)" <christian.kotte@xxxxxxxxxxxx>
Date: Monday, 24. September 2018 at 17:20
To: Kotresh Hiremath Ravishankar <khiremat@xxxxxxxxxx>
Cc: Gluster Users <gluster-users@xxxxxxxxxxx>
Subject: Re: [geo-rep] Replication faulty - gsyncd.log OSError: [Errno 13] Permission denied

 

I don’t configure the permissions of /bricks/brick1/brick/.glusterfs. I don’t even see it on the local GlusterFS mount.

 

Not sure why the permissions are configured with the S (setgid) bit and the AD group…

 

Regards,

Christian

 

From: <gluster-users-bounces@xxxxxxxxxxx> on behalf of "Kotte, Christian (Ext)" <christian.kotte@xxxxxxxxxxxx>
Date: Monday, 24. September 2018 at 17:10
To: Kotresh Hiremath Ravishankar <khiremat@xxxxxxxxxx>
Cc: Gluster Users <gluster-users@xxxxxxxxxxx>
Subject: Re: [geo-rep] Replication faulty - gsyncd.log OSError: [Errno 13] Permission denied

 

Yeah right. I get permission denied.

 

[geoaccount@slave ~]$ ll /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e

ls: cannot access /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e: Permission denied

[geoaccount@slave ~]$ ll /bricks/brick1/brick/.glusterfs/29/d1/

ls: cannot access /bricks/brick1/brick/.glusterfs/29/d1/: Permission denied

[geoaccount@slave ~]$ ll /bricks/brick1/brick/.glusterfs/29/

ls: cannot access /bricks/brick1/brick/.glusterfs/29/: Permission denied

[geoaccount@slave ~]$ ll /bricks/brick1/brick/.glusterfs/

ls: cannot open directory /bricks/brick1/brick/.glusterfs/: Permission denied

 

[root@slave ~]# ll /bricks/brick1/brick/.glusterfs/29

total 0

drwx--S---+ 2 root AD+group 50 Sep 10 07:29 16

drwx--S---+ 2 root AD+group 50 Sep 10 07:29 33

drwx--S---+ 2 root AD+group 50 Sep 10 07:29 5e

drwx--S---+ 2 root AD+group 50 Sep 10 07:29 73

drwx--S---+ 2 root AD+group 50 Sep 10 07:29 b2

drwx--S---+ 2 root AD+group 50 Sep 21 09:39 d1

drwx--S---+ 2 root AD+group 50 Sep 10 07:29 d7

drwx--S---+ 2 root AD+group 50 Sep 10 07:29 e6

drwx--S---+ 2 root AD+group 50 Sep 10 07:29 eb

[root@slave ~]#

 

However, the strange thing is that I could replicate new files and folders before. The replication has been broken since the “New folder” was created.

 

These are the permissions on a dev/test system:

[root@slave-dev ~]# ll /bricks/brick1/brick/.glusterfs/

total 3136

drwx------. 44 root root    4096 Aug 22 18:19 00

drwx------. 50 root root    4096 Sep 12 13:14 01

drwx------. 54 root root    4096 Sep 13 11:33 02

drwx------. 59 root root    4096 Aug 22 18:21 03

drwx------. 60 root root    4096 Sep 12 13:14 04

drwx------. 68 root root    4096 Aug 24 12:36 05

drwx------. 56 root root    4096 Aug 22 18:21 06

drwx------. 46 root root    4096 Aug 22 18:21 07

drwx------. 51 root root    4096 Aug 22 18:21 08

drwx------. 42 root root    4096 Aug 22 18:21 09

drwx------. 44 root root    4096 Sep 13 11:16 0a

 

I’ve configured an AD group, SGID bit, and ACLs via Ansible on the local mount point. Could this be an issue? Should I avoid configuring the permissions on .glusterfs and below?

 

# ll /mnt/glustervol1/

total 12

drwxrwsr-x+  4 AD+user AD+group 4096 Jul 13 07:46 Scripts

drwxrwxr-x+ 10 AD+user AD+group 4096 Jun 12 12:03 Software

-rw-rw-r--+  1 root    AD+group    0 Aug  8 08:44 test

drwxr-xr-x+  6 AD+user AD+group 4096 Apr 18 10:58 tftp

 

glusterfs_volumes:

[…]

    permissions:

      mode: "02775"

      owner: root

      group: "AD+group"

      acl_permissions: rw

[…]

 

# root directory is owned by root.

# set permissions to 'g+s' to automatically set the group to "AD+group"

# permissions of individual files will be set by Samba during creation

- name: Configure volume directory permission 1/2

  tags: glusterfs

  file:

    path: /mnt/{{ item.volume }}

    state: directory

    mode: "{{ item.permissions.mode }}"

    owner: "{{ item.permissions.owner }}"

    group: "{{ item.permissions.group }}"

  with_items: "{{ glusterfs_volumes }}"

  loop_control:

    label: "{{ item.volume }}"

  when: item.permissions is defined

 

# ACL needs to be set to override default umask and grant "AD+group" write permissions

- name: Configure volume directory permission 2/2 (ACL)

  tags: glusterfs

  acl:

    path: /mnt/{{ item.volume }}

    default: yes

    entity: "{{ item.permissions.group }}"

    etype: group

    permissions: "{{ item.permissions.acl_permissions }}"

    state: present

  with_items: "{{ glusterfs_volumes }}"

  loop_control:

    label: "{{ item.volume }}"

  when: item.permissions is defined

 

Regards,

Christian

 

From: Kotresh Hiremath Ravishankar <khiremat@xxxxxxxxxx>
Date: Monday, 24. September 2018 at 16:20
To: "Kotte, Christian (Ext)" <christian.kotte@xxxxxxxxxxxx>
Cc: Gluster Users <gluster-users@xxxxxxxxxxx>
Subject: Re: [geo-rep] Replication faulty - gsyncd.log OSError: [Errno 13] Permission denied

 

I think I get what's happening. The geo-rep session is non-root.

Could you do a readlink on the brick path mentioned above, /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e, as the geoaccount user and see if you get "Permission denied" errors?
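
For example, something like this (assuming the sync user is geoaccount):

sudo -u geoaccount readlink /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e
sudo -u geoaccount ls -l /bricks/brick1/brick/.glusterfs/29/d1/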

 

Thanks,

Kotresh HR

 

On Mon, Sep 24, 2018 at 7:35 PM Kotte, Christian (Ext) <christian.kotte@xxxxxxxxxxxx> wrote:

Ok. It happens on all slave nodes (and on the interimmaster as well).

 

It's as I assumed. These are the logs from one of the slaves:

 

gsyncd.log

[2018-09-24 13:52:25.418382] I [repce(slave slave/bricks/brick1/brick):80:service_loop] RepceServer: terminating on reaching EOF.

[2018-09-24 13:52:37.95297] W [gsyncd(slave slave/bricks/brick1/brick):293:main] <top>: Session config file not exists, using the default config     path=/var/lib/glusterd/geo-replication/glustervol1_slave_glustervol1/gsyncd.conf

[2018-09-24 13:52:37.109643] I [resource(slave slave/bricks/brick1/brick):1096:connect] GLUSTER: Mounting gluster volume locally...

[2018-09-24 13:52:38.303920] I [resource(slave slave/bricks/brick1/brick):1119:connect] GLUSTER: Mounted gluster volume      duration=1.1941

[2018-09-24 13:52:38.304771] I [resource(slave slave/bricks/brick1/brick):1146:service_loop] GLUSTER: slave listening

[2018-09-24 13:52:41.981554] I [resource(slave slave/bricks/brick1/brick):598:entry_ops] <top>: Special case: rename on mkdir        gfid=29d1d60d-1ad6-45fc-87e0-93d478f7331e       entry='.gfid/6b97b987-8aef-46c3-af27-20d3aa883016/New folder'

[2018-09-24 13:52:42.45641] E [repce(slave slave/bricks/brick1/brick):105:worker] <top>: call failed:

Traceback (most recent call last):

  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 101, in worker

    res = getattr(self.obj, rmeth)(*in_data[2:])

  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 599, in entry_ops

    src_entry = get_slv_dir_path(slv_host, slv_volume, gfid)

  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 682, in get_slv_dir_path

    [ENOENT], [ESTALE])

  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 540, in errno_wrap

    return call(*arg)

OSError: [Errno 13] Permission denied: '/bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e'

[2018-09-24 13:52:42.81794] I [repce(slave slave/bricks/brick1/brick):80:service_loop] RepceServer: terminating on reaching EOF.

[2018-09-24 13:52:53.459676] W [gsyncd(slave slave/bricks/brick1/brick):293:main] <top>: Session config file not exists, using the default config    path=/var/lib/glusterd/geo-replication/glustervol1_slave_glustervol1/gsyncd.conf

[2018-09-24 13:52:53.473500] I [resource(slave slave/bricks/brick1/brick):1096:connect] GLUSTER: Mounting gluster volume locally...

[2018-09-24 13:52:54.659044] I [resource(slave slave/bricks/brick1/brick):1119:connect] GLUSTER: Mounted gluster volume      duration=1.1854

[2018-09-24 13:52:54.659837] I [resource(slave slave/bricks/brick1/brick):1146:service_loop] GLUSTER: slave listening

 

The folder “New folder” was created via Samba and renamed by my colleague right after creation.

[root@slave glustervol1_slave_glustervol1]# ls /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e/

[root@slave glustervol1_slave_glustervol1]# ls /bricks/brick1/brick/.glusterfs/29/d1/ -al

total 0

drwx--S---+  2 root AD+group 50 Sep 21 09:39 .

drwx--S---+ 11 root AD+group 96 Sep 21 09:39 ..

lrwxrwxrwx.  1 root AD+group 75 Sep 21 09:39 29d1d60d-1ad6-45fc-87e0-93d478f7331e -> ../../6b/97/6b97b987-8aef-46c3-af27-20d3aa883016/vRealize Operation Manager

 

I tried creating the folder in /bricks/brick1/brick/.glusterfs/6b/97/6b97b987-8aef-46c3-af27-20d3aa883016/, but it didn't change anything.

 

mnt-slave-bricks-brick1-brick.log

[2018-09-24 13:51:10.625723] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-glustervol1-client-0: error returned while attempting to connect to host:(null), port:0

[2018-09-24 13:51:10.626092] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-glustervol1-client-0: error returned while attempting to connect to host:(null), port:0

[2018-09-24 13:51:10.626181] I [rpc-clnt.c:2105:rpc_clnt_reconfig] 0-glustervol1-client-0: changing port to 49152 (from 0)

[2018-09-24 13:51:10.643111] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-glustervol1-client-0: error returned while attempting to connect to host:(null), port:0

[2018-09-24 13:51:10.643489] W [dict.c:923:str_to_data] (-->/usr/lib64/glusterfs/4.1.3/xlator/protocol/client.so(+0x4131a) [0x7fafb023831a] -->/lib64/libglusterfs.so.0(dict_set_str+0x16) [0x7fafbdb83266] -->/lib64/libglusterfs.so.0(str_to_data+0x91) [0x7fafbdb7fea1] ) 0-dict: value is NULL [Invalid argument]

[2018-09-24 13:51:10.643507] I [MSGID: 114006] [client-handshake.c:1308:client_setvolume] 0-glustervol1-client-0: failed to set process-name in handshake msg

[2018-09-24 13:51:10.643541] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-glustervol1-client-0: error returned while attempting to connect to host:(null), port:0

[2018-09-24 13:51:10.671460] I [MSGID: 114046] [client-handshake.c:1176:client_setvolume_cbk] 0-glustervol1-client-0: Connected to glustervol1-client-0, attached to remote volume '/bricks/brick1/brick'.

[2018-09-24 13:51:10.672694] I [fuse-bridge.c:4294:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel 7.22

[2018-09-24 13:51:10.672715] I [fuse-bridge.c:4927:fuse_graph_sync] 0-fuse: switched to graph 0

[2018-09-24 13:51:10.673329] I [MSGID: 109005] [dht-selfheal.c:2342:dht_selfheal_directory] 0-glustervol1-dht: Directory selfheal failed: Unable to form layout for directory /

[2018-09-24 13:51:16.116458] I [fuse-bridge.c:5199:fuse_thread_proc] 0-fuse: initating unmount of /var/mountbroker-root/user1300/mtpt-geoaccount-ARDW1E

[2018-09-24 13:51:16.116595] W [glusterfsd.c:1514:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7e25) [0x7fafbc9eee25] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x55d5dac5dd65] -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x55d5dac5db8b] ) 0-: received signum (15), shutting down

[2018-09-24 13:51:16.116616] I [fuse-bridge.c:5981:fini] 0-fuse: Unmounting '/var/mountbroker-root/user1300/mtpt-geoaccount-ARDW1E'.

[2018-09-24 13:51:16.116625] I [fuse-bridge.c:5986:fini] 0-fuse: Closing fuse connection to '/var/mountbroker-root/user1300/mtpt-geoaccount-ARDW1E'.

 

Regards,

Christian

 

From: Kotresh Hiremath Ravishankar <khiremat@xxxxxxxxxx>
Date: Saturday, 22. September 2018 at 06:52
To: "Kotte, Christian (Ext)" <christian.kotte@xxxxxxxxxxxx>
Cc: Gluster Users <gluster-users@xxxxxxxxxxx>
Subject: Re: [geo-rep] Replication faulty - gsyncd.log OSError: [Errno 13] Permission denied

 

The problem occurred on the slave side, and the error is propagated to the master. Almost any traceback involving repce is related to a problem on the slave. Just check a few lines above in the log to find the slave node the crashed worker is connected to, and get the geo-replication logs from that node to debug further.
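
For example (the session directory name below is taken from the log paths in your mail; adjust to match your setup):

# on the master, find which slave node the crashed worker was connected to
grep slave_node /var/log/glusterfs/geo-replication/glustervol1_slave_3_glustervol1/gsyncd.log | tail
# then follow the slave-side log on that node
tail -f /var/log/glusterfs/geo-replication-slaves/glustervol1_slave_3_glustervol1/gsyncd.log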

 

 

 

 

 

On Fri, 21 Sep 2018, 20:10 Kotte, Christian (Ext), <christian.kotte@xxxxxxxxxxxx> wrote:

Hi,

 

Any idea how to troubleshoot this?

 

New folders and files were created on the master and the replication went faulty. They were created via Samba.

 

Version: GlusterFS 4.1.3

 

[root@master]# gluster volume geo-replication status

 

MASTER NODE                         MASTER VOL     MASTER BRICK            SLAVE USER    SLAVE                                                             SLAVE NODE    STATUS    CRAWL STATUS    LAST_SYNCED

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

master    glustervol1    /bricks/brick1/brick    geoaccount    ssh://geoaccount@slave_1::glustervol1       N/A           Faulty    N/A             N/A

master    glustervol1    /bricks/brick1/brick    geoaccount    ssh://geoaccount@slave_2::glustervol1       N/A           Faulty    N/A             N/A

master    glustervol1    /bricks/brick1/brick    geoaccount    ssh://geoaccount@interimmaster::glustervol1   N/A           Faulty    N/A             N/A

 

The following error is repeatedly logged in the gsyncd.logs:

[2018-09-21 14:26:38.611479] I [repce(agent /bricks/brick1/brick):80:service_loop] RepceServer: terminating on reaching EOF.

[2018-09-21 14:26:39.211527] I [monitor(monitor):279:monitor] Monitor: worker died in startup phase     brick=/bricks/brick1/brick

[2018-09-21 14:26:39.214322] I [gsyncdstatus(monitor):244:set_worker_status] GeorepStatus: Worker Status Change status=Faulty

[2018-09-21 14:26:49.318953] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker   brick=/bricks/brick1/brick      slave_node=slave_3

[2018-09-21 14:26:49.471532] I [gsyncd(agent /bricks/brick1/brick):297:main] <top>: Using session config file   path=/var/lib/glusterd/geo-replication/glustervol1_slave_3_glustervol1/gsyncd.conf

[2018-09-21 14:26:49.473917] I [changelogagent(agent /bricks/brick1/brick):72:__init__] ChangelogAgent: Agent listining...

[2018-09-21 14:26:49.491359] I [gsyncd(worker /bricks/brick1/brick):297:main] <top>: Using session config file  path=/var/lib/glusterd/geo-replication/glustervol1_slave_3_glustervol1/gsyncd.conf

[2018-09-21 14:26:49.538049] I [resource(worker /bricks/brick1/brick):1377:connect_remote] SSH: Initializing SSH connection between master and slave...

[2018-09-21 14:26:53.5017] I [resource(worker /bricks/brick1/brick):1424:connect_remote] SSH: SSH connection between master and slave established.      duration=3.4665

[2018-09-21 14:26:53.5419] I [resource(worker /bricks/brick1/brick):1096:connect] GLUSTER: Mounting gluster volume locally...

[2018-09-21 14:26:54.120374] I [resource(worker /bricks/brick1/brick):1119:connect] GLUSTER: Mounted gluster volume     duration=1.1146

[2018-09-21 14:26:54.121012] I [subcmds(worker /bricks/brick1/brick):70:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor

[2018-09-21 14:26:56.144460] I [master(worker /bricks/brick1/brick):1593:register] _GMaster: Working dir        path=/var/lib/misc/gluster/gsyncd/glustervol1_slave_3_glustervol1/bricks-brick1-brick

[2018-09-21 14:26:56.145145] I [resource(worker /bricks/brick1/brick):1282:service_loop] GLUSTER: Register time time=1537540016

[2018-09-21 14:26:56.160064] I [gsyncdstatus(worker /bricks/brick1/brick):277:set_active] GeorepStatus: Worker Status Change    status=Active

[2018-09-21 14:26:56.161175] I [gsyncdstatus(worker /bricks/brick1/brick):249:set_worker_crawl_status] GeorepStatus: Crawl Status Change        status=History Crawl

[2018-09-21 14:26:56.161536] I [master(worker /bricks/brick1/brick):1507:crawl] _GMaster: starting history crawl        turns=1 stime=(1537522637, 0)   entry_stime=(1537537141, 0)     etime=1537540016

[2018-09-21 14:26:56.164277] I [master(worker /bricks/brick1/brick):1536:crawl] _GMaster: slave's time  stime=(1537522637, 0)

[2018-09-21 14:26:56.197065] I [master(worker /bricks/brick1/brick):1360:process] _GMaster: Skipping already processed entry ops        to_changelog=1537522638 num_changelogs=1        from_changelog=1537522638

[2018-09-21 14:26:56.197402] I [master(worker /bricks/brick1/brick):1374:process] _GMaster: Entry Time Taken    MKD=0   MKN=0   LIN=0   SYM=0   REN=0   RMD=0   CRE=0   duration=0.0000 UNL=1

[2018-09-21 14:26:56.197623] I [master(worker /bricks/brick1/brick):1384:process] _GMaster: Data/Metadata Time Taken    SETA=0  SETX=0  meta_duration=0.0000    data_duration=0.0284    DATA="" XATT=0

[2018-09-21 14:26:56.198230] I [master(worker /bricks/brick1/brick):1394:process] _GMaster: Batch Completed     changelog_end=1537522638        entry_stime=(1537537141, 0)     changelog_start=1537522638      stime=(1537522637, 0)   duration=0.0333 num_changelogs=1        mode=history_changelog

[2018-09-21 14:26:57.200436] I [master(worker /bricks/brick1/brick):1536:crawl] _GMaster: slave's time  stime=(1537522637, 0)

[2018-09-21 14:26:57.528625] E [repce(worker /bricks/brick1/brick):197:__call__] RepceClient: call failed       call=17209:140650361157440:1537540017.21        method=entry_ops        error=OSError

[2018-09-21 14:26:57.529371] E [syncdutils(worker /bricks/brick1/brick):332:log_raise_exception] <top>: FAIL:

Traceback (most recent call last):

  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 311, in main

    func(args)

  File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line 72, in subcmd_worker

    local.service_loop(remote)

  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1288, in service_loop

    g3.crawlwrap(_oneshot_=True)

  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 615, in crawlwrap

    self.crawl()

  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1545, in crawl

    self.changelogs_batch_process(changes)

  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1445, in changelogs_batch_process

    self.process(batch)

  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1280, in process

    self.process_change(change, done, retry)

  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1179, in process_change

    failures = self.slave.server.entry_ops(entries)

  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 216, in __call__

    return self.ins(self.meth, *a)

  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 198, in __call__

    raise res

OSError: [Errno 13] Permission denied: '/bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e'

 

The permissions look fine. The replication is done via the geo user instead of root. It should be able to read, but I'm not sure whether the syncdaemon runs under geoaccount.
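
(I guess I could check which user the daemons actually run as, on both the master and the slaves, with something like:)

ps -eo user,pid,cmd | grep -E "[g]syncd|[r]sync"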

 

[root@master vRealize Operation Manager]# ll /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e

lrwxrwxrwx. 1 root root 75 Sep 21 09:39 /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e -> ../../6b/97/6b97b987-8aef-46c3-af27-20d3aa883016/vRealize Operation Manager

 

[root@master vRealize Operation Manager]# ll /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e/

total 4

drwxrwxr-x. 2 AD+user AD+group  131 Sep 21 10:14 6.7

drwxrwxr-x. 2 AD+user AD+group 4096 Sep 21 09:43 7.0

drwxrwxr-x. 2 AD+user AD+group   57 Sep 21 10:28 7.5

[root@master vRealize Operation Manager]#

 

It's possible that the folder was renamed. I've had 3 similar issues since I migrated to GlusterFS 4.x but couldn't investigate much. I needed to completely wipe GlusterFS and the geo-replication to get rid of this error…

 

Any help is appreciated.

 

Regards,

 

Christian Kotte




--

Thanks and Regards,

Kotresh H R



--
Thanks and Regards,
Kotresh H R
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users
