Re: Gluster7 GeoReplication Operation not permitted and incomplete sync

Matthew Benstead <matthewb@xxxxxxx> · Mon, 5 Oct 2020 13:28:19 -0700

    Hmm... Looks like I forgot to set the xattr's to sa - I left them as
    default. 

    [root@pcic-backup01 ~]# zfs get xattr pcic-backup01-zpool

    NAME                 PROPERTY  VALUE  SOURCE

    pcic-backup01-zpool  xattr     on     default

    [root@pcic-backup02 ~]# zfs get xattr pcic-backup02-zpool

    NAME                 PROPERTY  VALUE  SOURCE

    pcic-backup02-zpool  xattr     on     default

    I wonder if I can change them and continue, or if I need to blow
    away the zpool and start over? 

    Thanks,

     -Matthew

        --

          Matthew Benstead

          System Administrator

            Pacific Climate
              Impacts Consortium

            University of Victoria, UH1

            PO Box 1800, STN CSC

            Victoria, BC, V8W 2Y2

            Phone: +1-250-721-8432

            Email: matthewb@xxxxxxx

    On 10/5/20 12:53 PM, Felix Kölzow
      wrote:

      Dear Matthew,
      this is our configuration:
      zfs get all mypool
      mypool  xattr                          
          sa                              local

        mypool  acltype                        
          posixacl                        local

      Something more to consider?

      Regards,
      Felix

      On 05/10/2020 21:11, Matthew Benstead
        wrote:

       Thanks
        Felix - looking through some more of the logs I may have found
        the reason...

        From
/var/log/glusterfs/geo-replication/storage_10.0.231.81_pcic-backup/mnt-data-storage_a-storage.log

        [2020-10-05 18:13:35.736838] E
          [fuse-bridge.c:4288:fuse_xattr_cbk] 0-glusterfs-fuse: extended
          attribute not supported by the backend storage

        [2020-10-05 18:18:53.885591] E
          [fuse-bridge.c:4288:fuse_xattr_cbk] 0-glusterfs-fuse: extended
          attribute not supported by the backend storage

        [2020-10-05 18:22:14.405234] E
          [fuse-bridge.c:4288:fuse_xattr_cbk] 0-glusterfs-fuse: extended
          attribute not supported by the backend storage

        [2020-10-05 18:25:53.971679] E
          [fuse-bridge.c:4288:fuse_xattr_cbk] 0-glusterfs-fuse: extended
          attribute not supported by the backend storage

        [2020-10-05 18:31:44.571557] E
          [fuse-bridge.c:4288:fuse_xattr_cbk] 0-glusterfs-fuse: extended
          attribute not supported by the backend storage

        [2020-10-05 18:36:36.508772] E
          [fuse-bridge.c:4288:fuse_xattr_cbk] 0-glusterfs-fuse: extended
          attribute not supported by the backend storage

        [2020-10-05 18:40:10.401055] E
          [fuse-bridge.c:4288:fuse_xattr_cbk] 0-glusterfs-fuse: extended
          attribute not supported by the backend storage

        [2020-10-05 18:42:57.833536] E
          [fuse-bridge.c:4288:fuse_xattr_cbk] 0-glusterfs-fuse: extended
          attribute not supported by the backend storage

        [2020-10-05 18:45:19.691953] E
          [fuse-bridge.c:4288:fuse_xattr_cbk] 0-glusterfs-fuse: extended
          attribute not supported by the backend storage

        [2020-10-05 18:48:26.478532] E
          [fuse-bridge.c:4288:fuse_xattr_cbk] 0-glusterfs-fuse: extended
          attribute not supported by the backend storage

        [2020-10-05 18:52:24.466914] E
          [fuse-bridge.c:4288:fuse_xattr_cbk] 0-glusterfs-fuse: extended
          attribute not supported by the backend storage

        The slave nodes are running gluster on top of ZFS, but I had
        configured ACLs - is there something else missing to make this
        work with ZFS? 

        [root@pcic-backup01 ~]# gluster volume info

          Volume Name: pcic-backup

          Type: Distribute

          Volume ID: 7af8a424-f4b6-4405-bba1-0dbafb0fa231

          Status: Started

          Snapshot Count: 0

          Number of Bricks: 2

          Transport-type: tcp

          Bricks:

          Brick1: 10.0.231.81:/pcic-backup01-zpool/brick

          Brick2: 10.0.231.82:/pcic-backup02-zpool/brick

          Options Reconfigured:

          network.ping-timeout: 10

          performance.cache-size: 256MB

          server.event-threads: 4

          client.event-threads: 4

          cluster.lookup-optimize: on

          performance.parallel-readdir: on

          performance.readdir-ahead: on

          features.quota-deem-statfs: on

          features.inode-quota: on

          features.quota: on

          transport.address-family: inet

          nfs.disable: on

          features.read-only: off

          performance.open-behind: off

          [root@pcic-backup01 ~]# zfs get acltype pcic-backup01-zpool

        NAME                 PROPERTY  VALUE     SOURCE

        pcic-backup01-zpool  acltype   posixacl  local

          [root@pcic-backup01 ~]# grep "pcic-backup0" /proc/mounts

          pcic-backup01-zpool /pcic-backup01-zpool zfs
          rw,seclabel,xattr,posixacl 0 0

        [root@pcic-backup02 ~]# zfs get acltype
          pcic-backup02-zpool

        NAME                 PROPERTY  VALUE     SOURCE

        pcic-backup02-zpool  acltype   posixacl  local

          [root@pcic-backup02 ~]# grep "pcic-backup0" /proc/mounts 

          pcic-backup02-zpool /pcic-backup02-zpool zfs
          rw,seclabel,xattr,posixacl 0 0

        Thanks,

         -Matthew

          --

            Matthew Benstead

            System Administrator

            Pacific
              Climate Impacts Consortium

            University of Victoria, UH1

            PO Box 1800, STN CSC

            Victoria, BC, V8W 2Y2

            Phone: +1-250-721-8432

            Email: matthewb@xxxxxxx

        On 10/5/20 1:39 AM, Felix Kölzow
          wrote:

        Dear
          Matthew, 

          can you provide more information regarding to the
          geo-replication brick 

          logs. 

          These files area also located in: 

/var/log/glusterfs/geo-replication/storage_10.0.231.81_pcic-backup/ 

          Usually, these log files are more precise to figure out the
          root cause 

          of the error. 

          Additionally, it is also worth to look at the log-files on the
          slave side. 

          Regards, 

          Felix 

          On 01/10/2020 23:08, Matthew Benstead wrote: 

          Hello, 

            I'm looking for some help with a GeoReplication Error in my
            Gluster 

            7/CentOS 7 setup. Replication progress has basically
            stopped, and the 

            status of the replication keeps switching. 

            The gsyncd log has errors like "Operation not permitted",
            "incomplete 

            sync", etc... help? I'm not sure how to proceed in
            troubleshooting this. 

            The log is here, it basically just repeats - from: 

/var/log/glusterfs/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.log

            [2020-10-01 20:52:15.291923] I [master(worker 

            /data/storage_a/storage):1991:syncjob] Syncer: Sync Time
            Taken 

            duration=32.8466        num_files=1749  job=3  
            return_code=23 

            [2020-10-01 20:52:18.700062] I [master(worker 

            /data/storage_c/storage):1991:syncjob] Syncer: Sync Time
            Taken 

            duration=43.1210        num_files=3167  job=6  
            return_code=23 

            [2020-10-01 20:52:23.383234] W [master(worker 

            /data/storage_c/storage):1393:process] _GMaster: incomplete
            sync, 

            retrying changelogs     files=['XSYNC-CHANGELOG.1601585397']

            [2020-10-01 20:52:28.537657] E [repce(worker 

            /data/storage_b/storage):213:__call__] RepceClient: call
            failed 

            call=258187:140538843596608:1601585515.63      
            method=entry_ops 

            error=OSError 

            [2020-10-01 20:52:28.538064] E [syncdutils(worker 

            /data/storage_b/storage):339:log_raise_exception]
            <top>: FAIL: 

            Traceback (most recent call last): 

               File
            "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line
            332, 

            in main 

                 func(args) 

               File
            "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line
            86, 

            in subcmd_worker 

                 local.service_loop(remote) 

               File
            "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line

            1308, in service_loop 

                 g1.crawlwrap(_oneshot_=True, register_time=register_time)

               File
            "/usr/libexec/glusterfs/python/syncdaemon/master.py", line
            602, 

            in crawlwrap 

                 self.crawl() 

               File
            "/usr/libexec/glusterfs/python/syncdaemon/master.py", line
            1682, 

            in crawl 

                 self.process([item[1]], 0) 

               File
            "/usr/libexec/glusterfs/python/syncdaemon/master.py", line
            1327, 

            in process 

                 self.process_change(change, done, retry) 

               File
            "/usr/libexec/glusterfs/python/syncdaemon/master.py", line
            1221, 

            in process_change 

                 failures = self.slave.server.entry_ops(entries) 

               File "/usr/libexec/glusterfs/python/syncdaemon/repce.py",
            line 232, in 

            __call__ 

                 return self.ins(self.meth, *a) 

               File "/usr/libexec/glusterfs/python/syncdaemon/repce.py",
            line 214, in 

            __call__ 

                 raise res 

            OSError: [Errno 1] Operation not permitted 

            [2020-10-01 20:52:28.570316] I [repce(agent 

            /data/storage_b/storage):96:service_loop] RepceServer:
            terminating on 

            reaching EOF. 

            [2020-10-01 20:52:28.613603] I 

            [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus:
            Worker 

            Status Change status=Faulty 

            [2020-10-01 20:52:29.619797] I [master(worker 

            /data/storage_c/storage):1991:syncjob] Syncer: Sync Time
            Taken 

            duration=5.6458 num_files=455   job=3   return_code=23 

            [2020-10-01 20:52:38.286245] I [master(worker 

            /data/storage_c/storage):1991:syncjob] Syncer: Sync Time
            Taken 

            duration=14.1824        num_files=1333  job=2  
            return_code=23 

            [2020-10-01 20:52:38.628156] I 

            [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus:
            Worker 

            Status Change status=Initializing... 

            [2020-10-01 20:52:38.628325] I
            [monitor(monitor):159:monitor] Monitor: 

            starting gsyncd worker   brick=/data/storage_b/storage 

            slave_node=10.0.231.82 

            [2020-10-01 20:52:38.684736] I [gsyncd(agent 

            /data/storage_b/storage):318:main] <top>: Using
            session config 

            file 

path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf

            [2020-10-01 20:52:38.687213] I [gsyncd(worker 

            /data/storage_b/storage):318:main] <top>: Using
            session config 

            file 

path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf

            [2020-10-01 20:52:38.687401] I [changelogagent(agent 

            /data/storage_b/storage):72:__init__] ChangelogAgent: Agent
            listining... 

            [2020-10-01 20:52:38.703295] I [resource(worker 

            /data/storage_b/storage):1386:connect_remote] SSH:
            Initializing SSH 

            connection between master and slave... 

            [2020-10-01 20:52:40.388372] I [resource(worker 

            /data/storage_b/storage):1435:connect_remote] SSH: SSH
            connection 

            between master and slave established. duration=1.6849 

            [2020-10-01 20:52:40.388582] I [resource(worker 

            /data/storage_b/storage):1105:connect] GLUSTER: Mounting
            gluster volume 

            locally... 

            [2020-10-01 20:52:41.501105] I [resource(worker 

            /data/storage_b/storage):1128:connect] GLUSTER: Mounted
            gluster volume 

            duration=1.1123 

            [2020-10-01 20:52:41.501405] I [subcmds(worker 

            /data/storage_b/storage):84:subcmd_worker] <top>:
            Worker spawn 

            successful. Acknowledging back to monitor 

            [2020-10-01 20:52:43.531146] I [master(worker 

            /data/storage_b/storage):1640:register] _GMaster: Working
            dir 

path=/var/lib/misc/gluster/gsyncd/storage_10.0.231.81_pcic-backup/data-storage_b-storage

            [2020-10-01 20:52:43.533953] I [resource(worker 

            /data/storage_b/storage):1291:service_loop] GLUSTER:
            Register time 

            time=1601585563 

            [2020-10-01 20:52:43.547092] I [gsyncdstatus(worker 

            /data/storage_b/storage):281:set_active] GeorepStatus:
            Worker Status 

            Change status=Active 

            [2020-10-01 20:52:43.561920] I [gsyncdstatus(worker 

            /data/storage_b/storage):253:set_worker_crawl_status]
            GeorepStatus: 

            Crawl Status Change     status=History Crawl 

            [2020-10-01 20:52:43.562184] I [master(worker 

            /data/storage_b/storage):1554:crawl] _GMaster: starting
            history 

            crawl     turns=1 stime=None      entry_stime=None       
            etime=1601585563 

            [2020-10-01 20:52:43.562269] I [resource(worker 

            /data/storage_b/storage):1307:service_loop] GLUSTER: No
            stime available, 

            using xsync crawl 

            [2020-10-01 20:52:43.569799] I [master(worker 

            /data/storage_b/storage):1670:crawl] _GMaster: starting
            hybrid 

            crawl      stime=None 

            [2020-10-01 20:52:43.573528] I [gsyncdstatus(worker 

            /data/storage_b/storage):253:set_worker_crawl_status]
            GeorepStatus: 

            Crawl Status Change     status=Hybrid Crawl 

            [2020-10-01 20:52:44.370985] I [master(worker 

            /data/storage_c/storage):1991:syncjob] Syncer: Sync Time
            Taken 

            duration=20.4307        num_files=2609  job=5  
            return_code=23 

            [2020-10-01 20:52:49.431854] W [master(worker 

            /data/storage_c/storage):1393:process] _GMaster: incomplete
            sync, 

            retrying changelogs     files=['XSYNC-CHANGELOG.1601585397']

            [2020-10-01 20:52:54.801500] I [master(worker 

            /data/storage_a/storage):1991:syncjob] Syncer: Sync Time
            Taken 

            duration=72.7492        num_files=4227  job=2  
            return_code=23 

            [2020-10-01 20:52:56.766547] I [master(worker 

            /data/storage_a/storage):1991:syncjob] Syncer: Sync Time
            Taken 

            duration=74.3569        num_files=4674  job=5  
            return_code=23 

            [2020-10-01 20:53:18.853333] I [master(worker 

            /data/storage_c/storage):1991:syncjob] Syncer: Sync Time
            Taken 

            duration=28.7125        num_files=4397  job=3  
            return_code=23 

            [2020-10-01 20:53:21.224921] W [master(worker 

            /data/storage_a/storage):1393:process] _GMaster: incomplete
            sync, 

            retrying changelogs     files=['CHANGELOG.1601044033', 

            'CHANGELOG.1601044048', 'CHANGELOG.1601044063',
            'CHANGELOG.1601044078', 

            'CHANGELOG.1601044093', 'CHANGELOG.1601044108',
            'CHANGELOG.1601044123'] 

            [2020-10-01 20:53:22.134536] I [master(worker 

            /data/storage_a/storage):1991:syncjob] Syncer: Sync Time
            Taken 

            duration=0.2159 num_files=3     job=3   return_code=23 

            [2020-10-01 20:53:25.615712] I [master(worker 

            /data/storage_b/storage):1681:crawl] _GMaster: processing
            xsync 

            changelog 

path=/var/lib/misc/gluster/gsyncd/storage_10.0.231.81_pcic-backup/data-storage_b-storage/xsync/XSYNC-CHANGELOG.1601585563

            [2020-10-01 20:53:25.634970] W [master(worker 

            /data/storage_c/storage):1393:process] _GMaster: incomplete
            sync, 

            retrying changelogs     files=['XSYNC-CHANGELOG.1601585397']

            GeoReplication status - see it change from Active to Faulty:

            [root@storage01 ~]# gluster volume geo-replication status 

            MASTER NODE    MASTER VOL    MASTER BRICK              
            SLAVE USER 

            SLAVE                                        SLAVE NODE    
            STATUS 

            CRAWL STATUS       LAST_SYNCED 

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

            10.0.231.91    storage       /data/storage_a/storage   
            geoaccount 

            ssh://geoaccount@10.0.231.81::pcic-backup   
            10.0.231.81    Active 

            Changelog Crawl    2020-09-25 07:26:57 

            10.0.231.91    storage       /data/storage_c/storage   
            geoaccount 

            ssh://geoaccount@10.0.231.81::pcic-backup   
            10.0.231.82    Active 

            Hybrid Crawl       N/A 

            10.0.231.91    storage       /data/storage_b/storage   
            geoaccount 

            ssh://geoaccount@10.0.231.81::pcic-backup   
            10.0.231.82    Active 

            Hybrid Crawl       N/A 

            10.0.231.92    storage       /data/storage_b/storage   
            geoaccount 

            ssh://geoaccount@10.0.231.81::pcic-backup   
            10.0.231.82    Active 

            History Crawl      2020-09-23 01:56:05 

            10.0.231.92    storage       /data/storage_a/storage   
            geoaccount 

            ssh://geoaccount@10.0.231.81::pcic-backup   
            10.0.231.82    Active 

            Hybrid Crawl       N/A 

            10.0.231.92    storage       /data/storage_c/storage   
            geoaccount 

            ssh://geoaccount@10.0.231.81::pcic-backup   
            10.0.231.81    Active 

            Hybrid Crawl       N/A 

            10.0.231.93    storage       /data/storage_c/storage   
            geoaccount 

            ssh://geoaccount@10.0.231.81::pcic-backup   
            10.0.231.81    Active 

            Changelog Crawl    2020-09-25 06:55:57 

            10.0.231.93    storage       /data/storage_b/storage   
            geoaccount 

            ssh://geoaccount@10.0.231.81::pcic-backup   
            10.0.231.81    Active 

            Hybrid Crawl       N/A 

            10.0.231.93    storage       /data/storage_a/storage   
            geoaccount 

            ssh://geoaccount@10.0.231.81::pcic-backup   
            10.0.231.81    Active 

            Hybrid Crawl       N/A 

            [root@storage01 ~]# gluster volume geo-replication status 

            MASTER NODE    MASTER VOL    MASTER BRICK              
            SLAVE USER 

            SLAVE                                        SLAVE NODE    
            STATUS 

            CRAWL STATUS       LAST_SYNCED 

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

            10.0.231.91    storage       /data/storage_a/storage   
            geoaccount 

            ssh://geoaccount@10.0.231.81::pcic-backup   
            10.0.231.81    Active 

            Changelog Crawl    2020-09-25 07:26:57 

            10.0.231.91    storage       /data/storage_c/storage   
            geoaccount 

            ssh://geoaccount@10.0.231.81::pcic-backup   
            10.0.231.82    Active 

            Hybrid Crawl       N/A 

            10.0.231.91    storage       /data/storage_b/storage   
            geoaccount 

            ssh://geoaccount@10.0.231.81::pcic-backup   
            N/A            Faulty 

            N/A                N/A 

            10.0.231.92    storage       /data/storage_b/storage   
            geoaccount 

            ssh://geoaccount@10.0.231.81::pcic-backup   
            10.0.231.82    Active 

            History Crawl      2020-09-23 01:58:05 

            10.0.231.92    storage       /data/storage_a/storage   
            geoaccount 

            ssh://geoaccount@10.0.231.81::pcic-backup   
            10.0.231.82    Active 

            Hybrid Crawl       N/A 

            10.0.231.92    storage       /data/storage_c/storage   
            geoaccount 

            ssh://geoaccount@10.0.231.81::pcic-backup   
            N/A            Faulty 

            N/A                N/A 

            10.0.231.93    storage       /data/storage_c/storage   
            geoaccount 

            ssh://geoaccount@10.0.231.81::pcic-backup   
            10.0.231.81    Active 

            Changelog Crawl    2020-09-25 06:58:56 

            10.0.231.93    storage       /data/storage_b/storage   
            geoaccount 

            ssh://geoaccount@10.0.231.81::pcic-backup   
            10.0.231.81    Active 

            Hybrid Crawl       N/A 

            10.0.231.93    storage       /data/storage_a/storage   
            geoaccount 

            ssh://geoaccount@10.0.231.81::pcic-backup   
            N/A            Faulty 

            N/A                N/A 

            Cluster information: (Note - disabled
            performance.open-behind to work 

            around https://github.com/gluster/glusterfs/issues/1440
            ) 

            [root@storage01 ~]# gluster --version | head -1; cat 

            /etc/centos-release; uname -r 

            glusterfs 7.7 

            CentOS Linux release 7.8.2003 (Core) 

            3.10.0-1127.10.1.el7.x86_64 

            [root@storage01 ~]# df -h /storage2/ 

            Filesystem            Size  Used Avail Use% Mounted on 

            10.0.231.91:/storage  328T  228T  100T  70% /storage2 

            [root@storage01 ~]# gluster volume info 

            Volume Name: storage 

            Type: Distributed-Replicate 

            Volume ID: cf94a8f2-324b-40b3-bf72-c3766100ea99 

            Status: Started 

            Snapshot Count: 0 

            Number of Bricks: 3 x (2 + 1) = 9 

            Transport-type: tcp 

            Bricks: 

            Brick1: 10.0.231.91:/data/storage_a/storage 

            Brick2: 10.0.231.92:/data/storage_b/storage 

            Brick3: 10.0.231.93:/data/storage_c/storage (arbiter) 

            Brick4: 10.0.231.92:/data/storage_a/storage 

            Brick5: 10.0.231.93:/data/storage_b/storage 

            Brick6: 10.0.231.91:/data/storage_c/storage (arbiter) 

            Brick7: 10.0.231.93:/data/storage_a/storage 

            Brick8: 10.0.231.91:/data/storage_b/storage 

            Brick9: 10.0.231.92:/data/storage_c/storage (arbiter) 

            Options Reconfigured: 

            changelog.changelog: on 

            geo-replication.ignore-pid-check: on 

            geo-replication.indexing: on 

            network.ping-timeout: 10 

            features.inode-quota: on 

            features.quota: on 

            nfs.disable: on 

            features.quota-deem-statfs: on 

            storage.fips-mode-rchecksum: on 

            performance.readdir-ahead: on 

            performance.parallel-readdir: on 

            cluster.lookup-optimize: on 

            client.event-threads: 4 

            server.event-threads: 4 

            performance.cache-size: 256MB 

            performance.open-behind: off 

            Thanks, 

              -Matthew 

            ________ 

            Community Meeting Calendar: 

            Schedule - 

            Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC 

            Bridge: https://bluejeans.com/441850968

            Gluster-users mailing list 

            Gluster-users@xxxxxxxxxxx 

            https://lists.gluster.org/mailman/listinfo/gluster-users

          ________ 

          Community Meeting Calendar: 

          Schedule - 

          Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC 

          Bridge: https://bluejeans.com/441850968 

          Gluster-users mailing list 

          Gluster-users@xxxxxxxxxxx 

          https://lists.gluster.org/mailman/listinfo/gluster-users

        ________

Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users

      ________

Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users

________

Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users