Re: Geo_replication to Faulty

Which version of gluster are you using?
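If in doubt, it can be checked on each node with something like the following (the rpm query assumes an RPM-based distribution):

    gluster --version
    rpm -q glusterfs glusterfs-geo-replication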

On Tue, Nov 19, 2019 at 11:00 AM deepu srinivasan <sdeepugd@xxxxxxxxx> wrote:
Hi Kotresh,
Is there a stable release in the 6.x series?


On Tue, Nov 19, 2019, 10:44 AM Kotresh Hiremath Ravishankar <khiremat@xxxxxxxxxx> wrote:
This issue was recently fixed with the following patch and should be available in the latest gluster-6.x release.
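Assuming an RPM-based system with the appropriate repository enabled, the upgrade could look something like the following (package names vary by distribution):

    yum update glusterfs glusterfs-server glusterfs-geo-replication

followed by restarting glusterd and the geo-replication session.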


On Tue, Nov 19, 2019 at 10:26 AM deepu srinivasan <sdeepugd@xxxxxxxxx> wrote:

Hi Aravinda,
The logs below are from the master end:
[2019-11-16 17:29:43.536881] I [gsyncdstatus(worker /home/sas/gluster/data/code-misc6):281:set_active] GeorepStatus: Worker Status Change       status=Active
[2019-11-16 17:29:43.629620] I [gsyncdstatus(worker /home/sas/gluster/data/code-misc6):253:set_worker_crawl_status] GeorepStatus: Crawl Status Change   status=History Crawl
[2019-11-16 17:29:43.630328] I [master(worker /home/sas/gluster/data/code-misc6):1517:crawl] _GMaster: starting history crawl   turns=1 stime=(1573924576, 0)   entry_stime=(1573924576, 0)     etime=1573925383
[2019-11-16 17:29:44.636725] I [master(worker /home/sas/gluster/data/code-misc6):1546:crawl] _GMaster: slave's time     stime=(1573924576, 0)
[2019-11-16 17:29:44.778966] I [master(worker /home/sas/gluster/data/code-misc6):898:fix_possible_entry_failures] _GMaster: Fixing ENOENT error in slave. Parent does not exist on master. Safe to ignore, take out entry       retry_count=1   entry=({'uid': 0, 'gfid': 'c02519e0-0ead-4fe8-902b-dcae72ef83a3', 'gid': 0, 'mode': 33188, 'entry': '.gfid/d60aa0d5-4fdf-4721-97dc-9e3e50995dab/368307802', 'op': 'CREATE'}, 2, {'slave_isdir': False, 'gfid_mismatch': False, 'slave_name': None, 'slave_gfid': None, 'name_mismatch': False, 'dst': False})
[2019-11-16 17:29:44.779306] I [master(worker /home/sas/gluster/data/code-misc6):942:handle_entry_failures] _GMaster: Sucessfully fixed entry ops with gfid mismatch    retry_count=1
[2019-11-16 17:29:44.779516] I [master(worker /home/sas/gluster/data/code-misc6):1194:process_change] _GMaster: Retry original entries. count = 1
[2019-11-16 17:29:44.879321] E [repce(worker /home/sas/gluster/data/code-misc6):214:__call__] RepceClient: call failed  call=151945:140353273153344:1573925384.78       method=entry_ops        error=OSError
[2019-11-16 17:29:44.879750] E [syncdutils(worker /home/sas/gluster/data/code-misc6):338:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 322, in main
    func(args)
  File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line 82, in subcmd_worker
    local.service_loop(remote)
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1277, in service_loop
    g3.crawlwrap(_oneshot_=True)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 599, in crawlwrap
    self.crawl()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1555, in crawl
    self.changelogs_batch_process(changes)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1455, in changelogs_batch_process
    self.process(batch)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1290, in process
    self.process_change(change, done, retry)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1195, in process_change
    failures = self.slave.server.entry_ops(entries)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 233, in __call__
    return self.ins(self.meth, *a)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 215, in __call__
    raise res
OSError: [Errno 13] Permission denied: '/home/sas/gluster/data/code-misc6/.glusterfs/6a/90/6a9008b1-a4aa-4c30-9ae7-92a33e05d0bb'
[2019-11-16 17:29:44.911767] I [repce(agent /home/sas/gluster/data/code-misc6):97:service_loop] RepceServer: terminating on reaching EOF.
[2019-11-16 17:29:45.509344] I [monitor(monitor):278:monitor] Monitor: worker died in startup phase     brick=/home/sas/gluster/data/code-misc6
[2019-11-16 17:29:45.511806] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change status=Faulty


The logs below are from the slave end:
[2019-11-16 17:24:42.281599] I [resource(slave 192.168.185.106/home/sas/gluster/data/code-misc6):580:entry_ops] <top>: Special case: rename on mkdir    gfid=6a9008b1-a4aa-4c30-9ae7-92a33e05d0bb       entry='.gfid/a8921d78-a078-46d3-aca5-8b078eb62cac/8878061b-d5b3-47a6-b01c-8310fee39b20'
[2019-11-16 17:24:42.370582] E [repce(slave 192.168.185.106/home/sas/gluster/data/code-misc6):122:worker] <top>: call failed:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 118, in worker
    res = getattr(self.obj, rmeth)(*in_data[2:])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 581, in entry_ops
    src_entry = get_slv_dir_path(slv_host, slv_volume, gfid)
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 690, in get_slv_dir_path
    [ENOENT], [ESTALE])
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 546, in errno_wrap
    return call(*arg)
OSError: [Errno 13] Permission denied: '/home/sas/gluster/data/code-misc6/.glusterfs/6a/90/6a9008b1-a4aa-4c30-9ae7-92a33e05d0bb'
[2019-11-16 17:24:42.400402] I [repce(slave 192.168.185.106/home/sas/gluster/data/code-misc6):97:service_loop] RepceServer: terminating on reaching EOF.
[2019-11-16 17:24:53.403165] W [gsyncd(slave 192.168.185.106/home/sas/gluster/data/code-misc6):304:main] <top>: Session config file not exists, using the default config        path=/var/lib/glusterd/geo-replication/code-misc_192.168.185.107_code-misc/gsyncd.con
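
The Permission denied above points at a gfid entry inside the slave brick's .glusterfs directory. A minimal way to confirm the ownership mismatch, assuming root access on the slave node (the path is copied from the traceback):

    stat /home/sas/gluster/data/code-misc6/.glusterfs/6a/90/6a9008b1-a4aa-4c30-9ae7-92a33e05d0bb

If the entry is owned by root while the session syncs as a non-root user, that matches the failure described below.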

On Sat, Nov 16, 2019, 9:26 PM Aravinda Vishwanathapura Krishna Murthy <avishwan@xxxxxxxxxx> wrote:
Hi Deepu,

Please share the reason for the Faulty state from the geo-replication logs
of the respective master node.
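(On the master nodes these typically live under
/var/log/glusterfs/geo-replication/<session>/gsyncd.log.)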


On Sat, Nov 16, 2019 at 1:01 AM deepu srinivasan <sdeepugd@xxxxxxxxx> wrote:
Hi Users/Development Team,
We have set up a geo-replication session with a non-root user on the slave in our DC.
It was working well, with Active status and Changelog Crawl.

We mounted the master volume, and files were being written to it.
We were running some processes as the root user, so those processes created some files and folders with root ownership.
After stopping and then starting the geo-replication session again, it went to the Faulty state.
How do we recover?
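
A possible recovery, sketched on the assumption that the root-owned entries are the cause; the volume names, slave host, and user 'sas' below are inferred from the logs elsewhere in this thread and may need adjusting:

    # from a master node: stop the session
    gluster volume geo-replication code-misc sas@192.168.185.107::code-misc stop
    # on the slave node: hand the root-owned entries back to the geo-rep user
    chown -R sas:sas /home/sas/gluster/data/code-misc6/<affected paths>
    # from a master node: restart the session
    gluster volume geo-replication code-misc sas@192.168.185.107::code-misc start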


--
regards
Aravinda VK


--
Thanks and Regards,
Kotresh H R


--
Thanks and Regards,
Kotresh H R
________

Community Meeting Calendar:

APAC Schedule -
Every 2nd and 4th Tuesday at 11:30 AM IST
Bridge: https://bluejeans.com/118564314

NA/EMEA Schedule -
Every 1st and 3rd Tuesday at 01:00 PM EDT
Bridge: https://bluejeans.com/118564314

Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users
