Was: Upgrade to 4.1.2 geo-replication does not work Now: Upgraded to 4.1.3 geo node Faulty

Hi all,

After upgrading to 4.1.2 I had problems with geo-replication: the sync stopped.

I then upgraded to 4.1.3 and it ran fine for one day, but now one of the master nodes shows as Faulty.

Most of the sync jobs have return code 23. How do I resolve this?

I see messages like:

_GMaster: Sucessfully fixed all entry ops with gfid mismatch

Will this resolve error code 23?

There is also a Python error.

The Python error turned out to be an SELinux problem: turning off SELinux made the node go back to Active.

See log below.


CentOS 7, installed through SIG Gluster (OS updated to latest at the same time)

Master cluster: 2 x (2 + 1) distributed, replicated

Client cluster: 1 x (2 + 1) replicated


Many thanks in advance!


Best regards

Marcus Pedersén



gsyncd.log from Faulty node:

[2018-08-31 06:25:51.375267] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken   duration=0.8099 num_files=57    job=3   return_code=23
[2018-08-31 06:25:51.465895] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken   duration=0.0904 num_files=3     job=3   return_code=23
[2018-08-31 06:25:52.562107] E [repce(worker /urd-gds/gluster):197:__call__] RepceClient: call failed   call=30069:139655665837888:1535696752.35        method=entry_ops        error=OSError
[2018-08-31 06:25:52.562346] E [syncdutils(worker /urd-gds/gluster):332:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 311, in main
    func(args)
  File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line 72, in subcmd_worker
    local.service_loop(remote)
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1288, in service_loop
    g3.crawlwrap(_oneshot_=True)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 615, in crawlwrap
    self.crawl()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1545, in crawl
    self.changelogs_batch_process(changes)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1445, in changelogs_batch_process
    self.process(batch)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1280, in process
    self.process_change(change, done, retry)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1179, in process_change
    failures = self.slave.server.entry_ops(entries)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 216, in __call__
    return self.ins(self.meth, *a)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 198, in __call__
    raise res
OSError: [Errno 13] Permission denied
[2018-08-31 06:25:52.578367] I [repce(agent /urd-gds/gluster):80:service_loop] RepceServer: terminating on reaching EOF.
[2018-08-31 06:25:53.558765] I [monitor(monitor):279:monitor] Monitor: worker died in startup phase     brick=/urd-gds/gluster
[2018-08-31 06:25:53.569777] I [gsyncdstatus(monitor):244:set_worker_status] GeorepStatus: Worker Status Change status=Faulty
[2018-08-31 06:26:03.593161] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker   brick=/urd-gds/gluster  slave_node=urd-gds-geo-000
[2018-08-31 06:26:03.636452] I [gsyncd(agent /urd-gds/gluster):297:main] <top>: Using session config file       path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
[2018-08-31 06:26:03.636810] I [gsyncd(worker /urd-gds/gluster):297:main] <top>: Using session config file      path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
[2018-08-31 06:26:03.637486] I [changelogagent(agent /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
[2018-08-31 06:26:03.650330] I [resource(worker /urd-gds/gluster):1377:connect_remote] SSH: Initializing SSH connection between master and slave...
[2018-08-31 06:26:05.296473] I [resource(worker /urd-gds/gluster):1424:connect_remote] SSH: SSH connection between master and slave established.        duration=1.6457
[2018-08-31 06:26:05.297904] I [resource(worker /urd-gds/gluster):1096:connect] GLUSTER: Mounting gluster volume locally...
[2018-08-31 06:26:06.396939] I [resource(worker /urd-gds/gluster):1119:connect] GLUSTER: Mounted gluster volume duration=1.0985
[2018-08-31 06:26:06.397691] I [subcmds(worker /urd-gds/gluster):70:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor
[2018-08-31 06:26:16.815566] I [master(worker /urd-gds/gluster):1593:register] _GMaster: Working dir    path=/var/lib/misc/gluster/gsyncd/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/urd-gds-gluster
[2018-08-31 06:26:16.816423] I [resource(worker /urd-gds/gluster):1282:service_loop] GLUSTER: Register time     time=1535696776
[2018-08-31 06:26:16.888772] I [gsyncdstatus(worker /urd-gds/gluster):277:set_active] GeorepStatus: Worker Status Change        status=Active
[2018-08-31 06:26:16.892049] I [gsyncdstatus(worker /urd-gds/gluster):249:set_worker_crawl_status] GeorepStatus: Crawl Status Change    status=History Crawl
[2018-08-31 06:26:16.892703] I [master(worker /urd-gds/gluster):1507:crawl] _GMaster: starting history crawl    turns=1 stime=(1525739167, 0)   entry_stime=(1525740143, 0)     etime=1535696776
[2018-08-31 06:26:17.914803] I [master(worker /urd-gds/gluster):1536:crawl] _GMaster: slave's time      stime=(1525739167, 0)
[2018-08-31 06:26:18.521718] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken   duration=0.1063 num_files=17    job=3   return_code=23
[2018-08-31 06:26:19.260137] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken   duration=0.3441 num_files=34    job=1   return_code=23
[2018-08-31 06:26:19.615191] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken   duration=0.0923 num_files=7     job=3   return_code=23
[2018-08-31 06:26:19.891227] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken   duration=0.1302 num_files=12    job=1   return_code=23
[2018-08-31 06:26:19.922700] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken   duration=0.5024 num_files=50    job=2   return_code=23

[2018-08-31 06:26:21.639342] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken   duration=1.5233 num_files=5     job=3   return_code=23
[2018-08-31 06:26:22.12726] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken    duration=0.1191 num_files=7     job=1   return_code=23
[2018-08-31 06:26:22.86136] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken    duration=0.0731 num_files=4     job=1   return_code=23
[2018-08-31 06:26:22.503290] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken   duration=0.0779 num_files=15    job=2   return_code=23
[2018-08-31 06:26:23.214704] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken   duration=0.0738 num_files=9     job=3   return_code=23
[2018-08-31 06:26:23.251876] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken   duration=0.2478 num_files=33    job=2   return_code=23
[2018-08-31 06:26:23.802699] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken   duration=0.0873 num_files=9     job=3   return_code=23
[2018-08-31 06:26:23.828176] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken   duration=0.0758 num_files=3     job=2   return_code=23
[2018-08-31 06:26:23.854063] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken   duration=0.2662 num_files=34    job=1   return_code=23
[2018-08-31 06:26:24.403228] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken   duration=0.0997 num_files=30    job=3   return_code=23
[2018-08-31 06:26:25.526] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken      duration=0.0965 num_files=8     job=3   return_code=23
[2018-08-31 06:26:25.438527] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken   duration=0.0832 num_files=9     job=1   return_code=23
[2018-08-31 06:26:25.447256] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken   duration=0.6180 num_files=86    job=2   return_code=23
[2018-08-31 06:26:25.571913] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken   duration=0.0706 num_files=2     job=3   return_code=0
[2018-08-31 06:26:27.21325] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken    duration=0.0814 num_files=1     job=1   return_code=23
[2018-08-31 06:26:27.615520] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken   duration=0.0933 num_files=13    job=1   return_code=23
[2018-08-31 06:26:27.668323] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken   duration=0.2190 num_files=95    job=2   return_code=23
[2018-08-31 06:26:27.740139] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken   duration=0.0716 num_files=11    job=2   return_code=23
[2018-08-31 06:26:28.191068] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken   duration=0.1167 num_files=38    job=3   return_code=23
[2018-08-31 06:26:28.268213] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken   duration=0.0768 num_files=7     job=3   return_code=23
[2018-08-31 06:26:28.317909] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken   duration=0.0770 num_files=4     job=2   return_code=23
[2018-08-31 06:26:28.710064] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken   duration=0.0932 num_files=23    job=1   return_code=23
[2018-08-31 06:26:28.907250] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken   duration=0.0886 num_files=26    job=2   return_code=23
[2018-08-31 06:26:28.976679] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken   duration=0.0692 num_files=4     job=2   return_code=23
[2018-08-31 06:26:29.55774] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken    duration=0.0788 num_files=9     job=2   return_code=23
[2018-08-31 06:26:29.295576] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken   duration=0.0847 num_files=16    job=1   return_code=23
[2018-08-31 06:26:29.665076] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken   duration=0.1087 num_files=25    job=2   return_code=23
[2018-08-31 06:26:30.277998] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken   duration=0.1122 num_files=40    job=2   return_code=23
[2018-08-31 06:26:31.153105] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken   duration=0.3822 num_files=74    job=3   return_code=23
[2018-08-31 06:26:31.227639] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken   duration=0.0743 num_files=18    job=3   return_code=23
[2018-08-31 06:26:31.302660] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken   duration=0.0748 num_files=18    job=3   return_code=23
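
To put a number on "most of the sync jobs", here is a minimal sketch (not part of the log above) that tallies the return_code field of the "Sync Time Taken" lines. It assumes the line layout shown in this log; the function name tally_return_codes is only illustrative. As far as I understand, the value is the exit status of the rsync process used for the sync job, and the rsync man page lists exit code 23 as "Partial transfer due to error".

```python
import re
from collections import Counter

# Field layout taken from the "Sync Time Taken" lines in the log above.
SYNCJOB_RE = re.compile(r"num_files=(\d+)\s+job=(\d+)\s+return_code=(\d+)")


def tally_return_codes(lines):
    """Return (jobs per return code, files per return code)."""
    jobs = Counter()
    files = Counter()
    for line in lines:
        if "Sync Time Taken" not in line:
            continue
        match = SYNCJOB_RE.search(line)
        if match is None:
            continue
        num_files, _job, code = (int(group) for group in match.groups())
        jobs[code] += 1
        files[code] += num_files
    return jobs, files


# Two lines copied from the log above, used as sample input.
sample = [
    "[2018-08-31 06:26:25.447256] I [master(worker /urd-gds/gluster):1944:syncjob] "
    "Syncer: Sync Time Taken   duration=0.6180 num_files=86    job=2   return_code=23",
    "[2018-08-31 06:26:25.571913] I [master(worker /urd-gds/gluster):1944:syncjob] "
    "Syncer: Sync Time Taken   duration=0.0706 num_files=2     job=3   return_code=0",
]

jobs, files = tally_return_codes(sample)
for code in sorted(jobs):
    print(f"return_code={code}: {jobs[code]} jobs, {files[code]} files")
```

To run it against a real log, pass an open file handle for gsyncd.log instead of the sample list, since the function only iterates over lines.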


