I have a big problem.
When I start geo-replication everything seems fine, but after replicating 2.5 TB I get errors, and the session keeps starting over and over again with the same errors.
The master side is two nodes with a replicated volume plus a third arbiter node.
The destination (slave) is a single node.
The firewall is open between all nodes.
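For reference, this is roughly how I check the session; the worker keeps going Faulty and restarting (see the monitor lines in the master log below). The slave host and volume names here are placeholders, not my real ones:

  # Placeholders only: MASTERVOL, SLAVEHOST and SLAVEVOL stand in for the real names.
  gluster volume geo-replication MASTERVOL SLAVEHOST::SLAVEVOL status detail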
Master Log
[2018-10-25 07:08:59.619699] D [master(/gluster/owncloud/brick2):1665:Xcrawl] _GMaster: entering ./data/fa/files/backup/research/projects/2011-Regularity/2012-03-Gain-of-Regularity-linearWFP
[2018-10-25 07:08:59.619874] E [syncdutils(/gluster/owncloud/brick2):325:log_raise_exception] <top>: glusterfs session went down error=ENOTCONN
[2018-10-25 07:08:59.620109] E [syncdutils(/gluster/owncloud/brick2):331:log_raise_exception] <top>: FULL EXCEPTION TRACE:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 210, in main
    main_i()
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 801, in main_i
    local.service_loop(*[r for r in [remote] if r])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1679, in service_loop
    g1.crawlwrap(oneshot=True, register_time=register_time)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 597, in crawlwrap
    self.crawl()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1555, in crawl
    self.process([item[1]], 0)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1204, in process
    self.process_change(change, done, retry)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1143, in process_change
    st = lstat(go[0])
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 553, in lstat
    return errno_wrap(os.lstat, [e], [ENOENT], [ESTALE, EBUSY])
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 535, in errno_wrap
    return call(*arg)
OSError: [Errno 107] Transport endpoint is not connected: '.gfid/5c143d64-165f-44b1-98ed-71e491376a76'
[2018-10-25 07:08:59.627846] D [master(/gluster/owncloud/brick2):1665:Xcrawl] _GMaster: entering ./data/fa/files/backup/research/projects/2011-Regularity/resources
[2018-10-25 07:08:59.632826] D [master(/gluster/owncloud/brick2):1665:Xcrawl] _GMaster: entering ./data/fa/files/backup/research/projects/2011-Regularity/add material
[2018-10-25 07:08:59.633582] D [master(/gluster/owncloud/brick2):1665:Xcrawl] _GMaster: entering ./data/fa/files/backup/research/projects/2011-Regularity/add material/Maple
[2018-10-25 07:08:59.636306] D [master(/gluster/owncloud/brick2):1665:Xcrawl] _GMaster: entering ./data/fa/files/backup/research/projects/2011-Regularity/add material/notes
[2018-10-25 07:08:59.637303] I [syncdutils(/gluster/owncloud/brick2):271:finalize] <top>: exiting.
[2018-10-25 07:08:59.640778] I [repce(/gluster/owncloud/brick2):92:service_loop] RepceServer: terminating on reaching EOF.
[2018-10-25 07:08:59.641222] I [syncdutils(/gluster/owncloud/brick2):271:finalize] <top>: exiting.
[2018-10-25 07:09:00.314140] I [monitor(monitor):363:monitor] Monitor: worker died in startup phase brick=/gluster/owncloud/brick2
[2018-10-25 07:09:00.315172] I [gsyncdstatus(monitor):243:set_worker_status] GeorepStatus: Worker Status Change status=Faulty
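In case it helps: the lstat in the traceback fails on a .gfid/ path inside the auxiliary mount that gsyncd makes of the master volume, and the "glusterfs session went down error=ENOTCONN" line suggests that mount itself dropped. A manual check along these lines is what I would try next; the mount point, hostname and volume name are only illustrative:

  # Sketch only: mount the master volume with gfid access and repeat the lstat by hand.
  # /mnt/gfidcheck, node1 and MASTERVOL are placeholders.
  mount -t glusterfs -o aux-gfid-mount node1:/MASTERVOL /mnt/gfidcheck
  stat /mnt/gfidcheck/.gfid/5c143d64-165f-44b1-98ed-71e491376a76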
Slave Log
[2018-10-25 07:08:44.206372] I [resource(slave):1502:connect] GLUSTER: Mounting gluster volume locally...
[2018-10-25 07:08:45.229620] I [resource(slave):1515:connect] GLUSTER: Mounted gluster volume duration=1.0229
[2018-10-25 07:08:45.230180] I [resource(slave):1012:service_loop] GLUSTER: slave listening
[2018-10-25 07:08:59.641242] I [repce(slave):92:service_loop] RepceServer: terminating on reaching EOF.
[2018-10-25 07:08:59.655611] I [syncdutils(slave):271:finalize] <top>: exiting.
Volume Info
Volume Name: datacloud
Type: Replicate
Volume ID: 6cc79599-7a5c-4b02-bd86-13020a9d91db
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: 172.17.45.11:/gluster/datacloud/brick2
Brick2: 172.17.45.12:/gluster/datacloud/brick2
Brick3: 172.17.45.13:/gluster/datacloud/brick2 (arbiter)
Options Reconfigured:
cluster.server-quorum-type: server
cluster.shd-max-threads: 32
cluster.self-heal-readdir-size: 64KB
cluster.quorum-type: fixed
transport.address-family: inet
diagnostics.brick-log-level: INFO
changelog.capture-del-path: on
storage.build-pgfid: on
changelog.changelog: on
geo-replication.ignore-pid-check: on
server.statedump-path: /tmp/gluster
cluster.self-heal-window-size: 32
geo-replication.indexing: on
nfs.trusted-sync: off
diagnostics.dump-fd-stats: off
nfs.disable: on
cluster.self-heal-daemon: enable
cluster.background-self-heal-count: 16
cluster.heal-timeout: 120
cluster.data-self-heal-algorithm: full
cluster.consistent-metadata: on
network.ping-timeout: 20
cluster.granular-entry-heal: enable
cluster.server-quorum-ratio: 51%
cluster.enable-shared-storage: enable
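The volume information above is the output of gluster volume info; the geo-replication session settings can be dumped in the same way if needed (master volume and slave endpoint are placeholders again):

  gluster volume info datacloud
  # MASTERVOL and SLAVEHOST::SLAVEVOL are placeholders for the real session endpoints.
  gluster volume geo-replication MASTERVOL SLAVEHOST::SLAVEVOL config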
Best regards,
Michael
--
Michael Roth | michael.roth@xxxxxxxxxxxx
IT Solutions - Application Management
Technische Universität Wien - Operngasse 11, 1040 Wien
T +43-1-58801-42091