Hi Kotresh,
Thanks for your hint. Adding the "--ignore-missing-args" option to rsync and restarting geo-replication worked, but it only managed to sync approximately one third of the data before the session went into "Failed" status this time. Now I see a different type of error, as you can see below in the log extract from my geo-replication slave node:
[2017-04-12 18:01:55.268923] I [MSGID: 109066] [dht-rename.c:1574:dht_rename] 0-myvol-private-geo-dht: renaming /.gfid/1678ff37-f708-4197-bed0-3ecd87ae1314/Workhours_2017 empty.xls.ocTransferId2118183895.part (hash=myvol-private-geo-client-0/cache=myvol-private-geo-client-0) => /.gfid/1678ff37-f708-4197-bed0-3ecd87ae1314/Workhours_2017 empty.xls (hash=myvol-private-geo-client-0/cache=myvol-private-geo-client-0)
[2017-04-12 18:01:55.269842] W [fuse-bridge.c:1787:fuse_rename_cbk] 0-glusterfs-fuse: 4786: /.gfid/1678ff37-f708-4197-bed0-3ecd87ae1314/Workhours_2017 empty.xls.ocTransferId2118183895.part -> /.gfid/1678ff37-f708-4197-bed0-3ecd87ae1314/Workhours_2017 empty.xls => -1 (Directory not empty)
[2017-04-12 18:01:55.314062] I [fuse-bridge.c:5016:fuse_thread_proc] 0-fuse: unmounting /tmp/gsyncd-aux-mount-PNSR8s
[2017-04-12 18:01:55.314311] W [glusterfsd.c:1251:cleanup_and_exit] (-->/lib/x86_64-linux-gnu/libpthread.so.0(+0x8064) [0x7f97d3129064] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x7f97d438a725] -->/usr/sbin/glusterfs(cleanup_and_exit+0x57) [0x7f97d438a5a7] ) 0-: received signum (15), shutting down
[2017-04-12 18:01:55.314335] I [fuse-bridge.c:5720:fini] 0-fuse: Unmounting '/tmp/gsyncd-aux-mount-PNSR8s'.
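For reference, here is roughly what I ran to apply your suggestion, using my actual session names (master volume "private", slave "gfs1geo.domain::private-geo"):

# stop the session, set the rsync option, then start it again
gluster vol geo-rep private gfs1geo.domain::private-geo stop
gluster vol geo-rep private gfs1geo.domain::private-geo config rsync-options "--ignore-missing-args"
gluster vol geo-rep private gfs1geo.domain::private-geo start
# this is where "status detail" now reports the session as Failed
gluster vol geo-rep private gfs1geo.domain::private-geo status detail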
How can I fix this issue now and get geo-replication to continue synchronising again?
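In the meantime, would it be safe to look at the conflicting entry directly on the slave brick? I was thinking of something along these lines on the slave node (just a sketch, assuming the usual .glusterfs gfid backend layout; /data/private-geo/brick is my slave brick path):

# resolve the parent directory's gfid from the log above via the brick's .glusterfs backend
ls -l /data/private-geo/brick/.glusterfs/16/78/1678ff37-f708-4197-bed0-3ecd87ae1314
# check whether the rename target already exists there and what type it is
stat "$(readlink -f /data/private-geo/brick/.glusterfs/16/78/1678ff37-f708-4197-bed0-3ecd87ae1314)/Workhours_2017 empty.xls"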
Best regards,
M.
-------- Original Message --------
Subject: Re: [Gluster-users] Geo replication stuck (rsync: link_stat "(unreachable)")
Local Time: April 11, 2017 9:18 AM
UTC Time: April 11, 2017 7:18 AM
From: khiremat@xxxxxxxxxx
To: mabi <mabi@xxxxxxxxxxxxx>
Gluster Users <gluster-users@xxxxxxxxxxx>

Hi,

Then please set the following rsync config and let us know if it helps.

gluster vol geo-rep <mastervol> <slavehost>::<slavevol> config rsync-options "--ignore-missing-args"

Thanks and Regards,
Kotresh H R

----- Original Message -----
> From: "mabi" <mabi@xxxxxxxxxxxxx>
> To: "Kotresh Hiremath Ravishankar" <khiremat@xxxxxxxxxx>
> Cc: "Gluster Users" <gluster-users@xxxxxxxxxxx>
> Sent: Tuesday, April 11, 2017 2:15:54 AM
> Subject: Re: [Gluster-users] Geo replication stuck (rsync: link_stat "(unreachable)")
>
> Hi Kotresh,
>
> I am using the official Debian 8 (jessie) package, which has rsync version 3.1.1.
>
> Regards,
> M.
>
> -------- Original Message --------
> Subject: Re: [Gluster-users] Geo replication stuck (rsync: link_stat "(unreachable)")
> Local Time: April 10, 2017 6:33 AM
> UTC Time: April 10, 2017 4:33 AM
> From: khiremat@xxxxxxxxxx
> To: mabi <mabi@xxxxxxxxxxxxx>
> Gluster Users <gluster-users@xxxxxxxxxxx>
>
> Hi Mabi,
>
> What's the rsync version being used?
>
> Thanks and Regards,
> Kotresh H R
>
> ----- Original Message -----
> > From: "mabi" <mabi@xxxxxxxxxxxxx>
> > To: "Gluster Users" <gluster-users@xxxxxxxxxxx>
> > Sent: Saturday, April 8, 2017 4:20:25 PM
> > Subject: [Gluster-users] Geo replication stuck (rsync: link_stat "(unreachable)")
> >
> > Hello,
> >
> > I am using distributed geo replication with two of my GlusterFS 3.7.20 replicated volumes and just noticed that the geo replication for one volume is not working anymore. It has been stuck since 2017-02-23 22:39. I tried to stop and restart geo replication, but it stays stuck at that specific date and time: under the DATA field of the geo replication "status detail" command I can see 3879, and the STATUS is "Active", but still nothing happens. I noticed that the rsync process is running but does not do anything, so I ran strace on the PID of rsync and saw the following:
> >
> > write(2, "rsync: link_stat \"(unreachable)/"..., 114
> >
> > It looks like rsync can't read or find a file and stays stuck on that. In the geo-replication log files of the GlusterFS master I can't find any error messages, just informational messages. For example, when I restart the geo replication I see the following log entries:
> >
> > [2017-04-07 21:43:05.664541] I [monitor(monitor):443:distribute] <top>: slave bricks: [{'host': 'gfs1geo.domain', 'dir': '/data/private-geo/brick'}]
> > [2017-04-07 21:43:05.666435] I [monitor(monitor):468:distribute] <top>: worker specs: [('/data/private/brick', 'ssh://root@gfs1geo.domain:gluster://localhost:private-geo', '1', False)]
> > [2017-04-07 21:43:05.823931] I [monitor(monitor):267:monitor] Monitor: ------------------------------------------------------------
> > [2017-04-07 21:43:05.824204] I [monitor(monitor):268:monitor] Monitor: starting gsyncd worker
> > [2017-04-07 21:43:05.930124] I [gsyncd(/data/private/brick):733:main_i] <top>: syncing: gluster://localhost:private -> ssh://root@gfs1geo.domain:gluster://localhost:private-geo
> > [2017-04-07 21:43:05.931169] I [changelogagent(agent):73:__init__] ChangelogAgent: Agent listining...
> > [2017-04-07 21:43:08.558648] I [master(/data/private/brick):83:gmaster_builder] <top>: setting up xsync change detection mode
> > [2017-04-07 21:43:08.559071] I [master(/data/private/brick):367:__init__] _GMaster: using 'rsync' as the sync engine
> > [2017-04-07 21:43:08.560163] I [master(/data/private/brick):83:gmaster_builder] <top>: setting up changelog change detection mode
> > [2017-04-07 21:43:08.560431] I [master(/data/private/brick):367:__init__] _GMaster: using 'rsync' as the sync engine
> > [2017-04-07 21:43:08.561105] I [master(/data/private/brick):83:gmaster_builder] <top>: setting up changeloghistory change detection mode
> > [2017-04-07 21:43:08.561391] I [master(/data/private/brick):367:__init__] _GMaster: using 'rsync' as the sync engine
> > [2017-04-07 21:43:11.354417] I [master(/data/private/brick):1249:register] _GMaster: xsync temp directory: /var/lib/misc/glusterfsd/private/ssh%3A%2F%2Froot%40192.168.20.107%3Agluster%3A%2F%2F127.0.0.1%3Aprivate-geo/616931ac8f39da5dc5834f9d47fc7b1a/xsync
> > [2017-04-07 21:43:11.354751] I [resource(/data/private/brick):1528:service_loop] GLUSTER: Register time: 1491601391
> > [2017-04-07 21:43:11.357630] I [master(/data/private/brick):510:crawlwrap] _GMaster: primary master with volume id e7a40a1b-45c9-4d3c-bb19-0c59b4eceec5
> > ...
> > [2017-04-07 21:43:11.489355] I [master(/data/private/brick):519:crawlwrap] _GMaster: crawl interval: 1 seconds
> > [2017-04-07 21:43:11.516710] I [master(/data/private/brick):1163:crawl] _GMaster: starting history crawl... turns: 1, stime: (1487885974, 0), etime: 1491601391
> > [2017-04-07 21:43:12.607836] I [master(/data/private/brick):1192:crawl] _GMaster: slave's time: (1487885974, 0)
> >
> > Does anyone know how I can find out the root cause of this problem and make geo replication work again from the point in time where it got stuck?
> >
> > Many thanks in advance for your help.
> >
> > Best regards,
> > Mabi
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users