Hi Kotresh,

Thanks for looking into this issue!
I'm attaching log files from the slave node, taken from /var/log/glusterfs/geo-replication-slaves/:

[root@SC-183 log]# cp /var/log/glusterfs/geo-replication-slaves/84501a83-b07c-4768-bfaa-418b038e1a9e\:gluster%3A%2F%2F127.0.0.1%3Arem-volume-0001.gluster.log /home/vnosov/
[root@SC-183 log]# cp /var/log/glusterfs/geo-replication-slaves/slave.log /home/vnosov/
[root@SC-183 log]# cp /var/log/glusterfs/geo-replication-slaves/mbr/84501a83-b07c-4768-bfaa-418b038e1a9e\:gluster%3A%2F%2F127.0.0.1%3Arem-volume-0001.log /home/vnosov/

Best regards,
Viktor Nosov

-----Original Message-----
From: Kotresh Hiremath Ravishankar [mailto:khiremat@xxxxxxxxxx]
Sent: Tuesday, December 06, 2016 9:25 PM
To: Viktor Nosov
Cc: gluster-users@xxxxxxxxxxx
Subject: Re: Geo-replication failed to delete from slave file partially written to master volume.

Hi Viktor,

Please share geo-replication-slave mount logs from slave nodes.

Thanks and Regards,
Kotresh H R

----- Original Message -----
> From: "Viktor Nosov" <vnosov@xxxxxxxxxxxx>
> To: gluster-users@xxxxxxxxxxx
> Cc: vnosov@xxxxxxxxxxxx
> Sent: Tuesday, December 6, 2016 7:13:22 AM
> Subject: Geo-replication failed to delete from slave file partially written to master volume.
>
> Hi,
>
> I hit a problem while testing geo-replication. Does anybody know how to fix
> it other than by deleting and recreating geo-replication?
>
> Geo-replication failed to delete from the slave a file that was partially
> written to the master volume.
>
> I have geo-replication between two nodes running glusterfs 3.7.16,
> with master volume:
>
> [root@SC-182 log]# gluster volume info master-for-183-0003
>
> Volume Name: master-for-183-0003
> Type: Distribute
> Volume ID: 84501a83-b07c-4768-bfaa-418b038e1a9e
> Status: Started
> Number of Bricks: 1
> Transport-type: tcp
> Bricks:
> Brick1: 10.10.60.182:/exports/nas-segment-0012/master-for-183-0003
> Options Reconfigured:
> changelog.changelog: on
> geo-replication.ignore-pid-check: on
> geo-replication.indexing: on
> server.allow-insecure: on
> performance.quick-read: off
> performance.stat-prefetch: off
> nfs.disable: on
> nfs.addr-namelookup: off
> performance.readdir-ahead: on
> cluster.enable-shared-storage: enable
> snap-activate-on-create: enable
>
> and slave volume:
>
> [root@SC-183 log]# gluster volume info rem-volume-0001
>
> Volume Name: rem-volume-0001
> Type: Distribute
> Volume ID: 7680de7a-d0e2-42f2-96a9-4da29adba73c
> Status: Started
> Number of Bricks: 1
> Transport-type: tcp
> Bricks:
> Brick1: 10.10.60.183:/exports/nas183-segment-0001/rem-volume-0001
> Options Reconfigured:
> performance.readdir-ahead: on
> nfs.addr-namelookup: off
> nfs.disable: on
> performance.stat-prefetch: off
> performance.quick-read: off
> server.allow-insecure: on
> snap-activate-on-create: enable
>
> The master volume is mounted on the node:
>
> [root@SC-182 log]# mount
> 127.0.0.1:/master-for-183-0003 on /samba/master-for-183-0003 type fuse.glusterfs (rw,allow_other,max_read=131072)
>
> Let's fill up the space on the master volume:
>
> [root@SC-182 log]# mkdir /samba/master-for-183-0003/cifs_share/dir3
> [root@SC-182 log]# cp big.file /samba/master-for-183-0003/cifs_share/dir3/
> [root@SC-182 log]# cp big.file /samba/master-for-183-0003/cifs_share/dir3/big.file.1
> cp: writing `/samba/master-for-183-0003/cifs_share/dir3/big.file.1': No space left on device
> cp: closing `/samba/master-for-183-0003/cifs_share/dir3/big.file.1': No space left on device
>
> File "big.file.1" represents part of the original file:
>
> [root@SC-182 log]# ls -l /samba/master-for-183-0003/cifs_share/dir3/*
> -rwx------ 1 root root 78930370 Dec 5 16:49 /samba/master-for-183-0003/cifs_share/dir3/big.file
> -rwx------ 1 root root 22155264 Dec 5 16:49 /samba/master-for-183-0003/cifs_share/dir3/big.file.1
>
> Both new files are geo-replicated to the slave volume successfully:
>
> [root@SC-183 log]# ls -l /exports/nas183-segment-0001/rem-volume-0001/cifs_share/dir3/
> total 98720
> -rwx------ 2 root root 78930370 Dec 5 16:49 big.file
> -rwx------ 2 root root 22155264 Dec 5 16:49 big.file.1
>
> [root@SC-182 log]# /usr/sbin/gluster volume geo-replication master-for-183-0003 nasgorep@10.10.60.183::rem-volume-0001 status detail
>
> MASTER NODE:                 10.10.60.182
> MASTER VOL:                  master-for-183-0003
> MASTER BRICK:                /exports/nas-segment-0012/master-for-183-0003
> SLAVE USER:                  nasgorep
> SLAVE:                       nasgorep@10.10.60.183::rem-volume-0001
> SLAVE NODE:                  10.10.60.183
> STATUS:                      Active
> CRAWL STATUS:                Changelog Crawl
> LAST_SYNCED:                 2016-12-05 16:49:48
> ENTRY:                       0
> DATA:                        0
> META:                        0
> FAILURES:                    0
> CHECKPOINT TIME:             N/A
> CHECKPOINT COMPLETED:        N/A
> CHECKPOINT COMPLETION TIME:  N/A
>
> Let's delete the partially written file from the master mount:
>
> [root@SC-182 log]# rm /samba/master-for-183-0003/cifs_share/dir3/big.file.1
> rm: remove regular file `/samba/master-for-183-0003/cifs_share/dir3/big.file.1'? y
>
> [root@SC-182 log]# ls -l /samba/master-for-183-0003/cifs_share/dir3/*
> -rwx------ 1 root root 78930370 Dec 5 16:49 /samba/master-for-183-0003/cifs_share/dir3/big.file
>
> Set checkpoint:
>
> 32643 12/05/2016 16:57:46.540390536 1480985866 command: /usr/sbin/gluster volume geo-replication master-for-183-0003 nasgorep@10.10.60.183::rem-volume-0001 config checkpoint now 2>&1
> 32643 12/05/2016 16:57:48.770820909 1480985868 status=0 /usr/sbin/gluster volume geo-replication master-for-183-0003 nasgorep@10.10.60.183::rem-volume-0001 config checkpoint now 2>&1
>
> Check geo-replication status:
>
> [root@SC-182 log]# /usr/sbin/gluster volume geo-replication master-for-183-0003 nasgorep@10.10.60.183::rem-volume-0001 status detail
>
> MASTER NODE:                 10.10.60.182
> MASTER VOL:                  master-for-183-0003
> MASTER BRICK:                /exports/nas-segment-0012/master-for-183-0003
> SLAVE USER:                  nasgorep
> SLAVE:                       nasgorep@10.10.60.183::rem-volume-0001
> SLAVE NODE:                  10.10.60.183
> STATUS:                      Active
> CRAWL STATUS:                Changelog Crawl
> LAST_SYNCED:                 2016-12-05 16:57:48
> ENTRY:                       0
> DATA:                        0
> META:                        0
> FAILURES:                    0
> CHECKPOINT TIME:             2016-12-05 16:57:46
> CHECKPOINT COMPLETED:        Yes
> CHECKPOINT COMPLETION TIME:  2016-12-05 16:57:50
>
> But the partially written file "big.file.1" is still present on the slave volume:
>
> [root@SC-183 log]# ls -l /exports/nas183-segment-0001/rem-volume-0001/cifs_share/dir3/
> total 98720
> -rwx------ 2 root root 78930370 Dec 5 16:49 big.file
> -rwx------ 2 root root 22155264 Dec 5 16:49 big.file.1
>
> The Gluster logs for geo-replication do not give any indication of a
> failure to delete the file:
>
> [root@SC-182 log]# view /var/log/glusterfs/geo-replication/master-for-183-0003/ssh%3A%2F%2Fnasgorep%4010.10.60.183%3Agluster%3A%2F%2F127.0.0.1%3Arem-volume-0001.log
>
> [2016-12-06 00:49:40.267956] I [master(/exports/nas-segment-0012/master-for-183-0003):532:crawlwrap] _GMaster: 17 crawls, 1 turns
> [2016-12-06 00:49:52.348413] I [master(/exports/nas-segment-0012/master-for-183-0003):1121:crawl] _GMaster: slave's time: (1480985358, 0)
> [2016-12-06 00:49:53.296811] W [master(/exports/nas-segment-0012/master-for-183-0003):1058:process] _GMaster: incomplete sync, retrying changelogs: CHANGELOG.1480985389
> [2016-12-06 00:49:53.901186] W [master(/exports/nas-segment-0012/master-for-183-0003):1058:process] _GMaster: incomplete sync, retrying changelogs: CHANGELOG.1480985389
> [2016-12-06 00:49:54.760957] W [master(/exports/nas-segment-0012/master-for-183-0003):1058:process] _GMaster: incomplete sync, retrying changelogs: CHANGELOG.1480985389
> [2016-12-06 00:49:55.384705] W [master(/exports/nas-segment-0012/master-for-183-0003):1058:process] _GMaster: incomplete sync, retrying changelogs: CHANGELOG.1480985389
> [2016-12-06 00:49:55.987873] W [master(/exports/nas-segment-0012/master-for-183-0003):1058:process] _GMaster: incomplete sync, retrying changelogs: CHANGELOG.1480985389
> [2016-12-06 00:49:56.848361] W [master(/exports/nas-segment-0012/master-for-183-0003):1058:process] _GMaster: incomplete sync, retrying changelogs: CHANGELOG.1480985389
> [2016-12-06 00:49:57.471925] W [master(/exports/nas-segment-0012/master-for-183-0003):1058:process] _GMaster: incomplete sync, retrying changelogs: CHANGELOG.1480985389
> [2016-12-06 00:49:58.76416] W [master(/exports/nas-segment-0012/master-for-183-0003):1058:process] _GMaster: incomplete sync, retrying changelogs: CHANGELOG.1480985389
> [2016-12-06 00:49:58.935801] W [master(/exports/nas-segment-0012/master-for-183-0003):1058:process] _GMaster: incomplete sync, retrying changelogs: CHANGELOG.1480985389
> [2016-12-06 00:49:59.560571] E [resource(/exports/nas-segment-0012/master-for-183-0003):1021:rsync] SSH: SYNC Error(Rsync): rsync: rsync_xal_set: lsetxattr(".gfid/103b87ff-3b7a-4f2b-8bc5-a2f9c1d3fc0e","trusted.glusterfs.84501a83-b07c-4768-bfaa-418b038e1a9e.xtime") failed: Operation not permitted (1)
> [2016-12-06 00:49:59.560972] E [master(/exports/nas-segment-0012/master-for-183-0003):1037:process] _GMaster: changelogs CHANGELOG.1480985389 could not be processed completely - moving on...
> [2016-12-06 00:50:41.839792] I [master(/exports/nas-segment-0012/master-for-183-0003):532:crawlwrap] _GMaster: 18 crawls, 1 turns
> [2016-12-06 00:51:42.203411] I [master(/exports/nas-segment-0012/master-for-183-0003):532:crawlwrap] _GMaster: 20 crawls, 0 turns
> [2016-12-06 00:52:42.600800] I [master(/exports/nas-segment-0012/master-for-183-0003):532:crawlwrap] _GMaster: 20 crawls, 0 turns
> [2016-12-06 00:53:42.983913] I [master(/exports/nas-segment-0012/master-for-183-0003):532:crawlwrap] _GMaster: 20 crawls, 0 turns
> [2016-12-06 00:54:43.381218] I [master(/exports/nas-segment-0012/master-for-183-0003):532:crawlwrap] _GMaster: 20 crawls, 0 turns
> [2016-12-06 00:55:43.749927] I [master(/exports/nas-segment-0012/master-for-183-0003):532:crawlwrap] _GMaster: 20 crawls, 0 turns
> [2016-12-06 00:56:44.113914] I [master(/exports/nas-segment-0012/master-for-183-0003):532:crawlwrap] _GMaster: 20 crawls, 0 turns
> [2016-12-06 00:57:44.494354] I [master(/exports/nas-segment-0012/master-for-183-0003):532:crawlwrap] _GMaster: 20 crawls, 0 turns
> [2016-12-06 00:57:48.528424] I [gsyncd(conf):671:main_i] <top>: checkpoint 1480985866 set
> [2016-12-06 00:57:48.528704] I [syncdutils(conf):220:finalize] <top>: exiting.
> [2016-12-06 00:57:50.530714] I [master(/exports/nas-segment-0012/master-for-183-0003):1121:crawl] _GMaster: slave's time: (1480985388, 0)
> [2016-12-06 00:58:44.802122] I [master(/exports/nas-segment-0012/master-for-183-0003):532:crawlwrap] _GMaster: 20 crawls, 1 turns
> [2016-12-06 00:59:45.181669] I [master(/exports/nas-segment-0012/master-for-183-0003):532:crawlwrap] _GMaster: 20 crawls, 0 turns
>
> Best regards,
>
> Viktor Nosov
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users@xxxxxxxxxxx
> http://www.gluster.org/mailman/listinfo/gluster-users
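
A minimal Python sketch for triaging a master-side geo-replication log like the one quoted above. It is not part of gsyncd: the entry layout is inferred only from the quoted lines, and the names ENTRY, problems and epoch_to_utc are hypothetical. It keeps the warning (W) and error (E) entries and converts the epoch suffix of a changelog name to UTC, so it can be compared with the log timestamps.

```python
import re
from datetime import datetime, timezone

# Assumed layout, inferred from the quoted entries only:
#   [timestamp] LEVEL [source:line:function] message
ENTRY = re.compile(
    r"^\[(?P<ts>[^\]]+)\] (?P<level>[A-Z]) \[(?P<src>[^\]]+)\] (?P<msg>.*)$"
)
CHANGELOG = re.compile(r"CHANGELOG\.(\d+)")


def problems(lines):
    """Yield (timestamp, level, message) for W and E entries."""
    for line in lines:
        m = ENTRY.match(line.strip())
        if m and m.group("level") in ("W", "E"):
            yield m.group("ts"), m.group("level"), m.group("msg")


def epoch_to_utc(seconds):
    """Format a Unix epoch (such as a changelog suffix) as a UTC time."""
    stamp = datetime.fromtimestamp(int(seconds), tz=timezone.utc)
    return stamp.strftime("%Y-%m-%d %H:%M:%S")


# Four entries copied from the log quoted in this thread.
sample = """\
[2016-12-06 00:49:52.348413] I [master(/exports/nas-segment-0012/master-for-183-0003):1121:crawl] _GMaster: slave's time: (1480985358, 0)
[2016-12-06 00:49:53.296811] W [master(/exports/nas-segment-0012/master-for-183-0003):1058:process] _GMaster: incomplete sync, retrying changelogs: CHANGELOG.1480985389
[2016-12-06 00:49:59.560571] E [resource(/exports/nas-segment-0012/master-for-183-0003):1021:rsync] SSH: SYNC Error(Rsync): rsync: rsync_xal_set: lsetxattr(".gfid/103b87ff-3b7a-4f2b-8bc5-a2f9c1d3fc0e","trusted.glusterfs.84501a83-b07c-4768-bfaa-418b038e1a9e.xtime") failed: Operation not permitted (1)
[2016-12-06 00:49:59.560972] E [master(/exports/nas-segment-0012/master-for-183-0003):1037:process] _GMaster: changelogs CHANGELOG.1480985389 could not be processed completely - moving on...
""".splitlines()

found = list(problems(sample))
for ts, level, msg in found:
    hit = CHANGELOG.search(msg)
    when = epoch_to_utc(hit.group(1)) if hit else "n/a"
    print(level, ts, "changelog time (UTC):", when)
```

Applied to the values in the quoted log, CHANGELOG.1480985389 maps to 2016-12-06 00:49:49 UTC and the checkpoint epoch 1480985866 maps to 2016-12-06 00:57:46 UTC, which matches the local 16:57:46 shown by the checkpoint command. The quoted entries from 00:50 onward are all informational.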
Attachment: 84LPUG~L.LOG (binary data)
Attachment: slave.log (binary data)
Attachment: 8RSDWG~T.LOG (binary data)