Hi Viktor, I was caught up with something else and apologies for the late reply. I went through the logs and here is the reason for geo-rep failing to sync in ENOSPC scenario. The changelog xlator records the operations on the server side. Geo-replication replays the same to sync to the slave. The changelogs are present in /<brick_path>/.glusterfs/changelogs/. The file named 'CHANGELOG' gets created waiting for I/O to record it. By default every 15 sec, if the 'CHANGELOG' file has some operation recorded, it will be renamed to 'CHANGELOG.<TSTAMP>'. And a new 'CHANGELOG' file is created. In this usecase, the new 'CHANGELOG' file creation is failed because of ENOSPC and hence the delete operation is not recorded in changelog. Because of which geo-rep didn't sync it. Later after 15 sec, it could create 'CHANGELOG' file because space was reclaimed. After which everything is syncing. So geo-rep might miss syncing few files in ENOSPC scenario. Thanks and Regards, Kotresh H R ----- Original Message ----- > From: "Viktor Nosov" <vnosov@xxxxxxxxxxxx> > To: "Kotresh Hiremath Ravishankar" <khiremat@xxxxxxxxxx> > Cc: gluster-users@xxxxxxxxxxx > Sent: Saturday, December 10, 2016 2:43:35 AM > Subject: Geo-replication failed to delete from slave file partially written to master volume. > > Hi Kotresh, > > Logs from geo-replication test that I used to report the problem were > overwritten. > So I run the test again. Each time the problem is reproduced with 100% > probability. > This time I took all glusterfs logs from Master and Slave systems. Logs are > attached to this message. > > All steps to duplicate the problem are reported at the text file > "geo-rep-test.txt". This file is attached too. > Records at the file "geo-rep-test.txt" are in chronological order. > Names of volumes, bricks, nodes are the same as before. > > The test starts when geo-replication synchronization is working properly. > After that attempt is made to write file "big.file.1" to the master that has > not enough space to handle the whole file. > Linux writes the file partially and geo-replication syncs this partial file > "big.file.1" with the slave volume. > > The next write of new file "test.file" to the master fails but it creates > file handler "test.file" on Master for file size zero. This new file > "test.file" is not replicated to the slave > volume. > > The next step is to delete partially written file "big.file.1" from the > master to make free space on the master. The delete is successful, > but it never is sync to the slave. This is the problem. File "big.file.1" is > still on the slave volume. > > The next step is to repeat write for file "test.file". We let system to > overwrite file on the master volume. > Now file "test.file" has some contents. But this change did not sync to the > slave. This is other flavor of the same problem. > > Finally new file "test.file.1" was written to the master. The file was > successfully replicated to the slave. > > Best regards, > > Viktor Nosov > > > > -----Original Message----- > From: Kotresh Hiremath Ravishankar [mailto:khiremat@xxxxxxxxxx] > Sent: Friday, December 09, 2016 3:03 AM > To: Viktor Nosov > Cc: gluster-users@xxxxxxxxxxx > Subject: Re: Geo-replication failed to delete from slave file > partially written to master volume. > > Hi Viktor, > > I went through the slave logs and there are no errors w.r.t deletion of the > files. > I suspect the changelog has missed recording deletion of file because of > which the delete is not processed. > > Please share the following logs from master volume to root cause the issue. > > 1. Master geo-replication logs: > /var/log/glusterfs/geo-replication/<mastervolname>/*.log > 2. Master brick logs: /var/log/glusterfs/bricks/*.log 3. Also changelogs from > Master volume: > /exports/nas-segment-0012/master-for-183-0003/.glusterfs/changelogs/* > > Thanks and Regards, > Kotresh H R > > ----- Original Message ----- > > From: "Viktor Nosov" <vnosov@xxxxxxxxxxxx> > > To: "Kotresh Hiremath Ravishankar" <khiremat@xxxxxxxxxx> > > Cc: gluster-users@xxxxxxxxxxx > > Sent: Wednesday, December 7, 2016 10:48:52 PM > > Subject: RE: Geo-replication failed to delete from slave > > file partially written to master volume. > > > > Hi Kotresh, > > > > Thanks for looking into this issue! > > I'm attaching log files from the slave node from > > /var/log/glusterfs/geo-replication-slaves/ > > > > [root@SC-183 log]# cp > > /var/log/glusterfs/geo-replication-slaves/84501a83-b07c-4768-bfaa-418b > > 038e1a9e\:gluster%3A%2F%2F127.0.0.1%3Arem-volume-0001.gluster.log > > /home/vnosov/ > > [root@SC-183 log]# cp > > /var/log/glusterfs/geo-replication-slaves/slave.log > > /home/vnosov/ > > [root@SC-183 log]# cp > > /var/log/glusterfs/geo-replication-slaves/mbr/84501a83-b07c-4768-bfaa- > > 418b038e1a9e\:gluster%3A%2F%2F127.0.0.1%3Arem-volume-0001.log > > /home/vnosov/ > > > > Best regards, > > > > Viktor Nosov > > > > > > -----Original Message----- > > From: Kotresh Hiremath Ravishankar [mailto:khiremat@xxxxxxxxxx] > > Sent: Tuesday, December 06, 2016 9:25 PM > > To: Viktor Nosov > > Cc: gluster-users@xxxxxxxxxxx > > Subject: Re: Geo-replication failed to delete from > > slave file partially written to master volume. > > > > Hi Viktor, > > > > Please share geo-replication-slave mount logs from slave nodes. > > > > Thanks and Regards, > > Kotresh H R > > > > ----- Original Message ----- > > > From: "Viktor Nosov" <vnosov@xxxxxxxxxxxx> > > > To: gluster-users@xxxxxxxxxxx > > > Cc: vnosov@xxxxxxxxxxxx > > > Sent: Tuesday, December 6, 2016 7:13:22 AM > > > Subject: Geo-replication failed to delete from slave file > > > partially written to master volume. > > > > > > Hi, > > > > > > I hit problem while testing geo-replication. Anybody knows how to > > > fix it except deleting and recreating geo-replication? > > > > > > Geo-replication failed to delete from slave file partially written > > > to master volume. > > > > > > Have geo-replication between two nodes that are running glusterfs > > > 3.7.16 > > > > > > with master volume: > > > > > > [root@SC-182 log]# gluster volume info master-for-183-0003 > > > > > > Volume Name: master-for-183-0003 > > > Type: Distribute > > > Volume ID: 84501a83-b07c-4768-bfaa-418b038e1a9e > > > Status: Started > > > Number of Bricks: 1 > > > Transport-type: tcp > > > Bricks: > > > Brick1: 10.10.60.182:/exports/nas-segment-0012/master-for-183-0003 > > > Options Reconfigured: > > > changelog.changelog: on > > > geo-replication.ignore-pid-check: on > > > geo-replication.indexing: on > > > server.allow-insecure: on > > > performance.quick-read: off > > > performance.stat-prefetch: off > > > nfs.disable: on > > > nfs.addr-namelookup: off > > > performance.readdir-ahead: on > > > cluster.enable-shared-storage: enable > > > snap-activate-on-create: enable > > > > > > and slave volume: > > > > > > [root@SC-183 log]# gluster volume info rem-volume-0001 > > > > > > Volume Name: rem-volume-0001 > > > Type: Distribute > > > Volume ID: 7680de7a-d0e2-42f2-96a9-4da29adba73c > > > Status: Started > > > Number of Bricks: 1 > > > Transport-type: tcp > > > Bricks: > > > Brick1: 10.10.60.183:/exports/nas183-segment-0001/rem-volume-0001 > > > Options Reconfigured: > > > performance.readdir-ahead: on > > > nfs.addr-namelookup: off > > > nfs.disable: on > > > performance.stat-prefetch: off > > > performance.quick-read: off > > > server.allow-insecure: on > > > snap-activate-on-create: enable > > > > > > Master volume mounted on node: > > > > > > [root@SC-182 log]# mount > > > 127.0.0.1:/master-for-183-0003 on /samba/master-for-183-0003 type > > > fuse.glusterfs (rw,allow_other,max_read=131072) > > > > > > Let's fill up space on master volume: > > > > > > [root@SC-182 log]# mkdir /samba/master-for-183-0003/cifs_share/dir3 > > > [root@SC-182 log]# cp big.file > > > /samba/master-for-183-0003/cifs_share/dir3/ > > > [root@SC-182 log]# cp big.file > > > /samba/master-for-183-0003/cifs_share/dir3/big.file.1 > > > cp: writing `/samba/master-for-183-0003/cifs_share/dir3/big.file.1': > > > No space left on device > > > cp: closing `/samba/master-for-183-0003/cifs_share/dir3/big.file.1': > > > No space left on device > > > > > > File " big.file.1" represent part of the original file: > > > [root@SC-182 log]# ls -l > > > /samba/master-for-183-0003/cifs_share/dir3/* > > > -rwx------ 1 root root 78930370 Dec 5 16:49 > > > /samba/master-for-183-0003/cifs_share/dir3/big.file > > > -rwx------ 1 root root 22155264 Dec 5 16:49 > > > /samba/master-for-183-0003/cifs_share/dir3/big.file.1 > > > > > > Both new files are geo-replicated to the Slave volume successfully: > > > > > > [root@SC-183 log]# ls -l > > > /exports/nas183-segment-0001/rem-volume-0001/cifs_share/dir3/ > > > total 98720 > > > -rwx------ 2 root root 78930370 Dec 5 16:49 big.file > > > -rwx------ 2 root root 22155264 Dec 5 16:49 big.file.1 > > > > > > [root@SC-182 log]# /usr/sbin/gluster volume geo-replication > > > master-for-183-0003 nasgorep@10.10.60.183::rem-volume-0001 status > > > detail > > > > > > MASTER NODE MASTER VOL MASTER BRICK > > > SLAVE USER SLAVE SLAVE NODE > > > STATUS > > > CRAWL STATUS LAST_SYNCED ENTRY DATA META > > > FAILURES > > > CHECKPOINT TIME CHECKPOINT COMPLETED CHECKPOINT COMPLETION TIME > > > -------------------------------------------------------------------- > > > -- > > > ------ > > > -------------------------------------------------------------------- > > > -- > > > ------ > > > -------------------------------------------------------------------- > > > -- > > > ------ > > > -------------------------------------------------------------------- > > > -- > > > ------ > > > ------ > > > 10.10.60.182 master-for-183-0003 > > > /exports/nas-segment-0012/master-for-183-0003 nasgorep > > > nasgorep@10.10.60.183::rem-volume-0001 10.10.60.183 Active > > > Changelog Crawl 2016-12-05 16:49:48 0 0 0 0 > > > N/A N/A N/A > > > > > > Let's delete partially written file from the master mount: > > > > > > [root@SC-182 log]# rm > > > /samba/master-for-183-0003/cifs_share/dir3/big.file.1 > > > rm: remove regular file > > > `/samba/master-for-183-0003/cifs_share/dir3/big.file.1'? y > > > > > > [root@SC-182 log]# ls -l > > > /samba/master-for-183-0003/cifs_share/dir3/* > > > -rwx------ 1 root root 78930370 Dec 5 16:49 > > > /samba/master-for-183-0003/cifs_share/dir3/big.file > > > > > > Set checkpoint: > > > > > > 32643 12/05/2016 16:57:46.540390536 1480985866 command: > > > /usr/sbin/gluster volume geo-replication master-for-183-0003 > > > nasgorep@10.10.60.183::rem-volume-0001 config checkpoint now 2>&1 > > > 32643 12/05/2016 16:57:48.770820909 1480985868 status=0 > > > /usr/sbin/gluster volume geo-replication master-for-183-0003 > > > nasgorep@10.10.60.183::rem-volume-0001 config checkpoint now 2>&1 > > > > > > Check geo-replication status: > > > > > > [root@SC-182 log]# /usr/sbin/gluster volume geo-replication > > > master-for-183-0003 nasgorep@10.10.60.183::rem-volume-0001 status > > > detail > > > > > > MASTER NODE MASTER VOL MASTER BRICK > > > SLAVE USER SLAVE SLAVE NODE > > > STATUS > > > CRAWL STATUS LAST_SYNCED ENTRY DATA META > > > FAILURES > > > CHECKPOINT TIME CHECKPOINT COMPLETED CHECKPOINT COMPLETION TIME > > > -------------------------------------------------------------------- > > > -- > > > ------ > > > -------------------------------------------------------------------- > > > -- > > > ------ > > > -------------------------------------------------------------------- > > > -- > > > ------ > > > -------------------------------------------------------------------- > > > -- > > > ------ > > > ---------- > > > 10.10.60.182 master-for-183-0003 > > > /exports/nas-segment-0012/master-for-183-0003 nasgorep > > > nasgorep@10.10.60.183::rem-volume-0001 10.10.60.183 Active > > > Changelog Crawl 2016-12-05 16:57:48 0 0 0 0 > > > 2016-12-05 16:57:46 Yes 2016-12-05 16:57:50 > > > > > > But the partially written file "big.file.1" is still present on the > > > slave > > > volume: > > > > > > [root@SC-183 log]# ls -l > > > /exports/nas183-segment-0001/rem-volume-0001/cifs_share/dir3/ > > > total 98720 > > > -rwx------ 2 root root 78930370 Dec 5 16:49 big.file > > > -rwx------ 2 root root 22155264 Dec 5 16:49 big.file.1 > > > > > > Gluster logs for geo-replication do not have any indication about > > > failure to delete the file: > > > > > > [root@SC-182 log]# view > > > /var/log/glusterfs/geo-replication/master-for-183-0003/ssh%3A%2F%2Fn > > > as > > > gorep% > > > 4010.10.60.183%3Agluster%3A%2F%2F127.0.0.1%3Arem-volume-0001.log > > > > > > [2016-12-06 00:49:40.267956] I > > > [master(/exports/nas-segment-0012/master-for-183-0003):532:crawlwrap > > > ] > > > _GMaster: 17 crawls, 1 turns > > > [2016-12-06 00:49:52.348413] I > > > [master(/exports/nas-segment-0012/master-for-183-0003):1121:crawl] > > > _GMaster: > > > slave's time: (1480985358, 0) > > > [2016-12-06 00:49:53.296811] W > > > [master(/exports/nas-segment-0012/master-for-183-0003):1058:process] > > > _GMaster: incomplete sync, retrying changelogs: CHANGELOG.1480985389 > > > [2016-12-06 00:49:53.901186] W > > > [master(/exports/nas-segment-0012/master-for-183-0003):1058:process] > > > _GMaster: incomplete sync, retrying changelogs: CHANGELOG.1480985389 > > > [2016-12-06 00:49:54.760957] W > > > [master(/exports/nas-segment-0012/master-for-183-0003):1058:process] > > > _GMaster: incomplete sync, retrying changelogs: CHANGELOG.1480985389 > > > [2016-12-06 00:49:55.384705] W > > > [master(/exports/nas-segment-0012/master-for-183-0003):1058:process] > > > _GMaster: incomplete sync, retrying changelogs: CHANGELOG.1480985389 > > > [2016-12-06 00:49:55.987873] W > > > [master(/exports/nas-segment-0012/master-for-183-0003):1058:process] > > > _GMaster: incomplete sync, retrying changelogs: CHANGELOG.1480985389 > > > [2016-12-06 00:49:56.848361] W > > > [master(/exports/nas-segment-0012/master-for-183-0003):1058:process] > > > _GMaster: incomplete sync, retrying changelogs: CHANGELOG.1480985389 > > > [2016-12-06 00:49:57.471925] W > > > [master(/exports/nas-segment-0012/master-for-183-0003):1058:process] > > > _GMaster: incomplete sync, retrying changelogs: CHANGELOG.1480985389 > > > [2016-12-06 00:49:58.76416] W > > > [master(/exports/nas-segment-0012/master-for-183-0003):1058:process] > > > _GMaster: incomplete sync, retrying changelogs: CHANGELOG.1480985389 > > > [2016-12-06 00:49:58.935801] W > > > [master(/exports/nas-segment-0012/master-for-183-0003):1058:process] > > > _GMaster: incomplete sync, retrying changelogs: CHANGELOG.1480985389 > > > [2016-12-06 00:49:59.560571] E > > > [resource(/exports/nas-segment-0012/master-for-183-0003):1021:rsync] SSH: > > > SYNC Error(Rsync): rsync: rsync_xal_set: > > > lsetxattr(".gfid/103b87ff-3b7a-4f2b-8bc5-a2f9c1d3fc0e","trusted.glus > > > te > > > rfs.84 > > > 501a83-b07c-4768-bfaa-418b038e1a9e.xtime") failed: Operation not > > > permitted > > > (1) > > > [2016-12-06 00:49:59.560972] E > > > [master(/exports/nas-segment-0012/master-for-183-0003):1037:process] > > > _GMaster: changelogs CHANGELOG.1480985389 could not be processed > > > completely - moving on... > > > [2016-12-06 00:50:41.839792] I > > > [master(/exports/nas-segment-0012/master-for-183-0003):532:crawlwrap > > > ] > > > _GMaster: 18 crawls, 1 turns > > > [2016-12-06 00:51:42.203411] I > > > [master(/exports/nas-segment-0012/master-for-183-0003):532:crawlwrap > > > ] > > > _GMaster: 20 crawls, 0 turns > > > [2016-12-06 00:52:42.600800] I > > > [master(/exports/nas-segment-0012/master-for-183-0003):532:crawlwrap > > > ] > > > _GMaster: 20 crawls, 0 turns > > > [2016-12-06 00:53:42.983913] I > > > [master(/exports/nas-segment-0012/master-for-183-0003):532:crawlwrap > > > ] > > > _GMaster: 20 crawls, 0 turns > > > [2016-12-06 00:54:43.381218] I > > > [master(/exports/nas-segment-0012/master-for-183-0003):532:crawlwrap > > > ] > > > _GMaster: 20 crawls, 0 turns > > > [2016-12-06 00:55:43.749927] I > > > [master(/exports/nas-segment-0012/master-for-183-0003):532:crawlwrap > > > ] > > > _GMaster: 20 crawls, 0 turns > > > [2016-12-06 00:56:44.113914] I > > > [master(/exports/nas-segment-0012/master-for-183-0003):532:crawlwrap > > > ] > > > _GMaster: 20 crawls, 0 turns > > > [2016-12-06 00:57:44.494354] I > > > [master(/exports/nas-segment-0012/master-for-183-0003):532:crawlwrap > > > ] > > > _GMaster: 20 crawls, 0 turns > > > [2016-12-06 00:57:48.528424] I [gsyncd(conf):671:main_i] <top>: > > > checkpoint > > > 1480985866 set > > > [2016-12-06 00:57:48.528704] I [syncdutils(conf):220:finalize] <top>: > > > exiting. > > > [2016-12-06 00:57:50.530714] I > > > [master(/exports/nas-segment-0012/master-for-183-0003):1121:crawl] > > > _GMaster: > > > slave's time: (1480985388, 0) > > > [2016-12-06 00:58:44.802122] I > > > [master(/exports/nas-segment-0012/master-for-183-0003):532:crawlwrap > > > ] > > > _GMaster: 20 crawls, 1 turns > > > [2016-12-06 00:59:45.181669] I > > > [master(/exports/nas-segment-0012/master-for-183-0003):532:crawlwrap > > > ] > > > _GMaster: 20 crawls, 0 turns > > > > > > Best regards, > > > > > > Viktor Nosov > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > _______________________________________________ > > > Gluster-users mailing list > > > Gluster-users@xxxxxxxxxxx > > > http://www.gluster.org/mailman/listinfo/gluster-users > > > > > > _______________________________________________ Gluster-users mailing list Gluster-users@xxxxxxxxxxx http://www.gluster.org/mailman/listinfo/gluster-users