Hi Milind,

Thanks for the suggestion. I did that for a few problematic files and geo-replication seems to continue, but now I am stuck at the following error message on the slave:

[2016-02-22 15:21:30.451133] W [MSGID: 114031] [client-rpc-fops.c:2971:client3_3_lookup_cbk] 0-myvolume-geo-client-0: remote operation failed. Path: <gfid:c4b19f1c-cc18-4727-87a4-18de8fe0089e> (c4b19f1c-cc18-4727-87a4-18de8fe0089e) [No such file or directory]

As you can see, this message does not include any file or directory name, so I can't go and delete that file or directory. Any other ideas how I may proceed here? Or would it be easier to delete the whole directory that I think is affected and start geo-rep from there? Or will this mess things up?

Regards
ML

On Monday, February 22, 2016 12:12 PM, Milind Changire <mchangir@xxxxxxxxxx> wrote:

ML,

You could try deleting the problematic files on the slave to recover geo-replication from the Faulty state.

However, changelogs generated due to the logrotate scenario will still cause geo-replication to go into the Faulty state frequently if geo-replication fails and restarts.

The patches mentioned in an earlier mail are being worked upon and finalized. They will be available soon in a release which will avoid geo-replication going into a Faulty state.

--
Milind

----- Original Message -----
From: "ML mail" <mlnospam@xxxxxxxxx>
To: "Milind Changire" <mchangir@xxxxxxxxxx>, "Gluster-users" <gluster-users@xxxxxxxxxxx>
Sent: Monday, February 22, 2016 1:27:14 PM
Subject: Re: geo-rep: remote operation failed - No such file or directory

Hi Milind,

Any news on this issue? I was wondering how I can fix and restart my geo-replication. Can I simply delete the problematic file(s) on my slave and restart geo-rep?

Regards
ML

On Wednesday, February 17, 2016 4:30 PM, ML mail <mlnospam@xxxxxxxxx> wrote:

Hi Milind,

Thank you for your short analysis. Indeed that's exactly what happens: as soon as I restart geo-rep, it replays the same operations over and over because they never succeed.
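The most recent message above notes that the lookup failure reports only a gfid, not a file name. On a GlusterFS brick, each gfid normally also has an entry under the brick's .glusterfs directory at .glusterfs/<first two hex chars>/<next two>/<gfid>: a hard link for regular files and a symlink (to the parent gfid plus the directory name) for directories. Below is a minimal Python sketch, assuming that layout, that pulls the gfid out of such a log line and looks for the matching name(s) on one brick. The functions extract_gfid, gfid_backend_path and resolve_gfid are hypothetical helpers, not part of GlusterFS. Since the lookup failed on the slave, the gfid presumably exists only on the master bricks, so in practice one would point it at a master brick path (not given in the thread); /data/myvolume-geo/brick is the slave brick named in the logs quoted further down.

```python
import os
import re
from typing import List, Optional

# Matches gfid references such as "<gfid:c4b19f1c-cc18-4727-87a4-18de8fe0089e>".
GFID_RE = re.compile(
    r"<gfid:([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})>"
)


def extract_gfid(log_line: str) -> Optional[str]:
    """Return the first <gfid:...> value in a log line, or None."""
    m = GFID_RE.search(log_line)
    return m.group(1) if m else None


def gfid_backend_path(brick: str, gfid: str) -> str:
    """Location of the gfid entry, assuming .glusterfs/<aa>/<bb>/<gfid>."""
    return os.path.join(brick, ".glusterfs", gfid[0:2], gfid[2:4], gfid)


def resolve_gfid(brick: str, gfid: str) -> List[str]:
    """Best-effort mapping of a gfid to name(s) on one brick (hypothetical helper).

    - Directory: the .glusterfs entry is a symlink; return its raw target,
      e.g. "../../<pp>/<qq>/<parent-gfid>/<dirname>".
    - Regular file: the .glusterfs entry is a hard link; return every path
      outside .glusterfs that shares its inode.
    - Missing entry: return [] (the gfid does not exist on this brick).
    """
    entry = gfid_backend_path(brick, gfid)
    if os.path.islink(entry):
        return [os.readlink(entry)]
    if not os.path.exists(entry):
        return []
    target = os.stat(entry)
    matches = []
    for root, dirs, files in os.walk(brick):
        if root == brick and ".glusterfs" in dirs:
            dirs.remove(".glusterfs")  # don't descend into the gfid store itself
        for name in files:
            path = os.path.join(root, name)
            try:
                st = os.lstat(path)
            except FileNotFoundError:
                continue  # file vanished while walking
            if (st.st_ino, st.st_dev) == (target.st_ino, target.st_dev):
                matches.append(path)
    return matches


if __name__ == "__main__":
    line = ("[2016-02-22 15:21:30.451133] W [MSGID: 114031] "
            "[client-rpc-fops.c:2971:client3_3_lookup_cbk] 0-myvolume-geo-client-0: "
            "remote operation failed. Path: <gfid:c4b19f1c-cc18-4727-87a4-18de8fe0089e> "
            "(c4b19f1c-cc18-4727-87a4-18de8fe0089e) [No such file or directory]")
    print(gfid_backend_path("/data/myvolume-geo/brick", extract_gfid(line)))
    # -> /data/myvolume-geo/brick/.glusterfs/c4/b1/c4b19f1c-cc18-4727-87a4-18de8fe0089e
```

Walking a large brick this way is slow; GNU find's -samefile test does the same inode comparison from the shell.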
Now, regarding the sequence of the file management operations, I am not totally sure how it works, but I can tell you that we are using ownCloud v8.2.2 (www.owncloud.org) and we use GlusterFS as the storage for this cloud software. So it is very likely that ownCloud works like this: when a user uploads a new file, it first creates it under a temporary name, which it then either renames or moves after a successful upload.

I have the feeling this issue is related to my initial issue, which I reported earlier this month: https://www.gluster.org/pipermail/gluster-users/2016-February/025176.html

For now my question is: how do I get geo-replication to restart successfully?

Regards
ML

On Wednesday, February 17, 2016 4:10 PM, Milind Changire <mchangir@xxxxxxxxxx> wrote:

As per the slave logs, there is an attempt to RENAME files, i.e. a .part file getting renamed to a name without the .part suffix.

Just restarting geo-rep isn't going to help much if you've already hit the problem. Since the last CHANGELOG is replayed by geo-rep on a restart, you'll most probably encounter the same log messages in the logs.

Are the .part files often CREATEd, RENAMEd and DELETEd with the same name? Do the operations on the geo-replication master cluster happen in roughly the following sequence?

CREATE f1.part
RENAME f1.part f1
DELETE f1
CREATE f1.part
RENAME f1.part f1
...
...

If not, then it would help if you could send the sequence of file management operations.
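Milind's point that the last CHANGELOG is replayed on restart can be illustrated with a deliberately simplified toy model. This is a hypothetical sketch (ToySlave, replay and the gfids G1/G2 are made-up names) and not how gsyncd is implemented: entries in one directory map names to gfids, a CREATE whose gfid already exists is treated as already done, and the first error stops the batch, standing in for the worker going Faulty. If the slave had already applied the CREATE and RENAME of f1.part before the restart, replaying the batch tries to rename an f1.part that no longer exists and fails with ENOENT, and every later restart stops at the same operation. Only the ENOENT on the .part lookup is modeled; the EBUSY returned by the rename in the real logs is not.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Op = Tuple[str, ...]


@dataclass
class ToySlave:
    """Hypothetical one-directory slave: maps entry name -> gfid."""
    entries: Dict[str, str] = field(default_factory=dict)

    def apply(self, op: Op) -> str:
        kind = op[0]
        if kind == "CREATE":
            _, name, gfid = op
            if gfid in self.entries.values():
                return "skipped"  # toy assumption: existing gfid means "already created"
            self.entries[name] = gfid
            return "ok"
        if kind == "RENAME":
            _, src, dst = op
            if src not in self.entries:
                return "ENOENT"  # source was already renamed away on the slave
            self.entries[dst] = self.entries.pop(src)
            return "ok"
        if kind == "DELETE":
            _, name = op
            return "ok" if self.entries.pop(name, None) is not None else "ENOENT"
        raise ValueError(f"unknown op: {kind}")


def replay(slave: ToySlave, changelog: List[Op]) -> List[str]:
    """Apply ops in order, stopping at the first error (the toy 'Faulty' state)."""
    results = []
    for op in changelog:
        status = slave.apply(op)
        results.append(status)
        if status not in ("ok", "skipped"):
            break
    return results


# The sequence from Milind's question, with made-up gfids G1 and G2.
CHANGELOG: List[Op] = [
    ("CREATE", "f1.part", "G1"),
    ("RENAME", "f1.part", "f1"),
    ("DELETE", "f1"),
    ("CREATE", "f1.part", "G2"),
    ("RENAME", "f1.part", "f1"),
]

if __name__ == "__main__":
    print(replay(ToySlave(), CHANGELOG))
    # -> ['ok', 'ok', 'ok', 'ok', 'ok']
    ahead = ToySlave({"f1": "G1"})  # slave already applied the first two ops
    for _ in range(2):  # each "restart" replays the same changelog
        print(replay(ahead, CHANGELOG))
        # -> ['skipped', 'ENOENT'] both times
```

On an empty slave the same sequence applies cleanly; the failure only appears when the slave is already ahead of the start of the replayed changelog, which is why restarting alone does not make progress.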
--
Milind

----- Original Message -----
From: "Kotresh Hiremath Ravishankar" <khiremat@xxxxxxxxxx>
To: "ML mail" <mlnospam@xxxxxxxxx>
Cc: "Gluster-users" <gluster-users@xxxxxxxxxxx>, "Milind Changire" <mchangir@xxxxxxxxxx>
Sent: Tuesday, February 16, 2016 6:28:21 PM
Subject: Re: geo-rep: remote operation failed - No such file or directory

Ccing Milind, he would be able to help

Thanks and Regards,
Kotresh H R

----- Original Message -----
> From: "ML mail" <mlnospam@xxxxxxxxx>
> To: "Gluster-users" <gluster-users@xxxxxxxxxxx>
> Sent: Monday, February 15, 2016 4:41:56 PM
> Subject: geo-rep: remote operation failed - No such file or directory
>
> Hello,
>
> I noticed that the geo-replication of a volume has STATUS "Faulty" and while
> looking in the *.gluster.log file in
> /var/log/glusterfs/geo-replication-slaves/ on my slave I can see the
> following relevant problem:
>
> [2016-02-15 10:58:40.402516] I [rpc-clnt.c:1847:rpc_clnt_reconfig]
> 0-myvolume-geo-client-0: changing port to 49152 (from 0)
> [2016-02-15 10:58:40.403928] I [MSGID: 114057]
> [client-handshake.c:1437:select_server_supported_programs]
> 0-myvolume-geo-client-0: Using Program GlusterFS 3.3, Num (1298437), Version
> (330)
> [2016-02-15 10:58:40.404130] I [MSGID: 114046]
> [client-handshake.c:1213:client_setvolume_cbk] 0-myvolume-geo-client-0:
> Connected to myvolume-geo-client-0, attached to remote volume
> '/data/myvolume-geo/brick'.
> [2016-02-15 10:58:40.404150] I [MSGID: 114047]
> [client-handshake.c:1224:client_setvolume_cbk] 0-myvolume-geo-client-0:
> Server and Client lk-version numbers are not same, reopening the fds
> [2016-02-15 10:58:40.410150] I [fuse-bridge.c:5137:fuse_graph_setup] 0-fuse:
> switched to graph 0
> [2016-02-15 10:58:40.410223] I [MSGID: 114035]
> [client-handshake.c:193:client_set_lk_version_cbk] 0-myvolume-geo-client-0:
> Server lk version = 1
> [2016-02-15 10:58:40.410370] I [fuse-bridge.c:4030:fuse_init]
> 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.22 kernel
> 7.23
> [2016-02-15 10:58:45.662416] I [MSGID: 109066] [dht-rename.c:1411:dht_rename]
> 0-myvolume-geo-dht: renaming
> /.gfid/94310944-7f8a-421d-a51f-1e23e28da9cc/Bild_03_Rosen.JPG-chunking-2242590604-0.FpKL3SIUb9vKHyjd.part
> (hash=myvolume-geo-client-0/cache=myvolume-geo-client-0) =>
> /.gfid/94310944-7f8a-421d-a51f-1e23e28da9cc/Bild_03_Rosen.JPG-chunking-2242590604-0
> (hash=myvolume-geo-client-0/cache=<nul>)
> [2016-02-15 10:58:45.665144] I [MSGID: 109066] [dht-rename.c:1411:dht_rename]
> 0-myvolume-geo-dht: renaming
> /.gfid/94310944-7f8a-421d-a51f-1e23e28da9cc/Bild_03_Rosen.JPG-chunking-2242590604-1.C6l0DEurb2y3Azw4.part
> (hash=myvolume-geo-client-0/cache=myvolume-geo-client-0) =>
> /.gfid/94310944-7f8a-421d-a51f-1e23e28da9cc/Bild_03_Rosen.JPG-chunking-2242590604-1
> (hash=myvolume-geo-client-0/cache=<nul>)
> [2016-02-15 10:58:45.749829] I [MSGID: 109066] [dht-rename.c:1411:dht_rename]
> 0-myvolume-geo-dht: renaming
> /.gfid/94310944-7f8a-421d-a51f-1e23e28da9cc/Bild_02_Pilz.JPG-chunking-628343631-0.ajEnSguUZ7EkzjzT.part
> (hash=myvolume-geo-client-0/cache=myvolume-geo-client-0) =>
> /.gfid/94310944-7f8a-421d-a51f-1e23e28da9cc/Bild_02_Pilz.JPG-chunking-628343631-0
> (hash=myvolume-geo-client-0/cache=<nul>)
> [2016-02-15 10:58:45.750225] W [MSGID: 114031]
> [client-rpc-fops.c:2971:client3_3_lookup_cbk] 0-myvolume-geo-client-0:
> remote operation failed. Path:
> /.gfid/94310944-7f8a-421d-a51f-1e23e28da9cc/Bild_02_Pilz.JPG-chunking-628343631-0.ajEnSguUZ7EkzjzT.part
> (9164caeb-740d-4429-a3bd-c85f40c35e11) [No such file or directory]
> [2016-02-15 10:58:45.750418] W [fuse-bridge.c:1777:fuse_rename_cbk]
> 0-glusterfs-fuse: 60:
> /.gfid/94310944-7f8a-421d-a51f-1e23e28da9cc/Bild_02_Pilz.JPG-chunking-628343631-0.ajEnSguUZ7EkzjzT.part
> ->
> /.gfid/94310944-7f8a-421d-a51f-1e23e28da9cc/Bild_02_Pilz.JPG-chunking-628343631-0
> => -1 (Device or resource busy)
> [2016-02-15 10:58:45.767788] I [fuse-bridge.c:4984:fuse_thread_proc] 0-fuse:
> unmounting /tmp/gsyncd-aux-mount-bZ9SMt
> [2016-02-15 10:58:45.768063] W [glusterfsd.c:1236:cleanup_and_exit]
> (-->/lib/x86_64-linux-gnu/libpthread.so.0(+0x80a4) [0x7feb610820a4]
> -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x7feb626f45b5]
> -->/usr/sbin/glusterfs(cleanup_and_exit+0x59) [0x7feb626f4429] ) 0-:
> received signum (15), shutting down
> [2016-02-15 10:58:45.768093] I [fuse-bridge.c:5683:fini] 0-fuse: Unmounting
> '/tmp/gsyncd-aux-mount-bZ9SMt'.
> [2016-02-15 10:58:54.871855] I [dict.c:473:dict_get]
> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.6/xlator/system/posix-acl.so(posix_acl_setxattr_cbk+0x26)
> [0x7f8313dfb166]
> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.6/xlator/system/posix-acl.so(handling_other_acl_related_xattr+0x20)
> [0x7f8313dfb060]
> -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_get+0x93)
> [0x7f831f3f40c3] ) 0-dict: !this || key=system.posix_acl_access [Invalid
> argument]
> [2016-02-15 10:58:54.871914] I [dict.c:473:dict_get]
> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.6/xlator/system/posix-acl.so(posix_acl_setxattr_cbk+0x26)
> [0x7f8313dfb166]
> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.6/xlator/system/posix-acl.so(handling_other_acl_related_xattr+0xb0)
> [0x7f8313dfb0f0]
> -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_get+0x93)
> [0x7f831f3f40c3] ) 0-dict: !this || key=system.posix_acl_default [Invalid
> argument]
>
> This error gets repeated forever with always the same files. I tried to stop
> and restart the geo-rep on the master but still the same problem and geo
> replication does not proceed. Does anyone have an idea how to fix this?
>
> I am using GlusterFS 3.7.6 on Debian 8 with a two node replicate volume (1
> brick per node) and one single off-site slave node for geo-rep.
>
> Regards
> ML
> _______________________________________________
> Gluster-users mailing list
> Gluster-users@xxxxxxxxxxx
> http://www.gluster.org/mailman/listinfo/gluster-users
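The original report says the same error repeats forever with the same files. To see which paths (or bare gfids) are involved, a small script can tally the MSGID 114031 lookup-failure warnings in the slave's *.gluster.log. This is a hypothetical sketch: failing_paths and LOOKUP_FAIL_RE are made-up names, and the regular expression is written only against the message format quoted in this thread, so other GlusterFS versions may word the message differently.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Written against the MSGID 114031 warning format quoted in this thread:
#   [...] W [MSGID: 114031] [client-rpc-fops.c:...:client3_3_lookup_cbk] ...:
#   remote operation failed. Path: <path or <gfid:...>> (<gfid>) [<error>]
LOOKUP_FAIL_RE = re.compile(
    r"\sW \[MSGID: 114031\].*remote operation failed\. Path: (\S+)"
)


def failing_paths(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Count each path/gfid in lookup-failure warnings, most frequent first."""
    counts: Counter = Counter()
    for line in lines:
        m = LOOKUP_FAIL_RE.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts.most_common()
```

On the slave one would pass an open file for the relevant log under /var/log/glusterfs/geo-replication-slaves/ (the exact file name is not given in the thread). Paths at the top of the list are the candidates for the cleanup Milind suggested, while bare <gfid:...> entries still have to be mapped to a name separately.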