That would be great, thank you. For me it is not an option to delete the volume on my
master node (2 nodes, 1 brick per node). On the other hand, it is no problem to delete
the volume on the slave node, which is only used for geo-rep.

Regards
ML


On Wednesday, February 24, 2016 4:44 PM, Aravinda <avishwan@xxxxxxxxxx> wrote:

We can provide workaround steps to resync from the beginning without deleting the
Volume(s). I will send the Session reset details by tomorrow.

regards
Aravinda

On 02/24/2016 09:08 PM, ML mail wrote:
> That's right, I already saw a few error messages mentioning "Device or resource busy"
> and was wondering what it was...
>
> You mean I have to delete the brick on my slave node, delete the volume on my slave
> node and finally re-create the volume on my slave node in order to start
> geo-replication from the beginning again? I do not have to touch or delete anything
> on the master node, right?
>
> Regards
> ML
>
>
> On Wednesday, February 24, 2016 3:07 PM, Milind Changire <mchangir@xxxxxxxxxx> wrote:
> ML,
> Since the fixes to geo-rep are yet to get into a release,
> I can only suggest that you be a bit patient.
> Also, since you are using logrotate to rotate logs, you
> will most likely get into the "No such file or directory"
> or "Device or resource busy" scenario on the slave again.
> I'm not saying logrotate is at fault, I'm just saying that
> that specific use case leads to an inconsistent gluster
> state.
>
> Unfortunately, you cannot selectively purge the changelogs.
> You will have to delete the volume, empty the bricks,
> and recreate the volume with the empty bricks to start
> all over again.
>
> You can delete the volume with:
> # gluster volume stop <volume name>
> # gluster volume delete <volume name>
>
> --
> Milind
>
>
> ----- Original Message -----
> From: "ML mail" <mlnospam@xxxxxxxxx>
> To: "Milind Changire" <mchangir@xxxxxxxxxx>
> Cc: "Gluster-users" <gluster-users@xxxxxxxxxxx>
> Sent: Wednesday, February 24, 2016 4:44:27 PM
> Subject: Re: geo-rep: remote operation failed - No such file or directory
>
> Thanks again, Milind, for your help. I now understand the concept and managed to set
> the required attribute for forcing the resync. That worked, but unfortunately it is a
> never-ending story: I fix something, start geo-rep, it gets through a few more files
> and fails again.
>
> Now I think it will be easier to reset geo-replication and start from scratch;
> luckily my volume is only 16 GB in size as I am still experimenting. What would be
> the correct way to reset geo-rep? I don't want to remove the config, but I would like
> to trash all the changelogs, delete all the data on the slave and re-start geo-rep.
> How should I proceed?
>
> Regards
> ML
>
>
> On Wednesday, February 24, 2016 10:14 AM, Milind Changire <mchangir@xxxxxxxxxx> wrote:
> 1. You could use the script at
>    https://gist.github.com/aravindavk/afb16813261794faa432
>    to create a path from the gfid that you could cd to,
>    i.e. for gfid c4b19f1c-cc18-4727-87a4-18de8fe0089e
>
> 2. Yes, you have to recursively set the virtual xattr
>    on all entries in the directory tree.
>    Also, remember to set a value as well:
>    # setfattr -n glusterfs.geo-rep.trigger-sync -v 1 <file-path>
>
> Also, remember to set the virtual xattr via the volume
> mount path and not the brick back-end path.
> You should have geo-replication stopped when you are
> setting the virtual xattr and start it when you are
> done setting the xattr for the entire directory tree.
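A minimal sketch of how the recursive trigger described above could be scripted,
assuming the master volume is FUSE-mounted; the mount point, directory path, and slave
host/volume names are placeholders, not values taken from this thread:

  # placeholders: adjust mount point, directory and volume/host names to your setup
  MOUNT=/mnt/myvolume                      # FUSE mount of the master volume (not the brick path)
  TREE="$MOUNT/path/to/OC_DEFAULT_MODULE"  # directory tree to be resynced

  # stop geo-replication before marking the tree
  gluster volume geo-replication myvolume slavehost::myvolume-geo stop

  # find walks the tree in pre-order (each directory before its contents),
  # so the virtual xattr is set top-down as described above
  find "$TREE" -print0 | xargs -0 -I{} setfattr -n glusterfs.geo-rep.trigger-sync -v 1 {}

  # start geo-replication again once the whole tree has been marked
  gluster volume geo-replication myvolume slavehost::myvolume-geo start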
>
> --
> Milind
>
>
> ----- Original Message -----
> From: "ML mail" <mlnospam@xxxxxxxxx>
> To: "Milind Changire" <mchangir@xxxxxxxxxx>
> Cc: "Gluster-users" <gluster-users@xxxxxxxxxxx>
> Sent: Wednesday, February 24, 2016 1:46:11 PM
> Subject: Re: geo-rep: remote operation failed - No such file or directory
>
> Thank you for explaining how the symbolic linking works in the .glusterfs directory.
> Now regarding your new instructions, I have two questions:
>
> 1) How can I find out which "OC_DEFAULT_MODULE" directory on my master brick I should
> run the setfattr command on? My problem here is that there are a lot of
> OC_DEFAULT_MODULE directories on my brick, not just a single one.
>
> 2) If I understand your last paragraph correctly, you want me to locate the correct
> OC_DEFAULT_MODULE directory and recursively use setfattr on each sub-directory and/or
> file inside that directory, is this correct?
>
> Regards
> ML
>
>
> On Wednesday, February 24, 2016 7:29 AM, Milind Changire <mchangir@xxxxxxxxxx> wrote:
> ML,
> You just need to worry about the very first entry that you found with
> the find command:
>
> $ find .glusterfs -name c4b19f1c-cc18-4727-87a4-18de8fe0089e -ls
> 228215 0 lrwxrwxrwx 1 root root 66 Feb 19 08:52 .glusterfs/c4/b1/c4b19f1c-cc18-4727-87a4-18de8fe0089e -> ../../92/1b/921bfe8e-81ef-4579-b335-abfa2c7e6afb/OC_DEFAULT_MODULE
>
> Since the back-end entry is a symlink, it means that OC_DEFAULT_MODULE
> is a directory on the master and it is missing on the slave.
> If you recursively look at the parent gfids of each of the entries,
> they will always point to symlinks, since a directory is always
> represented as a symlink at the glusterfs back-end, and you will follow
> them up to the ROOT gfid.
>
> -----
>
> Now, to get the OC_DEFAULT_MODULE directory replicated on the slave,
> you will have to set the virtual xattr on the entire directory tree
> in pre-order listing, i.e. set the virtual xattr on the directory
> starting at OC_DEFAULT_MODULE, then on the entries inside the
> directory, and so on down the directory tree.
>
> --
> Milind
>
>
> ----- Original Message -----
> From: "ML mail" <mlnospam@xxxxxxxxx>
> To: "Milind Changire" <mchangir@xxxxxxxxxx>
> Cc: "Gluster-users" <gluster-users@xxxxxxxxxxx>
> Sent: Wednesday, February 24, 2016 12:25:26 AM
> Subject: Re: geo-rep: remote operation failed - No such file or directory
>
> Hi Milind,
>
> Thanks for the instructions for forcing the data sync of a specific file. I was not
> able to do that, as I have discovered something even more weird while trying to find
> the file concerned by its GFID with the find command, as you suggested.
> Indeed it looks like I have a symbolic link pointing to another one, and then to
> another, and so on, as you can see below:
>
> $ find .glusterfs -name c4b19f1c-cc18-4727-87a4-18de8fe0089e -ls
> 228215 0 lrwxrwxrwx 1 root root 66 Feb 19 08:52 .glusterfs/c4/b1/c4b19f1c-cc18-4727-87a4-18de8fe0089e -> ../../92/1b/921bfe8e-81ef-4579-b335-abfa2c7e6afb/OC_DEFAULT_MODULE
>
> $ ls -la 92/1b/921bfe8e-81ef-4579-b335-abfa2c7e6afb
> lrwxrwxrwx 1 root root 79 Feb 19 08:52 92/1b/921bfe8e-81ef-4579-b335-abfa2c7e6afb -> ../../d7/9f/d79f2ebd-029c-4ac5-8074-5eef7ff21236/160201_File_1602_XX.xls
>
> $ ls -la d7/9f/d79f2ebd-029c-4ac5-8074-5eef7ff21236
> lrwxrwxrwx 1 root root 53 Feb 15 07:34 d7/9f/d79f2ebd-029c-4ac5-8074-5eef7ff21236 -> ../../fd/ea/fdea1fc6-0f2a-43d2-8776-651cc6ea73e8/1602
>
> $ ls -la fd/ea/fdea1fc6-0f2a-43d2-8776-651cc6ea73e8
> lrwxrwxrwx 1 root root 55 Feb 15 07:29 fd/ea/fdea1fc6-0f2a-43d2-8776-651cc6ea73e8 -> ../../20/25/20253364-add8-4149-a7cf-cf46d237a45c/Banana
>
> Is this normal? I somehow don't understand this weird structure of never-ending
> symbolic links... or am I missing something?
>
> Regards
> ML
>
>
> On Tuesday, February 23, 2016 6:31 AM, Milind Changire <mchangir@xxxxxxxxxx> wrote:
> ML,
> You will have to search for the gfid c4b19f1c-cc18-4727-87a4-18de8fe0089e
> at the master cluster brick back-ends and run the following command for
> that specific file on the master cluster to force triggering a data sync [1]
>
> # setfattr -n glusterfs.geo-rep.trigger-sync <file-path>
>
> To search for the file at the brick back-end:
>
> # find /<path-to-brick>/.glusterfs -name c4b19f1c-cc18-4727-87a4-18de8fe0089e
>
> Once the path to the file is found at any of the bricks, you can then use
> the setfattr command described above.
>
> Reference:
> [1] feature/changelog: Virtual xattr to trigger explicit sync in geo-rep
>     http://review.gluster.org/#/c/9337/
>
> --
> Milind
>
>
> ----- Original Message -----
> From: "ML mail" <mlnospam@xxxxxxxxx>
> To: "Milind Changire" <mchangir@xxxxxxxxxx>
> Cc: "Gluster-users" <gluster-users@xxxxxxxxxxx>
> Sent: Monday, February 22, 2016 9:10:56 PM
> Subject: Re: geo-rep: remote operation failed - No such file or directory
>
> Hi Milind,
>
> Thanks for the suggestion. I did that for a few problematic files and it seems to
> continue, but now I am stuck at the following error message on the slave:
>
> [2016-02-22 15:21:30.451133] W [MSGID: 114031] [client-rpc-fops.c:2971:client3_3_lookup_cbk] 0-myvolume-geo-client-0: remote operation failed. Path: <gfid:c4b19f1c-cc18-4727-87a4-18de8fe0089e> (c4b19f1c-cc18-4727-87a4-18de8fe0089e) [No such file or directory]
>
> As you can see, this message does not include any file or directory name, so I can't
> go and delete that file or directory. Any other ideas on how I might proceed here?
>
> Or maybe would it be easier if I delete the whole directory which I think is affected
> and start geo-rep from there? Or will this mess things up?
>
> Regards
> ML
>
>
> On Monday, February 22, 2016 12:12 PM, Milind Changire <mchangir@xxxxxxxxxx> wrote:
> ML,
> You could try deleting problematic files on the slave to recover geo-replication
> from the Faulty state.
>
> However, changelogs generated by the logrotate scenario will still cause
> geo-replication to go into a Faulty state frequently if geo-replication
> fails and restarts.
>
> The patches mentioned in an earlier mail are being worked on and finalized.
> They will be available soon in a release, which will avoid geo-replication
> going into a Faulty state.
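As a concrete illustration of the recovery path described above, a minimal sketch;
the volume names, slave host and file path are placeholders. Removing the entry
through a mount of the slave volume, rather than directly on the brick, keeps the
.glusterfs back-end consistent:

  # placeholders: volume names, slave host and file path are illustrative only
  gluster volume geo-replication myvolume slavehost::myvolume-geo stop

  # remove the problematic entry through a mount of the slave volume,
  # not directly on the brick back-end
  mount -t glusterfs slavehost:/myvolume-geo /mnt/myvolume-geo
  rm -f /mnt/myvolume-geo/path/to/problematic-file.part
  umount /mnt/myvolume-geo

  # restart geo-replication and check whether it leaves the Faulty state
  gluster volume geo-replication myvolume slavehost::myvolume-geo start
  gluster volume geo-replication myvolume slavehost::myvolume-geo status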
>
> --
> Milind
>
>
> ----- Original Message -----
> From: "ML mail" <mlnospam@xxxxxxxxx>
> To: "Milind Changire" <mchangir@xxxxxxxxxx>, "Gluster-users" <gluster-users@xxxxxxxxxxx>
> Sent: Monday, February 22, 2016 1:27:14 PM
> Subject: Re: geo-rep: remote operation failed - No such file or directory
>
> Hi Milind,
>
> Any news on this issue? I was wondering how I can fix and restart my geo-replication.
> Can I simply delete the problematic file(s) on my slave and restart geo-rep?
>
> Regards
> ML
>
>
> On Wednesday, February 17, 2016 4:30 PM, ML mail <mlnospam@xxxxxxxxx> wrote:
>
> Hi Milind,
>
> Thank you for your short analysis. Indeed, that's exactly what happens: as soon as I
> restart geo-rep it replays the same thing over and over, as it does not succeed.
>
> Now regarding the sequence of the file management operations, I am not totally sure
> how it works, but I can tell you that we are using ownCloud v8.2.2 (www.owncloud.org)
> and as storage for this cloud software we use GlusterFS. So it is very probable that
> ownCloud works like this: when a user uploads a new file, it first creates it with a
> temporary name, which it then either renames or moves after a successful upload.
>
> I have the feeling this issue is related to my initial issue, which I reported
> earlier this month:
> https://www.gluster.org/pipermail/gluster-users/2016-February/025176.html
>
> For now my question would be: how do I restart geo-replication successfully?
>
> Regards
> ML
>
>
> On Wednesday, February 17, 2016 4:10 PM, Milind Changire <mchangir@xxxxxxxxxx> wrote:
>
> As per the slave logs, there is an attempt to RENAME files,
> i.e. a .part file getting renamed to a name without the
> .part suffix.
>
> Just restarting geo-rep isn't going to help much if
> you've already hit the problem. Since the last CHANGELOG
> is replayed by geo-rep on a restart, you'll most probably
> encounter the same log messages in the logs.
>
> Are the .part files CREATEd, RENAMEd and DELETEd with the
> same name often? Are the operations somewhat in the following
> sequence that happens on the geo-replication master cluster?
>
> CREATE f1.part
> RENAME f1.part f1
> DELETE f1
> CREATE f1.part
> RENAME f1.part f1
> ...
> ...
>
> If not, then it would help if you could send the sequence
> of file management operations.
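Purely as an illustration of that sequence, as a client might perform it against a
FUSE mount of the master volume; the mount path and file name are made up:

  cd /mnt/myvolume          # FUSE mount of the master volume (placeholder path)
  touch f1.part             # CREATE f1.part
  mv f1.part f1             # RENAME f1.part -> f1
  rm f1                     # DELETE f1
  touch f1.part             # CREATE f1.part again
  mv f1.part f1             # RENAME f1.part -> f1 again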
>
> --
> Milind
>
>
> ----- Original Message -----
> From: "Kotresh Hiremath Ravishankar" <khiremat@xxxxxxxxxx>
> To: "ML mail" <mlnospam@xxxxxxxxx>
> Cc: "Gluster-users" <gluster-users@xxxxxxxxxxx>, "Milind Changire" <mchangir@xxxxxxxxxx>
> Sent: Tuesday, February 16, 2016 6:28:21 PM
> Subject: Re: geo-rep: remote operation failed - No such file or directory
>
> Ccing Milind, he would be able to help
>
> Thanks and Regards,
> Kotresh H R
>
> ----- Original Message -----
>> From: "ML mail" <mlnospam@xxxxxxxxx>
>> To: "Gluster-users" <gluster-users@xxxxxxxxxxx>
>> Sent: Monday, February 15, 2016 4:41:56 PM
>> Subject: geo-rep: remote operation failed - No such file or directory
>>
>> Hello,
>>
>> I noticed that the geo-replication of a volume has STATUS "Faulty" and while
>> looking in the *.gluster.log file in
>> /var/log/glusterfs/geo-replication-slaves/ on my slave I can see the
>> following relevant problem:
>>
>> [2016-02-15 10:58:40.402516] I [rpc-clnt.c:1847:rpc_clnt_reconfig]
>> 0-myvolume-geo-client-0: changing port to 49152 (from 0)
>> [2016-02-15 10:58:40.403928] I [MSGID: 114057]
>> [client-handshake.c:1437:select_server_supported_programs]
>> 0-myvolume-geo-client-0: Using Program GlusterFS 3.3, Num (1298437), Version
>> (330)
>> [2016-02-15 10:58:40.404130] I [MSGID: 114046]
>> [client-handshake.c:1213:client_setvolume_cbk] 0-myvolume-geo-client-0:
>> Connected to myvolume-geo-client-0, attached to remote volume
>> '/data/myvolume-geo/brick'.
>> [2016-02-15 10:58:40.404150] I [MSGID: 114047]
>> [client-handshake.c:1224:client_setvolume_cbk] 0-myvolume-geo-client-0:
>> Server and Client lk-version numbers are not same, reopening the fds
>> [2016-02-15 10:58:40.410150] I [fuse-bridge.c:5137:fuse_graph_setup] 0-fuse:
>> switched to graph 0
>> [2016-02-15 10:58:40.410223] I [MSGID: 114035]
>> [client-handshake.c:193:client_set_lk_version_cbk] 0-myvolume-geo-client-0:
>> Server lk version = 1
>> [2016-02-15 10:58:40.410370] I [fuse-bridge.c:4030:fuse_init]
>> 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.22 kernel
>> 7.23
>> [2016-02-15 10:58:45.662416] I [MSGID: 109066] [dht-rename.c:1411:dht_rename]
>> 0-myvolume-geo-dht: renaming
>> /.gfid/94310944-7f8a-421d-a51f-1e23e28da9cc/Bild_03_Rosen.JPG-chunking-2242590604-0.FpKL3SIUb9vKHyjd.part
>> (hash=myvolume-geo-client-0/cache=myvolume-geo-client-0) =>
>> /.gfid/94310944-7f8a-421d-a51f-1e23e28da9cc/Bild_03_Rosen.JPG-chunking-2242590604-0
>> (hash=myvolume-geo-client-0/cache=<nul>)
>> [2016-02-15 10:58:45.665144] I [MSGID: 109066] [dht-rename.c:1411:dht_rename]
>> 0-myvolume-geo-dht: renaming
>> /.gfid/94310944-7f8a-421d-a51f-1e23e28da9cc/Bild_03_Rosen.JPG-chunking-2242590604-1.C6l0DEurb2y3Azw4.part
>> (hash=myvolume-geo-client-0/cache=myvolume-geo-client-0) =>
>> /.gfid/94310944-7f8a-421d-a51f-1e23e28da9cc/Bild_03_Rosen.JPG-chunking-2242590604-1
>> (hash=myvolume-geo-client-0/cache=<nul>)
>> [2016-02-15 10:58:45.749829] I [MSGID: 109066] [dht-rename.c:1411:dht_rename]
>> 0-myvolume-geo-dht: renaming
>> /.gfid/94310944-7f8a-421d-a51f-1e23e28da9cc/Bild_02_Pilz.JPG-chunking-628343631-0.ajEnSguUZ7EkzjzT.part
>> (hash=myvolume-geo-client-0/cache=myvolume-geo-client-0) =>
>> /.gfid/94310944-7f8a-421d-a51f-1e23e28da9cc/Bild_02_Pilz.JPG-chunking-628343631-0
>> (hash=myvolume-geo-client-0/cache=<nul>)
>> [2016-02-15 10:58:45.750225] W [MSGID: 114031]
>> [client-rpc-fops.c:2971:client3_3_lookup_cbk] 0-myvolume-geo-client-0:
>> remote operation failed.
>> Path:
>> /.gfid/94310944-7f8a-421d-a51f-1e23e28da9cc/Bild_02_Pilz.JPG-chunking-628343631-0.ajEnSguUZ7EkzjzT.part
>> (9164caeb-740d-4429-a3bd-c85f40c35e11) [No such file or directory]
>> [2016-02-15 10:58:45.750418] W [fuse-bridge.c:1777:fuse_rename_cbk]
>> 0-glusterfs-fuse: 60:
>> /.gfid/94310944-7f8a-421d-a51f-1e23e28da9cc/Bild_02_Pilz.JPG-chunking-628343631-0.ajEnSguUZ7EkzjzT.part
>> ->
>> /.gfid/94310944-7f8a-421d-a51f-1e23e28da9cc/Bild_02_Pilz.JPG-chunking-628343631-0
>> => -1 (Device or resource busy)
>> [2016-02-15 10:58:45.767788] I [fuse-bridge.c:4984:fuse_thread_proc] 0-fuse:
>> unmounting /tmp/gsyncd-aux-mount-bZ9SMt
>> [2016-02-15 10:58:45.768063] W [glusterfsd.c:1236:cleanup_and_exit]
>> (-->/lib/x86_64-linux-gnu/libpthread.so.0(+0x80a4) [0x7feb610820a4]
>> -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x7feb626f45b5]
>> -->/usr/sbin/glusterfs(cleanup_and_exit+0x59) [0x7feb626f4429] ) 0-:
>> received signum (15), shutting down
>> [2016-02-15 10:58:45.768093] I [fuse-bridge.c:5683:fini] 0-fuse: Unmounting
>> '/tmp/gsyncd-aux-mount-bZ9SMt'.
>> [2016-02-15 10:58:54.871855] I [dict.c:473:dict_get]
>> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.6/xlator/system/posix-acl.so(posix_acl_setxattr_cbk+0x26)
>> [0x7f8313dfb166]
>> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.6/xlator/system/posix-acl.so(handling_other_acl_related_xattr+0x20)
>> [0x7f8313dfb060]
>> -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_get+0x93)
>> [0x7f831f3f40c3] ) 0-dict: !this || key=system.posix_acl_access [Invalid
>> argument]
>> [2016-02-15 10:58:54.871914] I [dict.c:473:dict_get]
>> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.6/xlator/system/posix-acl.so(posix_acl_setxattr_cbk+0x26)
>> [0x7f8313dfb166]
>> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.6/xlator/system/posix-acl.so(handling_other_acl_related_xattr+0xb0)
>> [0x7f8313dfb0f0]
>> -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_get+0x93)
>> [0x7f831f3f40c3] ) 0-dict: !this || key=system.posix_acl_default [Invalid
>> argument]
>>
>> This error is repeated forever, always with the same files. I tried to stop
>> and restart geo-rep on the master, but the problem remains the same and
>> geo-replication does not proceed. Does anyone have an idea how to fix this?
>>
>> I am using GlusterFS 3.7.6 on Debian 8 with a two-node replicate volume
>> (1 brick per node) and one single off-site slave node for geo-rep.
>>
>> Regards
>> ML

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users