Hi Cyril,

Could you please attach the geo-replication logs?

Thanks and Regards,
Kotresh H R

----- Original Message -----
> From: "Cyril N PEPONNET (Cyril)" <cyril.peponnet@xxxxxxxxxxxxxxxxxx>
> To: "Kotresh Hiremath Ravishankar" <khiremat@xxxxxxxxxx>
> Cc: "gluster-users" <gluster-users@xxxxxxxxxxx>
> Sent: Monday, June 1, 2015 10:34:42 PM
> Subject: Re: Geo-Replication - Changelog socket is not present - Falling back to xsync
>
> Some news,
>
> Looks like changelog is not working anymore. When I touch a file on the master it
> doesn't propagate to the slave…
>
> The .processing folder contains about a thousand unprocessed changelogs.
>
> I had to stop the geo-rep, reset changelog.changelog on the volume and
> restart the geo-rep. It's now sending the missing files using hybrid crawl.
>
> So geo-rep is not working as expected.
>
> Another thing: we use a symlink to point to the latest release build, and it seems
> that symlinks are not synced from master to slave when they change.
>
> Any idea on how I can debug this?
>
> --
> Cyril Peponnet
>
> On May 29, 2015, at 3:01 AM, Kotresh Hiremath Ravishankar
> <khiremat@xxxxxxxxxx> wrote:
>
> Yes, geo-rep internally uses fuse mount.
> I will explore further and get back to you
> if there is a way.
>
> Thanks and Regards,
> Kotresh H R
>
> ----- Original Message -----
> From: "Cyril N PEPONNET (Cyril)" <cyril.peponnet@xxxxxxxxxxxxxxxxxx>
> To: "Kotresh Hiremath Ravishankar" <khiremat@xxxxxxxxxx>
> Cc: "gluster-users" <gluster-users@xxxxxxxxxxx>
> Sent: Thursday, May 28, 2015 10:12:57 PM
> Subject: Re: Geo-Replication - Changelog socket is not present - Falling back to xsync
>
> One more thing:
>
> nfs.volume-access read-only works only for NFS clients; glusterfs clients still
> have write access.
>
> features.read-only on needs a volume restart and sets RO for everyone, but in this
> case geo-rep goes faulty.
>
> [2015-05-28 09:42:27.917897] E [repce(/export/raid/usr_global):188:__call__] RepceClient: call 8739:139858642609920:1432831347.73 (keep_alive) failed on peer with OSError
> [2015-05-28 09:42:27.918102] E [syncdutils(/export/raid/usr_global):240:log_raise_exception] <top>: FAIL:
> Traceback (most recent call last):
>   File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 266, in twrap
>     tf(*aa)
>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 391, in keep_alive
>     cls.slave.server.keep_alive(vi)
>   File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 204, in __call__
>     return self.ins(self.meth, *a)
>   File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 189, in __call__
>     raise res
> OSError: [Errno 30] Read-
>
> So there is no proper way to protect the slave against writes.
>
> --
> Cyril Peponnet
>
> On May 28, 2015, at 8:54 AM, Cyril Peponnet
> <cyril.peponnet@xxxxxxxxxxxxxxxxxx> wrote:
>
> Hi Kotresh,
>
> Inline.
>
> Again, thanks for your time.
>
> --
> Cyril Peponnet
>
> On May 27, 2015, at 10:47 PM, Kotresh Hiremath Ravishankar
> <khiremat@xxxxxxxxxx> wrote:
>
> Hi Cyril,
>
> Replies inline.
>
> Thanks and Regards,
> Kotresh H R
>
> ----- Original Message -----
> From: "Cyril N PEPONNET (Cyril)" <cyril.peponnet@xxxxxxxxxxxxxxxxxx>
> To: "Kotresh Hiremath Ravishankar" <khiremat@xxxxxxxxxx>
> Cc: "gluster-users" <gluster-users@xxxxxxxxxxx>
> Sent: Wednesday, May 27, 2015 9:28:00 PM
> Subject: Re: Geo-Replication - Changelog socket is not present - Falling back to xsync
>
> Hi and thanks again for those explanations.
>
> Because a lot of files were missing or out of date (sometimes with gfid mismatch), I
> reset the index (or I think I did) by:
>
> deleting the geo-rep session, resetting geo-replication.indexing (setting it to off does
> not work for me), and recreating the session.
>
> From the version in which changelog was introduced, resetting the index does not
> re-initiate geo-replication. It works only for versions prior to that.
>
> NOTE 1: Recreating the geo-rep session works only if the slave doesn't contain
> files with mismatched gfids. If it does, the slave should be cleaned up
> before recreating.
>
> I started it again to transfer the missing files; I'll take care of the gfid mismatches
> afterward. Our vol is almost 5TB and it took almost 2 months to crawl to the
> slave, so I didn't want to start over :/
>
>
> NOTE 2: Another method now exists to initiate a full sync. It also expects that slave
> files are not in a gfid mismatch state (meaning the slave volume should not be
> written by any means other than geo-replication). The method is to
> reset stime on all the bricks of the master.
>
>
> Following are the steps to trigger a full sync. Let me know if you have any
> comments/doubts.
> ================================================
> 1. Stop geo-replication.
> 2. Remove the stime extended attribute from every master brick root using the
>    following command:
>      setfattr -x trusted.glusterfs.<MASTER_VOL_UUID>.<SLAVE_VOL_UUID>.stime <brick-root>
>    NOTE: 1. If AFR is set up, do this for all bricks of the replica set.
>
>          2. The stime key mentioned above can be obtained as follows:
>             Using 'gluster volume info <mastervol>', get all brick paths and dump all the
>             extended attributes using 'getfattr -d -m . -e hex <brick-path>', which will
>             show the stime key that should be removed.
>
>          3. This technique re-triggers a complete sync. It involves a complete xsync crawl.
>             If there are rename issues, it might hit the rsync error on the complete re-sync as well.
>             So it is recommended that, if the problematic files on the slave are known, you
>             remove them and then initiate the complete sync.
>
> Will a complete sync send the data again, whether it is already present or not? How do I
> track down rename issues? The master is a living volume with a lot of creations / renames
> / deletions.
>
>
> 3. Start geo-replication.
>
> The above technique can also be used to trigger a data sync on only one
> particular brick. Just removing the stime extended attribute on the root of the master
> brick to be synced will do. If AFR is set up, remove stime on all bricks of the replica set.
>
> ================================
>
>
> So for now it's still in the hybrid crawl process.
>
> I ended up doing that because some entire folders were not synced by the
> first hybrid crawl (and touch does nothing afterward in changelog). In fact,
> touching any file doesn't trigger a resync; only delete/rename/change do.
>
>
> In newer geo-replication, from the version in which history crawl was introduced, xsync
> crawl is minimized. Once it reaches the timestamp from which historical changelogs are
> available, it starts using history changelogs. Touch is recorded as SETATTR in the
> changelog, so geo-rep will not sync the data. That is why the new virtual setxattr
> interface, mentioned in the previous mail, was introduced.
>
> 1/
> 1. Directories:
> #setfattr -n glusterfs.geo-rep.trigger-sync -v "1" <DIR>
> 2. Files:
> #setfattr -n glusterfs.geo-rep.trigger-sync -v "1" <file-path>
>
> Is it recursive (for directories), or do I have to do that on each mismatching
> file? Should I do that on master or slave?
>
>
> No, it is not recursive; it should be done for every missing file and
> directory.
> Directories should be done before the files inside them.
> It should be done on the master.
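
The answer above (not recursive, directories before the files inside them, run on the master) can be sketched as a small shell helper. This is only a sketch under assumptions: trigger_sync_tree and DRY_RUN are made-up names, the path is assumed to be a client mount of the master volume, and it relies on find listing a directory before its contents. By default it only prints the setfattr commands instead of running them.

```shell
#!/bin/sh
# Sketch only: trigger_sync_tree and DRY_RUN are illustrative names,
# not part of GlusterFS.
# Walks a directory tree and issues (or, by default, prints) the
# trigger-sync setfattr for each directory and regular file.
trigger_sync_tree() {
    root=$1
    # Without -depth, find prints a directory before its contents,
    # which gives the "directories before files" ordering.
    # Symlinks and other special files are skipped in this sketch.
    find "$root" \( -type d -o -type f \) | while IFS= read -r entry; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            # Dry run: show the command that would be executed.
            printf 'setfattr -n glusterfs.geo-rep.trigger-sync -v "1" %s\n' "$entry"
        else
            setfattr -n glusterfs.geo-rep.trigger-sync -v "1" "$entry"
        fi
    done
}
```

Running "DRY_RUN=0 trigger_sync_tree /mnt/master/some/dir" would actually set the attribute; /mnt/master is a hypothetical mount point. Paths containing newlines are not handled.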
>
>
> I don't understand the difference between setfattr -n
> glusterfs.geo-rep.trigger-sync -v "1" <DIR> (vol level) and setfattr -x
> trusted.glusterfs.<MASTER_VOL_UUID>.<SLAVE_VOL_UUID>.stime <brick-root>
> (brick level)
>
>
> 2/ For the RO, I can set the option nfs.volume-access to read-only; this
> will make the vol RO for both nfs mounts and glusterfs mounts. Correct?
>
> Yes, that should do.
>
> Cool! Thanks!
>
>
> Thank you so much for your help.
> --
> Cyril Peponnet
>
> On May 26, 2015, at 11:29 PM, Kotresh Hiremath Ravishankar
> <khiremat@xxxxxxxxxx> wrote:
>
> Hi Cyril,
>
> Need some clarifications. Comments inline.
>
> Thanks and Regards,
> Kotresh H R
>
> ----- Original Message -----
> From: "Cyril N PEPONNET (Cyril)" <cyril.peponnet@xxxxxxxxxxxxxxxxxx>
> To: "Kotresh Hiremath Ravishankar" <khiremat@xxxxxxxxxx>
> Cc: "gluster-users" <gluster-users@xxxxxxxxxxx>
> Sent: Tuesday, May 26, 2015 11:43:44 PM
> Subject: Re: Geo-Replication - Changelog socket is not present - Falling back to xsync
>
> So, changelog is still active but I noticed that some files were missing.
>
> So I'm running an rsync -avn between the two vols (master and slave) to sync
> them again by touching the missing files (hoping geo-rep will do the rest).
>
> Are you running rsync -avn for missed files between master and slave volumes?
> If yes, that is dangerous and it should not be done. Geo-replication requires the gfids
> of files on master and slave to be intact (meaning the gfid of 'file1' in the
> master vol should be the same as that of 'file1' in the slave). It is required because
> the data sync happens using the 'gfid', not the 'pathname', of the file.
> So if a manual rsync is used
> to sync files between master and slave by pathname, gfids will change and
> further syncing of those files through geo-rep fails.
>
> A virtual setxattr interface is provided to sync missing files through
> geo-replication. It makes sure gfids are intact.
>
> NOTE: Directories have to be synced to the slave before trying setxattr for
> files inside them.
>
> 1. Directories:
> #setfattr -n glusterfs.geo-rep.trigger-sync -v "1" <DIR>
> 2. Files:
> #setfattr -n glusterfs.geo-rep.trigger-sync -v "1" <file-path>
>
> One question: can I make the slave vol RO? Because if somebody changes a
> file on the slave, it's no longer synced (changes and deletes are not, but renames
> keep being synced between master and slave).
>
> Will it have an impact on the geo-replication process if I make the slave vol RO?
>
> Again, if the slave volume is modified by something other than geo-rep, we might
> end up with mismatched gfids. So exposing the slave volume to consumers as RO is always
> a good idea. It doesn't affect geo-rep as it internally mounts in RW.
>
> Hope this helps. Let us know if you need anything else. We are happy to help you.
>
> Thanks again.
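
Since the replies above hinge on the gfid of a file being the same on master and slave, here is a hedged sketch of how one might compare them by hand. It assumes the gfid is read from the trusted.gfid extended attribute on the brick (not on a client mount), as root, with "getfattr -n trusted.gfid -e hex"; extract_gfid and same_gfid are illustrative helper names, and the paths in the comments are hypothetical.

```shell
#!/bin/sh
# Sketch only: helper names are illustrative, not part of GlusterFS.

# extract_gfid: read the output of
#   getfattr -n trusted.gfid -e hex <path-on-brick>
# on stdin and print just the hex value.
extract_gfid() {
    sed -n 's/^trusted\.gfid=//p'
}

# same_gfid: succeed only if both values are non-empty and identical.
same_gfid() {
    [ -n "$1" ] && [ "$1" = "$2" ]
}

# Hypothetical usage (brick paths and host name are placeholders):
#   m=$(getfattr -n trusted.gfid -e hex /export/raid/vol/file1 2>/dev/null | extract_gfid)
#   s=$(ssh slave-node getfattr -n trusted.gfid -e hex /export/raid/vol/file1 2>/dev/null | extract_gfid)
#   same_gfid "$m" "$s" || echo "gfid mismatch for file1"
```

An empty value (file missing on one side, or attribute unreadable) is treated as a mismatch rather than a match.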
>
>
> --
> Cyril Peponnet
>
> On May 25, 2015, at 12:43 AM, Kotresh Hiremath Ravishankar
> <khiremat@xxxxxxxxxx> wrote:
>
> Hi Cyril,
>
> Answers inline
>
> Thanks and Regards,
> Kotresh H R
>
> ----- Original Message -----
> From: "Cyril N PEPONNET (Cyril)" <cyril.peponnet@xxxxxxxxxxxxxxxxxx>
> To: "Kotresh Hiremath Ravishankar" <khiremat@xxxxxxxxxx>
> Cc: "gluster-users" <gluster-users@xxxxxxxxxxx>
> Sent: Friday, May 22, 2015 9:34:47 PM
> Subject: Re: Geo-Replication - Changelog socket is not present - Falling back to xsync
>
> One last question, correct me if I'm wrong.
>
> When you start a geo-rep process it starts with xsync, aka hybrid crawling
> (sending files every 60s, with the file window set to 8192 files per batch).
>
> When the crawl is done it should use the changelog detector and dynamically
> propagate changes to the slaves.
>
> 1/ During the hybrid crawl, if we delete files from the master (and they were
> already transferred to the slave), the xsync process will not delete them from
> the slave (and we can't change that as the option is hardcoded).
> When it switches to changelog, will it remove the folders and
> files on the slave that no longer exist on the master?
>
>
> You are right, xsync does not sync deletes of files once they are already
> synced.
> After xsync, when it switches to changelog, it doesn't delete all the
> entries on the slave that no longer exist on the master. Changelog is capable of
> deleting files only from the time it got switched to changelog.
>
> 2/ With changelog, if I add a 10GB file and then a 1KB file, will
> the changelog process queue them (waiting for the 10GB file to be sent) or are
> the transfers done in threads?
> (e.g. I add a 10GB file and I delete it after 1 min, what will happen?)
>
> Changelog records the operations that happened on the master, and they are replayed by
> geo-replication onto the slave volume. Geo-replication syncs files in two phases.
>
> 1. Phase-1: Create entries through RPC (0-byte files on the slave, keeping the gfid
>    intact as on the master)
> 2. Phase-2: Sync data through rsync/tar_over_ssh (multi-threaded)
>
> OK, now keeping that in mind, Phase-1 happens serially, and Phase-2
> happens in parallel.
> Zero-byte files for the 10GB and 1KB files get created on the slave serially, and the
> data for them syncs in parallel. Another thing to remember: geo-rep makes sure that
> syncing data to a file is tried only after the zero-byte file for it has been created.
>
>
> In the latest release, 3.7, xsync crawl is minimized by the feature called history
> crawl, introduced in 3.6.
> So the chances of missing deletes/renames are lower.
>
> Thanks.
>
> --
> Cyril Peponnet
>
> On May 21, 2015, at 10:22 PM, Kotresh Hiremath Ravishankar
> <khiremat@xxxxxxxxxx> wrote:
>
> Great, hope that should work.
> Let's see
>
> Thanks and Regards,
> Kotresh H R
>
> ----- Original Message -----
> From: "Cyril N PEPONNET (Cyril)" <cyril.peponnet@xxxxxxxxxxxxxxxxxx>
> To: "Kotresh Hiremath Ravishankar" <khiremat@xxxxxxxxxx>
> Cc: "gluster-users" <gluster-users@xxxxxxxxxxx>
> Sent: Friday, May 22, 2015 5:31:13 AM
> Subject: Re: Geo-Replication - Changelog socket is not present - Falling back to xsync
>
> Thanks to JoeJulian / Kaushal I managed to re-enable the changelog option and
> the socket is now present.
>
> For the record, I had some clients running rhs gluster-fuse while our nodes are
> running the glusterfs release, and the op-versions are not "compatible".
>
> Now I have to wait for the initial crawl to see if it switches to changelog
> detector mode.
>
> Thanks Kotresh
> --
> Cyril Peponnet
>
> On May 21, 2015, at 8:39 AM, Cyril Peponnet
> <cyril.peponnet@xxxxxxxxxxxxxxxxxx> wrote:
>
> Hi,
>
> Unfortunately,
>
> # gluster vol set usr_global changelog.changelog off
> volume set: failed: Staging failed on mvdcgluster01.us.alcatel-lucent.com.
> Error: One or more connected clients cannot support the feature being set.
> These clients need to be upgraded or disconnected before running this
> command again
>
>
> I don't really know why; some of my clients use 3.6 as fuse client,
> others are running 3.5.2.
>
> Any advice?
>
> --
> Cyril Peponnet
>
> On May 20, 2015, at 5:17 AM, Kotresh Hiremath Ravishankar
> <khiremat@xxxxxxxxxx> wrote:
>
> Hi Cyril,
>
> From the brick logs, it seems the changelog-notifier thread has been killed
> for some reason, as notify is failing with EPIPE.
>
> Try the following. It should probably help:
> 1. Stop geo-replication.
> 2. Disable changelog: gluster vol set <master-vol-name> changelog.changelog off
> 3. Enable changelog: gluster vol set <master-vol-name> changelog.changelog on
> 4. Start geo-replication.
>
> Let me know if it works.
>
> Thanks and Regards,
> Kotresh H R
>
> ----- Original Message -----
> From: "Cyril N PEPONNET (Cyril)" <cyril.peponnet@xxxxxxxxxxxxxxxxxx>
> To: "gluster-users" <gluster-users@xxxxxxxxxxx>
> Sent: Tuesday, May 19, 2015 3:16:22 AM
> Subject: Geo-Replication - Changelog socket is not present - Falling back to xsync
>
> Hi Gluster Community,
>
> I have a 3-node setup at location A and a two-node setup at location B.
>
> All running 3.5.2 under CentOS 7.
>
> I have one volume that I sync through the geo-replication process.
>
> So far so good, the first step of geo-replication is done (hybrid crawl).
>
> Now I'd like to use the changelog detector in order to delete files on the
> slave when they are gone on the master.
>
> But it always falls back to the xsync mechanism (even when I force it using
> config changelog_detector changelog):
>
> [2015-05-18 12:29:49.543922] I [monitor(monitor):129:monitor] Monitor: ------------------------------------------------------------
> [2015-05-18 12:29:49.544018] I [monitor(monitor):130:monitor] Monitor: starting gsyncd worker
> [2015-05-18 12:29:49.614002] I [gsyncd(/export/raid/vol):532:main_i] <top>: syncing: gluster://localhost:vol -> ssh://root@x.x.x.x:gluster://localhost:vol
> [2015-05-18 12:29:54.696532] I [master(/export/raid/vol):58:gmaster_builder] <top>: setting up xsync change detection mode
> [2015-05-18 12:29:54.696888] I [master(/export/raid/vol):357:__init__] _GMaster: using 'rsync' as the sync engine
> [2015-05-18 12:29:54.697930] I [master(/export/raid/vol):58:gmaster_builder] <top>: setting up changelog change detection mode
> [2015-05-18 12:29:54.698160] I [master(/export/raid/vol):357:__init__] _GMaster: using 'rsync' as the sync engine
> [2015-05-18 12:29:54.699239] I [master(/export/raid/vol):1104:register] _GMaster: xsync temp directory: /var/run/gluster/vol/ssh%3A%2F%2Froot%40x.x.x.x%3Agluster%3A%2F%2F127.0.0.1%3Avol/ce749a38ba30d4171cd674ec00ab24f9/xsync
> [2015-05-18 12:30:04.707216] I [master(/export/raid/vol):682:fallback_xsync] _GMaster: falling back to xsync mode
> [2015-05-18 12:30:04.742422] I [syncdutils(/export/raid/vol):192:finalize] <top>: exiting.
> [2015-05-18 12:30:05.708123] I [monitor(monitor):157:monitor] Monitor: worker(/export/raid/vol) died in startup phase
> [2015-05-18 12:30:05.708369] I [monitor(monitor):81:set_state] Monitor: new state: faulty
> [201
>
> After some Python debugging and stack trace printing I figured out that:
>
> /var/run/gluster/vol/ssh%3A%2F%2Froot%40x.x.x.x%3Agluster%3A%2F%2F127.0.0.1%3Avol/ce749a38ba30d4171cd674ec00ab24f9/changes.log
>
> [2015-05-18 19:41:24.511423] I [gf-changelog.c:179:gf_changelog_notification_init] 0-glusterfs: connecting to changelog socket: /var/run/gluster/changelog-ce749a38ba30d4171cd674ec00ab24f9.sock (brick: /export/raid/vol)
> [2015-05-18 19:41:24.511445] W [gf-changelog.c:189:gf_changelog_notification_init] 0-glusterfs: connection attempt 1/5...
> [2015-05-18 19:41:26.511556] W [gf-changelog.c:189:gf_changelog_notification_init] 0-glusterfs: connection attempt 2/5...
> [2015-05-18 19:41:28.511670] W [gf-changelog.c:189:gf_changelog_notification_init] 0-glusterfs: connection attempt 3/5...
> [2015-05-18 19:41:30.511790] W [gf-changelog.c:189:gf_changelog_notification_init] 0-glusterfs: connection attempt 4/5...
> [2015-05-18 19:41:32.511890] W [gf-changelog.c:189:gf_changelog_notification_init] 0-glusterfs: connection attempt 5/5...
> [2015-05-18 19:41:34.512016] E [gf-changelog.c:204:gf_changelog_notification_init] 0-glusterfs: could not connect to changelog socket! bailing out...
>
>
> /var/run/gluster/changelog-ce749a38ba30d4171cd674ec00ab24f9.sock doesn't
> exist. So
> https://github.com/gluster/glusterfs/blob/release-3.5/xlators/features/changelog/lib/src/gf-changelog.c#L431
> is failing because
> https://github.com/gluster/glusterfs/blob/release-3.5/xlators/features/changelog/lib/src/gf-changelog.c#L153
> cannot open the socket file.
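
The check described above (is the changelog socket actually there?) can be done from the shell before restarting geo-rep. A minimal sketch: changelog_socket_ready is an illustrative name, and the socket path is simply the one from the log above, so it will differ on another brick or node.

```shell
#!/bin/sh
# Sketch only: changelog_socket_ready is an illustrative helper name.
# Succeeds only if the given path exists and is a UNIX socket.
changelog_socket_ready() {
    [ -S "$1" ]
}

# Socket path copied from the log in this thread; adjust for your brick.
sock=/var/run/gluster/changelog-ce749a38ba30d4171cd674ec00ab24f9.sock

if changelog_socket_ready "$sock"; then
    echo "changelog socket present: $sock"
else
    echo "changelog socket missing: $sock"
fi
```

This only tests that the socket file exists; it does not prove that the brick process is still listening on it.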
>
> And I can't find any error related to changelog in the log files, except in the brick
> logs of node 2 (site A):
>
> bricks/export-raid-vol.log-20150517:[2015-05-14 17:06:52.636908] E [changelog-helpers.c:168:changelog_rollover_changelog] 0-vol-changelog: Failed to send file name to notify thread (reason: Broken pipe)
> bricks/export-raid-vol.log-20150517:[2015-05-14 17:06:52.636949] E [changelog-helpers.c:280:changelog_handle_change] 0-vol-changelog: Problem rolling over changelog(s)
>
> gluster vol status is all fine, and the changelog options are enabled in the vol file:
>
> volume vol-changelog
>     type features/changelog
>     option changelog on
>     option changelog-dir /export/raid/vol/.glusterfs/changelogs
>     option changelog-brick /export/raid/vol
>     subvolumes vol-posix
> end-volume
>
> Any help will be appreciated :)
>
> Oh, btw, it's hard to stop / restart the volume as I have around 4k clients
> connected.
>
> Thanks!
>
> --
> Cyril Peponnet
>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users@xxxxxxxxxxx
> http://www.gluster.org/mailman/listinfo/gluster-users