Re: Geo-Replication - Changelog socket is not present - Falling back to xsync

Kotresh Hiremath Ravishankar <khiremat@xxxxxxxxxx> · Wed, 20 May 2015 08:17:31 -0400 (EDT)

Hi Cyril,

>From the brick logs, it seems the changelog-notifier thread has got killed for some reason,
as notify is failing with EPIPE. 

Try the following. It should probably help:
1. Stop geo-replication.
2. Disable changelog: gluster vol set <master-vol-name> changelog.changelog off
3. Enable changelog: glluster vol set <master-vol-name> changelog.changelog on
4. Start geo-replication.

Let me know if it works.

Thanks and Regards,
Kotresh H R

----- Original Message -----
> From: "Cyril N PEPONNET (Cyril)" <cyril.peponnet@xxxxxxxxxxxxxxxxxx>
> To: "gluster-users" <gluster-users@xxxxxxxxxxx>
> Sent: Tuesday, May 19, 2015 3:16:22 AM
> Subject:  Geo-Replication - Changelog socket is not present - Falling back to xsync
> 
> Hi Gluster Community,
> 
> I have a 3 nodes setup at location A and a two node setup at location B.
> 
> All running 3.5.2 under Centos-7.
> 
> I have one volume I sync through georeplication process.
> 
> So far so good, the first step of geo-replication is done (hybrid-crawl).
> 
> Now I’d like to use the change log detector in order to delete files on the
> slave when they are gone on master.
> 
> But it always fallback to xsync mecanism (even when I force it using config
> changelog_detector changelog):
> 
> [2015-05-18 12:29:49.543922] I [monitor(monitor):129:monitor] Monitor:
> ------------------------------------------------------------
> [2015-05-18 12:29:49.544018] I [monitor(monitor):130:monitor] Monitor:
> starting gsyncd worker
> [2015-05-18 12:29:49.614002] I [gsyncd(/export/raid/vol):532:main_i] <top>:
> syncing: gluster://localhost:vol ->
> ssh://root@x.x.x.x:gluster://localhost:vol
> [2015-05-18 12:29:54.696532] I [master(/export/raid/vol):58:gmaster_builder]
> <top>: setting up xsync change detection mode
> [2015-05-18 12:29:54.696888] I [master(/export/raid/vol):357:__init__]
> _GMaster: using 'rsync' as the sync engine
> [2015-05-18 12:29:54.697930] I [master(/export/raid/vol):58:gmaster_builder]
> <top>: setting up changelog change detection mode
> [2015-05-18 12:29:54.698160] I [master(/export/raid/vol):357:__init__]
> _GMaster: using 'rsync' as the sync engine
> [2015-05-18 12:29:54.699239] I [master(/export/raid/vol):1104:register]
> _GMaster: xsync temp directory:
> /var/run/gluster/vol/ssh%3A%2F%2Froot%40x.x.x.x%3Agluster%3A%2F%2F127.0.0.1%3Avol/ce749a38ba30d4171cd674ec00ab24f9/xsync
> [2015-05-18 12:30:04.707216] I [master(/export/raid/vol):682:fallback_xsync]
> _GMaster: falling back to xsync mode
> [2015-05-18 12:30:04.742422] I [syncdutils(/export/raid/vol):192:finalize]
> <top>: exiting.
> [2015-05-18 12:30:05.708123] I [monitor(monitor):157:monitor] Monitor:
> worker(/export/raid/vol) died in startup phase
> [2015-05-18 12:30:05.708369] I [monitor(monitor):81:set_state] Monitor: new
> state: faulty
> [201
> 
> After some python debugging and stack strace printing I figure out that:
> 
> /var/run/gluster/vol/ssh%3A%2F%2Froot%40x.x.x.x%3Agluster%3A%2F%2F127.0.0.1%3Avol/ce749a38ba30d4171cd674ec00ab24f9/changes.log
> 
> [2015-05-18 19:41:24.511423] I
> [gf-changelog.c:179:gf_changelog_notification_init] 0-glusterfs: connecting
> to changelog socket:
> /var/run/gluster/changelog-ce749a38ba30d4171cd674ec00ab24f9.sock (brick:
> /export/raid/vol)
> [2015-05-18 19:41:24.511445] W
> [gf-changelog.c:189:gf_changelog_notification_init] 0-glusterfs: connection
> attempt 1/5...
> [2015-05-18 19:41:26.511556] W
> [gf-changelog.c:189:gf_changelog_notification_init] 0-glusterfs: connection
> attempt 2/5...
> [2015-05-18 19:41:28.511670] W
> [gf-changelog.c:189:gf_changelog_notification_init] 0-glusterfs: connection
> attempt 3/5...
> [2015-05-18 19:41:30.511790] W
> [gf-changelog.c:189:gf_changelog_notification_init] 0-glusterfs: connection
> attempt 4/5...
> [2015-05-18 19:41:32.511890] W
> [gf-changelog.c:189:gf_changelog_notification_init] 0-glusterfs: connection
> attempt 5/5...
> [2015-05-18 19:41:34.512016] E
> [gf-changelog.c:204:gf_changelog_notification_init] 0-glusterfs: could not
> connect to changelog socket! bailing out...
> 
> 
> /var/run/gluster/changelog-ce749a38ba30d4171cd674ec00ab24f9.sock doesn’t
> exist. So the
> https://github.com/gluster/glusterfs/blob/release-3.5/xlators/features/changelog/lib/src/gf-changelog.c#L431
> is failing because
> https://github.com/gluster/glusterfs/blob/release-3.5/xlators/features/changelog/lib/src/gf-changelog.c#L153
> cannot open the socket file.
> 
> And I don’t find any error related to changelog in log files, except on brick
> logs node 2 (site A)
> 
> bricks/export-raid-vol.log-20150517:[2015-05-14 17:06:52.636908] E
> [changelog-helpers.c:168:changelog_rollover_changelog] 0-vol-changelog:
> Failed to send file name to notify thread (reason: Broken pipe)
> bricks/export-raid-vol.log-20150517:[2015-05-14 17:06:52.636949] E
> [changelog-helpers.c:280:changelog_handle_change] 0-vol-changelog: Problem
> rolling over changelog(s)
> 
> gluster vol status is all fine, and change-log options are enabled in vol
> file
> 
> volume vol-changelog
> type features/changelog
> option changelog on
> option changelog-dir /export/raid/vol/.glusterfs/changelogs
> option changelog-brick /export/raid/vol
> subvolumes vol-posix
> end-volume
> 
> Any help will be appreciated :)
> 
> Oh Btw, hard to stop / restart the volume as I have around 4k clients
> connected.
> 
> Thanks !
> 
> --
> Cyril Peponnet
> 
> 
> _______________________________________________
> Gluster-users mailing list
> Gluster-users@xxxxxxxxxxx
> http://www.gluster.org/mailman/listinfo/gluster-users
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users