Hi Andreas,

Well, we do _NOT_ need multisite in our environment, but unfortunately it is the basis for the announced "metasearch" based on ElasticSearch... so we have been trying to implement a "multisite" config on Kraken (v11.2.0) for weeks now, but have not succeeded so far. We have purged everything and started over with the multisite config about five times by now. We have one Ceph cluster with two RadosGWs on top (so NOT two Ceph clusters!) - not sure if this makes a difference!?

Can you please share some info about your (formerly working?!?) setup? For example:
- which Ceph version are you on
- old deprecated "federated" setup or the new multisite setup from Jewel onwards
- one or multiple Ceph clusters

Great to see that multisite seems to work somehow, somewhere. We were really in doubt :O

Thanks & regards
Anton

P.S.: If someone reads this who has a working multisite setup based on a single Kraken Ceph cluster (or, let me dream, even a working ElasticSearch setup :| ), please step out of the dark and enlighten us :O

Sent: Tuesday, 30 May 2017 at 11:02
From: "Andreas Calminder" <andreas.calminder@xxxxxxxxxx>
To: ceph-users@xxxxxxxxxxxxxx
Subject: RGW multisite sync data sync shard stuck

Hello,
I've got a sync issue with my multisite setup. There are 2 zones in 1 zone group in 1 realm. The data sync in the non-master zone has been stuck on "Incremental sync is behind by 1 shard". This wasn't noticed until the radosgw instances in the master zone started dying from out-of-memory issues; all radosgw instances in the non-master zone were then shut down to keep services in the master zone available while troubleshooting the issue.

From the rgw logs in the master zone I see entries like:

2017-05-29 16:10:34.717988 7fbbc1ffb700 0 ERROR: failed to sync object: 12354/BUCKETNAME:be8fa19b-ad79-4cd8-ac7b-1e14fdc882f6.2374181.27/dirname_1/dirname_2/filename_1.ext
2017-05-29 16:10:34.718016 7fbbc1ffb700 0 ERROR: failed to sync object: 12354/BUCKETNAME:be8fa19b-ad79-4cd8-ac7b-1e14fdc882f6.2374181.27/dirname_1/dirname_2/filename_2.ext
2017-05-29 16:10:34.718504 7fbbc1ffb700 0 ERROR: failed to fetch remote data log info: ret=-5
2017-05-29 16:10:34.719443 7fbbc1ffb700 0 ERROR: a sync operation returned error
2017-05-29 16:10:34.720291 7fbc167f4700 0 store->fetch_remote_obj() returned r=-5

The sync status in the non-master zone reports that the metadata is in sync and that the data sync is behind by 1 shard, with the oldest incremental change not applied being about 2 weeks old. I'm not quite sure how to proceed: is there a way to find out the id of the shard and force some kind of re-sync of its data from the master zone? I'm unable to keep the non-master zone rgw's running because it leaves the master zone in a bad state, with rgw dying every now and then.

Regards,
Andreas
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
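
For context on the single-cluster case Anton describes (one Ceph cluster, two RadosGWs, one zone each), a Jewel-and-later multisite bootstrap usually looks roughly like the sketch below. All names, endpoints and credentials here (myrealm, mygroup, zone-a/zone-b, rgw1/rgw2, the system keys) are placeholders and not taken from this thread; please double-check the flags against the radosgw-admin man page for your release before relying on them.

    # master side: realm, master zonegroup and master zone (example names/endpoints)
    radosgw-admin realm create --rgw-realm=myrealm --default
    radosgw-admin zonegroup create --rgw-zonegroup=mygroup \
        --endpoints=http://rgw1:8080 --master --default
    radosgw-admin zone create --rgw-zonegroup=mygroup --rgw-zone=zone-a \
        --endpoints=http://rgw1:8080 --master --default

    # system user whose keys both zones use for replication
    radosgw-admin user create --uid=sync-user --display-name="Sync User" --system
    radosgw-admin zone modify --rgw-zone=zone-a \
        --access-key=<SYSTEM_ACCESS_KEY> --secret=<SYSTEM_SECRET_KEY>
    radosgw-admin period update --commit

    # second zone in the SAME cluster (no realm pull needed, same rados backend)
    radosgw-admin zone create --rgw-zonegroup=mygroup --rgw-zone=zone-b \
        --endpoints=http://rgw2:8080 \
        --access-key=<SYSTEM_ACCESS_KEY> --secret=<SYSTEM_SECRET_KEY>
    radosgw-admin period update --commit

    # then point each radosgw instance at its zone in ceph.conf, e.g.
    #   [client.rgw.rgw1]
    #   rgw_zone = zone-a
    #   [client.rgw.rgw2]
    #   rgw_zone = zone-b
    # and restart both gateways.

The main difference from the usual two-cluster walkthrough is that the realm pull step is skipped, since both zones share the same cluster and therefore already see the same realm and period objects.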
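
Regarding Andreas's question about identifying the stuck shard and forcing a re-sync: commands along the lines of the sketch below, run against the non-master zone, are what I would try first. Treat this as a sketch only - the zone name is a placeholder, and anything that re-initializes sync should be checked against the Kraken radosgw-admin documentation before running it.

    # overall sync state; the "behind shards: [...]" line names the lagging shard id(s)
    radosgw-admin sync status

    # recent sync errors, which often name the shard and object involved
    radosgw-admin sync error list

    # per-shard detail for data sync against the master zone (zone-a is a placeholder)
    radosgw-admin data sync status --source-zone=zone-a --shard-id=<ID>

    # last resort: re-initialize data sync from the master zone so radosgw
    # re-runs a full data sync (can be expensive on large zones, and the
    # non-master gateways need to be running afterwards for sync to proceed)
    radosgw-admin data sync init --source-zone=zone-a

The ret=-5 (EIO) errors in the master zone log suggest the master is failing to fetch the remote data log from the non-master side, so it is also worth confirming that the endpoints listed in the zonegroup are reachable from both sides before re-initializing anything.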