Hi David,
The 'radosgw-admin sync error list' command may be useful in
debugging sync failures for specific entries. For users, we've
seen some sync failures caused by conflicting user metadata that
was only present on the secondary site. For example, a user there
may have had the same access key or email address as the new user,
and we require both to be unique.
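As a rough illustration of that kind of conflict, here is a minimal sketch of a uniqueness check. It is not how radosgw implements it. The helper name find_conflicts is hypothetical, and the dict layout (user_id, email, keys[].access_key) is an assumption loosely modeled on user metadata records.

```python
def find_conflicts(new_user, existing_users):
    """Return (uid, field, value) tuples for users that collide with new_user.

    A collision means a *different* user already holds the same access key
    or the same non-empty email address.
    """
    new_keys = {k["access_key"] for k in new_user.get("keys", [])}
    new_email = new_user.get("email") or ""
    conflicts = []
    for user in existing_users:
        if user["user_id"] == new_user["user_id"]:
            continue  # the same user is not a conflict with itself
        for key in user.get("keys", []):
            if key["access_key"] in new_keys:
                conflicts.append((user["user_id"], "access_key", key["access_key"]))
        if new_email and user.get("email") == new_email:
            conflicts.append((user["user_id"], "email", new_email))
    return conflicts
```

Feeding it the user from the master zone and the users that exist only on the secondary would list the entries most likely to block that user's metadata from being applied.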
Running multiple gateways on the same zone is fully supported,
and unlikely to cause these kinds of issues.
On 08/24/2017 01:51 PM, David Turner wrote:
After restarting the 2 RGW daemons on the second
site again, everything caught up on the metadata sync. Is there
something about having 2 RGW daemons on each side of the
multisite that might be causing an issue with the sync getting
stale? I have another realm set up the same way that is having
a hard time with its data shards being behind. I haven't told
them to resync, but yesterday I noticed 90 shards were behind.
It's caught back up to only 17 shards behind, but the oldest
change not applied is 2 months old and no order of restarting
RGW daemons is helping to resolve this.
I have an RGW multisite (10.2.7) set up for
bi-directional syncing. This has been operational for 5
months and working fine. I recently created a new user on
the master zone, used that user to create a bucket, and put
a public-acl object in it. The bucket was created on the
second site, but the user was not, and the object errors out
complaining about the access_key not existing.
That led me to think that the metadata isn't syncing,
while bucket and data both are. I've also confirmed that
data is syncing for other buckets as well in both
directions. The sync status from the second site was this.
  metadata sync syncing
                full sync: 0/64 shards
                incremental sync: 64/64 shards
                metadata is caught up with master
      data sync source: f4c12327-4721-47c9-a365-86332d84c227 (public-atl01)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is caught up with source
Sync status leads me to think that the second site
believes it is up to date, even though it is missing a
freshly created user. I restarted all of the rgw daemons
for the zonegroup, but it didn't trigger anything to fix
the missing user in the second site. I did some googling
and found the sync init commands mentioned in a few ML
posts and used metadata sync init and now have this as the
sync status.
  metadata sync preparing for full sync
                full sync: 64/64 shards
                full sync: 0 entries to sync
                incremental sync: 0/64 shards
                metadata is behind on 70 shards
                oldest incremental change not applied: 2017-03-01 21:13:43.0.126971s
      data sync source: f4c12327-4721-47c9-a365-86332d84c227 (public-atl01)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is caught up with source
It definitely triggered a fresh sync and told it to
forget about what it had previously applied, as the date of
the oldest change not applied is the day we initially set
up multisite for this zone. The problem is that was over
12 hours ago and the sync status hasn't caught up on any
shards yet.
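To watch whether the shard count moves over time, the status text can be reduced to a few fields. This is only a sketch: summarize_sync_status is a hypothetical helper, and it matches only the phrases that appear in the two status outputs quoted in this message, so other wordings would need their own patterns.

```python
import re


def summarize_sync_status(text):
    """Pull a few fields out of 'radosgw-admin sync status' text.

    Works whether the output is on separate lines or collapsed onto one,
    since it searches the whole string for known phrases.
    """
    behind = re.search(r"metadata is behind on (\d+) shards", text)
    oldest = re.search(r"oldest incremental change not applied: (\S+ \S+)", text)
    return {
        "metadata_caught_up": "metadata is caught up with master" in text,
        "metadata_shards_behind": int(behind.group(1)) if behind else None,
        "oldest_change": oldest.group(1) if oldest else None,
        "data_caught_up": "data is caught up with source" in text,
    }
```

Running it against saved status output every so often and comparing metadata_shards_behind between runs would show whether the full sync is progressing at all.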
Does anyone have any suggestions other than blasting the
second site and setting it back up with a fresh start (the
only option I can think of at this point)?
Thank you,
David Turner
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com