Hi David,
The 'data sync init' command won't touch any actual object data,
no. Resetting the data sync status will just cause a zone to
restart a full sync of the --source-zone's data changes log. This
log only lists which buckets/shards have changes in them, which
causes radosgw to consider them for bucket sync. So while the
command may silence the warnings about data shards being behind,
it's unlikely to resolve the issue with missing objects in those
buckets.
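If you do end up running it, it would look something like this, from the zone that should re-scan the log (the zone name is a placeholder, and I believe you also need to restart the local radosgw daemons afterwards so they pick up the reset status, but don't hold me to that):

  $ radosgw-admin data sync init --source-zone=<source-zone>
  # then restart the radosgw daemons in this zone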
When data sync is behind for an extended period of time, it's
usually because it's stuck retrying previous bucket sync failures.
The 'sync error list' command may help narrow down where those
failures are.
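For example, run against the zone that's behind:

  $ radosgw-admin sync error list

Each entry should point at where a sync attempt failed, including
the object involved.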
There is also a 'bucket sync init' command to clear the bucket
sync status. Following that with a 'bucket sync run' should
restart a full sync on the bucket, pulling in any new objects that
are present on the source zone. I'm afraid those commands haven't
seen a lot of polish or testing, however.
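If you want to try them anyway, it would look roughly like this
(the bucket name is a placeholder, and the exact set of accepted
flags may vary by version):

  $ radosgw-admin bucket sync init --bucket=<bucket> --source-zone=<source-zone>
  $ radosgw-admin bucket sync run --bucket=<bucket>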
Casey
On 08/24/2017 04:15 PM, David Turner wrote:
Apparently the data shards that are behind go in
both directions, but only one zone is aware of the problem.
Each cluster has objects in its data pool that the other
doesn't have. I'm thinking about initiating a `data sync init`
on both sides (one at a time) to get them back on the same
page. Does anyone know whether running `data sync init` on a
zone will overwrite any local data that zone has but the other
doesn't?
After restarting the 2 RGW daemons on the
second site again, everything caught up on the metadata
sync. Is there something about having 2 RGW daemons on each
side of the multisite that might be causing an issue with
the sync getting stale? I have another realm set up the
same way that is having a hard time with its data shards
being behind. I haven't told that realm to resync, but yesterday
I noticed 90 shards were behind. It has since caught back up to
only 17 shards behind, but the oldest change not applied is
2 months old, and no order of restarting the RGW daemons is
helping to resolve this.
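(The shard counts above are from watching the sync status on the
affected zone, along the lines of:

  $ radosgw-admin sync status
  $ radosgw-admin data sync status --source-zone=<source-zone>

with the real zone name in place of the placeholder.)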
I have an RGW multisite (10.2.7) set up for
bi-directional syncing. It has been operational for 5
months and working fine. I recently created a new user
on the master zone, used that user to create a bucket,
and uploaded an object with a public ACL into it. The
bucket was created on the second site, but the user was
not, and requests for the object error out complaining
that the access_key doesn't exist.
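(For anyone wanting to check the same thing, comparing the
user metadata on each zone shows the mismatch, e.g.:

  $ radosgw-admin metadata list user
  $ radosgw-admin user info --uid=<new-user>

where the uid is a placeholder; the new user shows up on the
master zone but not on the second site.)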
That led me to think that the metadata isn't
syncing, while bucket and data both are. I've also
confirmed that data is syncing for other buckets as
well in both directions. The sync status from the
second site was this:
  metadata sync syncing
                full sync: 0/64 shards
                incremental sync: 64/64 shards
                metadata is caught up with master
      data sync source: f4c12327-4721-47c9-a365-86332d84c227 (public-atl01)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is caught up with source
Sync status leads me to think that the second site
believes it is up to date, even though it is missing a
freshly created user. I restarted all of the rgw
daemons for the zonegroup, but it didn't trigger
anything to fix the missing user in the second site.
I did some googling and found the sync init commands
mentioned in a few ML posts. I ran `metadata sync init`,
and now have this as the sync status:
  metadata sync preparing for full sync
                full sync: 64/64 shards
                full sync: 0 entries to sync
                incremental sync: 0/64 shards
                metadata is behind on 70 shards
                oldest incremental change not applied: 2017-03-01 21:13:43.0.126971s
      data sync source: f4c12327-4721-47c9-a365-86332d84c227 (public-atl01)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is caught up with source
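(For the record, the init was nothing more than the bare
command, run on the second site:

  $ radosgw-admin metadata sync init
)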
It definitely triggered a fresh sync and told it to
forget what it had previously applied, since the date
of the oldest change not applied is the day we
initially set up multisite for this zone. The problem
is that this was over 12 hours ago, and the sync status
hasn't caught up on any shards yet.
Does anyone have any suggestions other than blasting
the second site and setting it back up with a fresh start
(the only option I can think of at this point)?
Thank you,
David Turner
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com