Re: RGW multisite sync, data sync issues

Thank you for your response,
this made the troubleshooting a bit easier. However, we've decided to
wipe the non-master cluster and just re-create it as another zone.
I'm not exactly sure how to do this, though, so if you have time for
some pointers it would be greatly appreciated. My theory is that I can
do something like this:

From the master zone, remove the non-master zone from the zone group,
commit the config, then wipe the nodes from the non-master zone,
rebuild them, and set them up as a new zone in the old zone group. My
worry is that this operation will somehow jeopardize the data in the
master zone and leave me with no data.
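In command form, I imagine it would look roughly like this; the zone
and zonegroup names, endpoint, and keys below are placeholders for our
real ones, so please correct me if any step is wrong or missing:

```shell
# Run on a node in the master zone. "default" / "secondary" /
# "new-secondary" and the endpoint/keys are placeholders.

# Remove the non-master zone from the zonegroup and delete it:
radosgw-admin zonegroup remove --rgw-zonegroup=default --rgw-zone=secondary
radosgw-admin zone delete --rgw-zone=secondary

# Commit the updated period so the change takes effect:
radosgw-admin period update --commit

# Later, after rebuilding the nodes, create the replacement zone in
# the old zonegroup and commit again:
radosgw-admin zone create --rgw-zonegroup=default --rgw-zone=new-secondary \
    --endpoints=http://rgw-host:80 --access-key=KEY --secret=SECRET
radosgw-admin period update --commit
```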

On 1 June 2017 at 18:00, Yehuda Sadeh-Weinraub <ysadehwe@xxxxxxxxxx> wrote:
> On Wed, May 31, 2017 at 6:49 AM, Andreas Calminder
> <andreas.calminder@xxxxxxxxxx> wrote:
>> Hello,
>> I asked on ceph-users, but thought I'd post here as well in case
>> anyone knows the ins and outs of rgw.
>> I've got a sync issue with my multisite setup. There are 2 zones in
>> 1 zone group in 1 realm. The data sync in the non-master zone has
>> been stuck on "incremental sync is behind by 1 shard". This wasn't
>> noticed until the radosgw instances in the master zone started dying
>> from out-of-memory issues; all radosgw instances in the non-master
>> zone were then shut down to keep services in the master zone up
>> while troubleshooting the issue.
>>
>> From the rgw logs in the master zone I see entries like:
>>
>> 2017-05-29 16:10:34.717988 7fbbc1ffb700  0 ERROR: failed to sync
>> object: 12354/BUCKETNAME:be8fa19b-ad79-4cd8-ac7b-1e14fdc882f6.2374181.27/dirname_1/dirname_2/filename_1.ext
>> 2017-05-29 16:10:34.718016 7fbbc1ffb700  0 ERROR: failed to sync
>> object: 12354/BUCKETNAME:be8fa19b-ad79-4cd8-ac7b-1e14fdc882f6.2374181.27/dirname_1/dirname_2/filename_2.ext
>> 2017-05-29 16:10:34.718504 7fbbc1ffb700  0 ERROR: failed to fetch
>> remote data log info: ret=-5
>> 2017-05-29 16:10:34.719443 7fbbc1ffb700  0 ERROR: a sync operation
>> returned error
>> 2017-05-29 16:10:34.720291 7fbc167f4700  0 store->fetch_remote_obj()
>> returned r=-5
>>
>> sync status in the non-master zone reports that the metadata is in
>> sync, that the data sync is behind by 1 shard, and that the oldest
>> incremental change not applied is about 2 weeks old.
>>
>> I'm not quite sure how to proceed. Is there a way to find out the
>> id of the stuck shard and force some kind of re-sync of its data
>> from the master zone? I'm unable to keep the non-master zone rgws
>> running because it leaves the master zone in a bad state, with rgw
>> dying every now and then.
>>
>
>
> Maybe start with looking at the sync error log:
>
> $ radosgw-admin sync error list
>
> Then there are radosgw-admin commands that query the different logs
> statuses, and the different sync statuses. E.g.,
>
> $ radosgw-admin bilog status
> $ radosgw-admin datalog status
> $ radosgw-admin mdlog status
>
> and
>
> $ radosgw-admin bucket sync status
> $ radosgw-admin data sync status
> $ radosgw-admin metadata sync status
>
> All commands need extra params that specify the specific resource
> you're aiming at (e.g., which bucket, which data shard). You probably
> don't need to deal with the metadata sync. The log status commands
> should be run on the source zone, and the sync status commands on the
> destination.
>
> You can trigger a full resync of the various entities with the following commands:
>
> $ radosgw-admin bucket sync init
> $ radosgw-admin data sync init
> $ radosgw-admin metadata sync init
>
>
> Yehuda
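
As a side note for the archives: to pull the failing object names out
of the master zone's rgw log quoted earlier, something like the
following works. The log path is a placeholder (the sample below just
reuses the lines from my original message):

```shell
# Sample lines from the master zone's rgw log (from the message above):
cat > /tmp/rgw-sample.log <<'EOF'
2017-05-29 16:10:34.717988 7fbbc1ffb700  0 ERROR: failed to sync object: 12354/BUCKETNAME:be8fa19b-ad79-4cd8-ac7b-1e14fdc882f6.2374181.27/dirname_1/dirname_2/filename_1.ext
2017-05-29 16:10:34.718016 7fbbc1ffb700  0 ERROR: failed to sync object: 12354/BUCKETNAME:be8fa19b-ad79-4cd8-ac7b-1e14fdc882f6.2374181.27/dirname_1/dirname_2/filename_2.ext
2017-05-29 16:10:34.718504 7fbbc1ffb700  0 ERROR: failed to fetch remote data log info: ret=-5
EOF

# Extract the unique object paths that failed to sync:
grep 'ERROR: failed to sync object' /tmp/rgw-sample.log \
    | sed 's/.*failed to sync object: //' \
    | sort -u
```

Against the real log, point grep at the rgw log file and the output
gives a de-duplicated list of objects to check on both sides.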



-- 
Andreas Calminder
System Administrator
IT Operations Core Services

Klarna AB (publ)
Sveavägen 46, 111 34 Stockholm
Tel: +46 8 120 120 00
Reg no: 556737-0431
klarna.com
