Re: radosgw multisite sync - how to fix data behind shards?

Running "object rewrite" on a couple of the objects in the bucket seems to have triggered the sync and now things appear ok.
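
For anyone wanting to repeat that workaround over several objects, here is a minimal Python sketch. It assumes the object names are already known: the bucket name bucket-1 comes from earlier in the thread, while the object keys and the helper rewrite_commands are hypothetical. It only builds the "radosgw-admin object rewrite" command lines and runs nothing.

```python
from typing import Iterable, List


def rewrite_commands(bucket: str, keys: Iterable[str]) -> List[List[str]]:
    """Build one `radosgw-admin object rewrite` argv per object key.

    Nothing is executed here; the caller decides whether to run the commands.
    """
    return [
        ["radosgw-admin", "object", "rewrite", f"--bucket={bucket}", f"--object={key}"]
        for key in keys
    ]


if __name__ == "__main__":
    # Hypothetical object keys; substitute the objects that failed to replicate.
    for argv in rewrite_commands("bucket-1", ["test-object-1", "test-object-2"]):
        print(" ".join(argv))
```

Each printed line can be pasted into a shell on a gateway node, or the argv lists can be passed to subprocess.run once the output has been reviewed.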

________________________________
From: Szabo, Istvan (Agoda) <Istvan.Szabo@xxxxxxxxx>
Sent: Thursday, June 9, 2022 3:24 PM
To: Wyll Ingersoll <wyllys.ingersoll@xxxxxxxxxxxxxx>
Cc: ceph-users@xxxxxxx <ceph-users@xxxxxxx>; dev@xxxxxxx <dev@xxxxxxx>
Subject: Re:  Re: radosgw multisite sync - how to fix data behind shards?

Try data sync init and restart the gateways; this has sometimes helped me.

If that doesn't help, turn the sync policy on the bucket off and on again.

Istvan Szabo
Senior Infrastructure Engineer
---------------------------------------------------
Agoda Services Co., Ltd.
e: istvan.szabo@xxxxxxxxx
---------------------------------------------------

On 2022. Jun 9., at 20:48, Wyll Ingersoll <wyllys.ingersoll@xxxxxxxxxxxxxx> wrote:


I ended up giving up after trying everything I could find in the forums and docs, deleted the problematic zone, and then re-added it back to the zonegroup and re-established the group sync policy for the bucket in question.  The sync-status is OK now, though the error list still shows a bunch of errors from yesterday that I cannot figure out how to clear ("sync error trim" doesn't do anything that I can tell).

My opinion is that multisite sync policy in the current Pacific release (16.2.9) is still very fragile and poorly documented as far as troubleshooting goes.  I'd love to see clear explanations of the various data and metadata operations - metadata, data, bucket, bilog, datalog.  It's hard to know where to start when things get into a bad state and the online resources are not helpful enough.

Another question: if a sync policy is defined on a bucket that already has some objects in it, what command should be used to force a sync operation based on the new policy? It seems that only objects added AFTER the policy is applied get replicated; pre-existing ones are not.


________________________________
From: Wyll Ingersoll <wyllys.ingersoll@xxxxxxxxxxxxxx>
Sent: Thursday, June 9, 2022 9:35 AM
To: Amit Ghadge <amitg.b14@xxxxxxxxx>; ceph-users@xxxxxxx <ceph-users@xxxxxxx>; dev@xxxxxxx <dev@xxxxxxx>
Subject:  Re: radosgw multisite sync - how to fix data behind shards?

I think you mean "radosgw-admin sync error list", in which case there are 32 shards, each with the same error.  I don't see any errors in the master zone logs, so I'm not sure how to correct the situation.


       "shard_id": 31,
       "entries": [
           {
               "id": "1_1654722349.230688_62850.1",
               "section": "data",
               "name": "zone-1:a6ed5947-0ceb-407b-812f-347fab2ef62d.677322760.1:6",
               "timestamp": "2022-06-08T21:05:49.230688Z",
               "info": {
                   "source_zone": "a6ed5947-0ceb-407b-812f-347fab2ef62d",
                   "error_code": 125,
                   "message": "failed to sync bucket instance: (125) Operation canceled"
               }
           }
       ]
   }
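
When all shards report the same error, a short summary is easier to read than the raw listing. The Python sketch below counts entries per error code and message. It assumes the full output of "radosgw-admin sync error list" is a JSON array of objects shaped like the excerpt above (the excerpt is truncated at the top, so the enclosing array is an assumption). SAMPLE and summarize_errors are illustrative names.

```python
import json
from collections import Counter

# One shard copied from the excerpt above, wrapped in an assumed top-level array.
SAMPLE = """
[
  {
    "shard_id": 31,
    "entries": [
      {
        "id": "1_1654722349.230688_62850.1",
        "section": "data",
        "name": "zone-1:a6ed5947-0ceb-407b-812f-347fab2ef62d.677322760.1:6",
        "timestamp": "2022-06-08T21:05:49.230688Z",
        "info": {
          "source_zone": "a6ed5947-0ceb-407b-812f-347fab2ef62d",
          "error_code": 125,
          "message": "failed to sync bucket instance: (125) Operation canceled"
        }
      }
    ]
  }
]
"""


def summarize_errors(raw: str) -> Counter:
    """Count error entries per (error_code, message) across all shards."""
    counts = Counter()
    for shard in json.loads(raw):
        for entry in shard.get("entries", []):
            info = entry.get("info", {})
            counts[(info.get("error_code"), info.get("message"))] += 1
    return counts


if __name__ == "__main__":
    for (code, message), n in summarize_errors(SAMPLE).items():
        print(f"{n} x [{code}] {message}")
```

In practice the raw string would come from the command's standard output instead of SAMPLE.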




________________________________
From: Amit Ghadge <amitg.b14@xxxxxxxxx>
Sent: Wednesday, June 8, 2022 9:16 PM
To: Wyll Ingersoll <wyllys.ingersoll@xxxxxxxxxxxxxx>
Subject: Re: radosgw multisite sync - how to fix data behind shards?

Check for errors by running the command radosgw-admin data sync error list


-AmitG


On Wed, Jun 8, 2022 at 2:44 PM Wyll Ingersoll <wyllys.ingersoll@xxxxxxxxxxxxxx> wrote:

Seeking help from a radosgw expert...

I have a 3-zone multisite configuration (all running Pacific 16.2.9) with 1 bucket per zone and a couple of small objects in each bucket for testing purposes.
One of the secondary zones cannot seem to get into sync with the master; sync status reports:


 metadata sync syncing
               full sync: 0/64 shards
               incremental sync: 64/64 shards
               metadata is caught up with master
     data sync source: a6ed5947-0ceb-407b-812f-347fab2ef62d (zone-1)
                       syncing
                       full sync: 128/128 shards
                       full sync: 66 buckets to sync
                       incremental sync: 0/128 shards
                       data is behind on 128 shards
                       behind shards: [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127]
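
To track whether the numbers above move between checks, the status text can be reduced to a few counters. The following Python sketch parses the "data sync source" section of output laid out like the sample above. The function name parse_data_sync and the returned keys are hypothetical, and the regular expressions assume the exact wording shown here, which may differ between releases.

```python
import re
from typing import Dict, List

# The status text from the message above; the shard list is generated to keep it short.
SAMPLE = (
    " metadata sync syncing\n"
    "               full sync: 0/64 shards\n"
    "               incremental sync: 64/64 shards\n"
    "               metadata is caught up with master\n"
    "     data sync source: a6ed5947-0ceb-407b-812f-347fab2ef62d (zone-1)\n"
    "                       syncing\n"
    "                       full sync: 128/128 shards\n"
    "                       full sync: 66 buckets to sync\n"
    "                       incremental sync: 0/128 shards\n"
    "                       data is behind on 128 shards\n"
    "                       behind shards: ["
    + ",".join(str(i) for i in range(128))
    + "]\n"
)


def parse_data_sync(status: str) -> Dict[str, object]:
    """Extract shard counters from the first `data sync source` section."""
    # Drop the metadata section so its counters are not matched by mistake.
    _, _, data = status.partition("data sync source:")
    full = re.search(r"full sync: (\d+)/(\d+) shards", data)
    incr = re.search(r"incremental sync: (\d+)/(\d+) shards", data)
    behind = re.search(r"behind shards: \[([\d,\s]*)\]", data)
    shards: List[int] = []
    if behind:
        shards = [int(s) for s in behind.group(1).split(",") if s.strip()]
    return {
        "full_sync": int(full.group(1)) if full else None,
        "incremental_sync": int(incr.group(1)) if incr else None,
        "total_shards": int(full.group(2)) if full else None,
        "behind_shards": shards,
    }


if __name__ == "__main__":
    result = parse_data_sync(SAMPLE)
    print(result["full_sync"], result["incremental_sync"], len(result["behind_shards"]))
```

For the sample, every one of the 128 data shards is still in full sync and none has moved to incremental sync, which matches the "behind on 128 shards" line.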


I have tried using "data sync init" and restarting the radosgw multiple times, but that does not seem to be helping in any way.

If I manually do "radosgw-admin data sync run --bucket bucket-1" - it just hangs forever and doesn't appear to do anything.  Checking the sync status never shows any improvement in the shards.
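
One way to keep such a command from blocking a terminal indefinitely is to run it under a timeout. The Python sketch below is an illustration only: run_with_timeout and DATA_SYNC_RUN are hypothetical names, the timeout is arbitrary, and this thread does not establish whether the command is expected to return on its own.

```python
import subprocess
import sys
from typing import List, Optional

# The command from the message above, as an argv list. It is not executed here.
DATA_SYNC_RUN = ["radosgw-admin", "data", "sync", "run", "--bucket", "bucket-1"]


def run_with_timeout(argv: List[str], timeout_s: float) -> Optional[int]:
    """Run argv; return its exit code, or None if it exceeded timeout_s seconds."""
    try:
        return subprocess.run(argv, timeout=timeout_s).returncode
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising TimeoutExpired.
        return None


if __name__ == "__main__":
    # Harmless stand-in command so the sketch runs without a Ceph installation.
    code = run_with_timeout([sys.executable, "-c", "print('done')"], timeout_s=10)
    print("timed out" if code is None else f"exit code {code}")
```

On a gateway node, run_with_timeout(DATA_SYNC_RUN, 600) would give the command ten minutes before giving up; checking the sync status afterwards shows whether any shards advanced.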

It is very hard to figure out what to do as there are several sync commands -  bucket sync, data sync, metadata sync  - and it is not clear what effect they have or how to properly run them when the syncing gets confused.

Any guidance on how to get out of this situation would be greatly appreciated.  I've read lots of threads on various mailing list archives (via Google search) and very few of them have any sort of resolution or recommendation that is confirmed to have fixed this sort of problem.


_______________________________________________
Dev mailing list -- dev@xxxxxxx
To unsubscribe send an email to dev-leave@xxxxxxx
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx