Re: radosgw multisite sync - how to fix data behind shards?

Hi Wyll,
I wonder if setting a property on all the objects would cause them to sync again with the other zones.
Steve

On Thu, Jun 9, 2022, 11:47 AM Wyll Ingersoll <wyllys.ingersoll@xxxxxxxxxxxxxx> wrote:

I ended up giving up after trying everything I could find in the forums and docs. I deleted the problematic zone, re-added it to the zonegroup, and re-established the group sync policy for the bucket in question. The sync status is OK now, though the error list still shows a bunch of errors from yesterday that I cannot figure out how to clear ("sync error trim" doesn't do anything that I can tell).

My opinion is that multisite sync policy in the current Pacific release (16.2.9) is still very fragile and poorly documented as far as troubleshooting goes. I'd love to see clear explanations of the various data and metadata operations (metadata, data, bucket, bilog, datalog). It's hard to know where to start when things get into a bad state, and the online resources are not helpful enough.

Another question: if a sync policy is defined on a bucket that already has some objects in it, what command should be used to force a sync operation based on the new policy? It seems that only objects added AFTER the policy is applied get replicated; pre-existing ones are not.



From: Wyll Ingersoll <wyllys.ingersoll@xxxxxxxxxxxxxx>
Sent: Thursday, June 9, 2022 9:35 AM
To: Amit Ghadge <amitg.b14@xxxxxxxxx>; ceph-users@xxxxxxx <ceph-users@xxxxxxx>; dev@xxxxxxx <dev@xxxxxxx>
Subject: [ceph-users] Re: radosgw multisite sync - how to fix data behind shards?
 
I think you mean "radosgw-admin sync error list", in which case there are 32 shards, each with the same error. I don't see errors in the master zone logs, so I'm not sure how to correct the situation.


    {
        "shard_id": 31,
        "entries": [
            {
                "id": "1_1654722349.230688_62850.1",
                "section": "data",
                "name": "zone-1:a6ed5947-0ceb-407b-812f-347fab2ef62d.677322760.1:6",
                "timestamp": "2022-06-08T21:05:49.230688Z",
                "info": {
                    "source_zone": "a6ed5947-0ceb-407b-812f-347fab2ef62d",
                    "error_code": 125,
                    "message": "failed to sync bucket instance: (125) Operation canceled"
                }
            }
        ]
    }
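To confirm that all 32 shards really carry the same failure (and not just similar-looking ones), the JSON from "radosgw-admin sync error list" can be grouped by error. A minimal Python sketch, assuming the command prints a JSON array of objects with "shard_id" and "entries" keys, each entry holding an "info" object with "error_code" and "message", as in the excerpt above. The helper names sync_error_list_json and shards_by_error are hypothetical.

```python
import json
import subprocess
from collections import defaultdict


def sync_error_list_json() -> str:
    """Run 'radosgw-admin sync error list' and return its raw JSON output.

    Needs the radosgw-admin binary and cluster credentials on this host.
    """
    return subprocess.run(
        ["radosgw-admin", "sync", "error", "list"],
        check=True, capture_output=True, text=True,
    ).stdout


def shards_by_error(raw_json: str) -> dict:
    """Map (error_code, message) to the sorted shard ids reporting it.

    Assumes the layout seen in this thread: a JSON array of objects with
    "shard_id" and "entries", each entry holding an "info" object.
    """
    seen = defaultdict(set)
    for shard in json.loads(raw_json):
        for entry in shard.get("entries", []):
            info = entry.get("info", {})
            key = (info.get("error_code"), info.get("message"))
            seen[key].add(shard["shard_id"])
    return {key: sorted(ids) for key, ids in seen.items()}
```

If every shard holds the same error 125, shards_by_error(sync_error_list_json()) should return a single key whose list covers all 32 shard ids; more than one key would mean the shards are failing for different reasons.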




________________________________
From: Amit Ghadge <amitg.b14@xxxxxxxxx>
Sent: Wednesday, June 8, 2022 9:16 PM
To: Wyll Ingersoll <wyllys.ingersoll@xxxxxxxxxxxxxx>
Subject: Re: radosgw multisite sync - how to fix data behind shards?

Check for errors by running the command radosgw-admin data sync error list.


-AmitG


On Wed, Jun 8, 2022 at 2:44 PM Wyll Ingersoll <wyllys.ingersoll@xxxxxxxxxxxxxx> wrote:

Seeking help from a radosgw expert...

I have a 3-zone multisite configuration (all running pacific 16.2.9) with 1 bucket per zone and a couple of small objects in each bucket for testing purposes.
One of the secondary zones cannot seem to get into sync with the master; sync status reports:


  metadata sync syncing
                full sync: 0/64 shards
                incremental sync: 64/64 shards
                metadata is caught up with master
      data sync source: a6ed5947-0ceb-407b-812f-347fab2ef62d (zone-1)
                        syncing
                        full sync: 128/128 shards
                        full sync: 66 buckets to sync
                        incremental sync: 0/128 shards
                        data is behind on 128 shards
                        behind shards: [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127]
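Checking by eye whether the behind-shards list ever shrinks between attempts is tedious. A minimal Python sketch that pulls the shard ids out of the "behind shards: [...]" line, assuming the plain-text layout of "radosgw-admin sync status" shown above (this is human-oriented output, so the exact format may differ between releases). The helper names sync_status_text and behind_shards are hypothetical.

```python
import re
import subprocess

# Matches e.g. "behind shards: [0,1,2,...,127]" in the human-readable output.
BEHIND_RE = re.compile(r"behind shards:\s*\[([0-9,\s]*)\]")


def sync_status_text() -> str:
    """Run 'radosgw-admin sync status' on this host and return its text."""
    return subprocess.run(
        ["radosgw-admin", "sync", "status"],
        check=True, capture_output=True, text=True,
    ).stdout


def behind_shards(status_text: str) -> list:
    """Return one list of behind shard ids per 'behind shards' line.

    Lists are kept separate because each data sync source has its own
    shard numbering; an empty result means no such line was printed.
    """
    return [
        [int(tok) for tok in m.group(1).split(",") if tok.strip()]
        for m in BEHIND_RE.finditer(status_text)
    ]
```

Calling behind_shards(sync_status_text()) before and after a step such as "data sync init" plus a radosgw restart would show whether the 128 behind shards move at all.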


I have tried using "data sync init" and restarting the radosgw multiple times, but that does not seem to be helping in any way.

If I manually do "radosgw-admin data sync run --bucket bucket-1" - it just hangs forever and doesn't appear to do anything.  Checking the sync status never shows any improvement in the shards.

It is very hard to figure out what to do, as there are several sync commands (bucket sync, data sync, metadata sync), and it is not clear what effect they have or how to properly run them when the syncing gets confused.

Any guidance on how to get out of this situation would be greatly appreciated. I've read lots of threads on various mailing list archives (via Google search), and very few of them have any sort of resolution or recommendation that is confirmed to have fixed these sorts of problems.


_______________________________________________
Dev mailing list -- dev@xxxxxxx
To unsubscribe send an email to dev-leave@xxxxxxx
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
