I have deleted the archive zone and recreated it with sync_from_all=false and sync_from=fo-am, so only that zone is used to sync from (rough commands below). The full sync finally completed and did not fill up the cluster entirely, but it is still using more space than expected (28 TB instead of 13 TB).

I have noticed that a lot of objects have up to 4 versions, which probably explains why more space is used. Are the duplicate objects related to the fact that we have 4 RGW daemons running in the archive zone? Also, a lot of objects are missing in the archive zone (unclear yet how many). I already wrote a script to clean up all non-latest versions (sketch below).

The entire archive sync module feels pretty unstable, with a lot of unclear errors:

RGW-SYNC:data:sync:shard[37]: a sync operation returned error
RGW-SYNC:data:sync:shard[58]: read error repo: got 0 entries
data sync: SYNC_ARCHIVE: sync_object: error versioning archive bucket
failed to read remote metadata entry: (5) Input/output error

Is anyone else using the archive sync module, and does anyone know where these problems are coming from?
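
For reference, the reconfiguration of the archive zone was roughly the following (the archive zone name is a placeholder here; fo-am is our master zone):

  radosgw-admin zone modify --rgw-zone=<archive-zone> \
      --sync-from-all=false --sync-from=fo-am
  radosgw-admin period update --commit

With sync_from_all=false the archive zone only pulls data from fo-am instead of from both source zones independently.

The cleanup script is essentially just a loop over list-object-versions. A minimal sketch of the idea (not the exact script), per bucket, with the AWS CLI; endpoint, bucket and credentials are placeholders, and it assumes object keys without embedded tabs or newlines:

  # delete every non-latest version in one bucket
  ENDPOINT=http://archive-rgw.example:8080   # placeholder RGW endpoint
  BUCKET=example-bucket                      # placeholder bucket name

  aws --endpoint-url "$ENDPOINT" s3api list-object-versions --bucket "$BUCKET" \
      --query 'Versions[?IsLatest==`false`].[Key,VersionId]' --output text |
  while IFS=$'\t' read -r key version_id; do
      # skip the literal "None" the CLI prints when there is nothing to delete
      [ -n "$version_id" ] || continue
      aws --endpoint-url "$ENDPOINT" s3api delete-object \
          --bucket "$BUCKET" --key "$key" --version-id "$version_id"
  done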

Kind regards,

Yosh de Vos


On Tue, 27 Jul 2021 at 18:04, Yosh de Vos <yosh@xxxxxxxxxx> wrote:

> Hi all,
>
> I am trying to add a third zone to our zonegroup with the tier type
> archive (the archive sync module). However, this caused the archive Ceph
> cluster to fill up entirely and made radosgw-admin unusable.
> I had to delete the zone pools and delete the zone to get the cluster up
> and running again without adding more disks.
>
> There are a lot of errors in the RGW server logs, like:
>
> RGW-SYNC:data:sync:shard[37]: a sync operation returned error
> data sync: SYNC_ARCHIVE: sync_object: error versioning archive bucket
>
> And the following error messages from *radosgw-admin sync error list*:
>
> failed to sync bucket instance: (125) Operation canceled
> failed to read remote metadata entry: (5) Input/output error
>
> I have now updated the Ceph cluster from 14.2.16 to 14.2.22 and recreated
> the zone.
> The sync is currently running again, but still with the same errors.
>
> The output of *radosgw-admin sync status* is:
>
> radosgw-admin sync status
>           realm 363d82c9-72df-4c82-b8fa-4196e3307fe5 (company)
>       zonegroup f691f801-992f-49d0-8a7d-39d1b73eeea8 (fo)
>            zone cbe6a428-591e-4a94-96ee-dcb18fc30eeb (fo-of)
>   metadata sync syncing
>                 full sync: 0/64 shards
>                 incremental sync: 64/64 shards
>                 metadata is caught up with master
>       data sync source: dffdf44a-6d0d-4ded-bd29-f716d31b1a2d (fo-am)
>                         syncing
>                         full sync: 109/128 shards
>                         full sync: 7809 buckets to sync
>                         incremental sync: 19/128 shards
>                         data is behind on 120 shards
>                         behind shards:
>                         [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,45,46,47,48,49,50,52,53,54,55,56,57,58,60,61,62,63,64,65,66,67,68,69,70,71,72,73,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,123,125,126,127]
>                         oldest incremental change not applied: 2021-07-26 11:08:25.0.984823s [115]
>                         14 shards are recovering
>                         recovering shards:
>                         [44,51,58,59,60,74,75,76,90,91,92,106,122,124]
>                 source: f4b0e442-1b70-4bdd-b019-9827f181a6af (fo-ch)
>                         syncing
>                         full sync: 103/128 shards
>                         full sync: 7080 buckets to sync
>                         incremental sync: 25/128 shards
>                         data is behind on 114 shards
>                         behind shards:
>                         [0,1,2,4,5,6,7,8,9,11,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,77,78,79,80,81,82,84,85,86,87,88,89,91,93,94,95,96,97,98,100,101,102,103,104,105,106,109,110,111,112,113,114,115,116,117,118,119,120,121,122,124,125,126,127]
>                         oldest incremental change not applied: 2021-07-26 11:08:11.0.648844s [115]
>                         12 shards are recovering
>                         recovering shards:
>                         [10,11,12,35,51,67,83,92,99,106,107,108]
>
> And that makes me wonder whether the archive zone tries to sync the same
> data from both zones?
> Don't I need to configure sync_from_all=false and set sync_from to one of
> the other zones?
> I can't find anything about this in the documentation or by searching
> online.
>
> Kind regards,
>
> Yosh de Vos
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx