Re: RGW Lifecycle Processing and Promote Master Process

Casey Bodley <cbodley@xxxxxxxxxx> · Wed, 19 Aug 2020 10:43:02 -0400

On Fri, Aug 14, 2020 at 9:25 AM Alex Hussein-Kershaw
<Alex.Hussein-Kershaw@xxxxxxxxxxxxxx> wrote:
>
> Hi,
>
> I've previously discussed some issues I've had with the RGW lifecycle processing. I've discovered that the root cause of my problem is that:
>
>   *   I'm running a multisite configuration
>      *   Lifecycle processing is done on the master site each night. `radosgw-admin lc list` correctly returns all buckets with lc config.
>   *   I simulate the master site being destroyed from my VM host.
>   *   I promote the secondary site to master following the instructions here:  https://docs.ceph.com/docs/master/radosgw/multisite/
>      *   The new master site isn't doing any lifecycle processing. `radosgw-admin lc list` returns empty.
>   *   I recreate a cluster and pair it with the new master site to get back to having multisite redundancy.
>      *   Neither site is doing any lifecycle processing. `radosgw-admin lc list` returns empty.
> So in the process of failover/recovery I have gone from having two paired clusters performing lifecycle processing, to two paired clusters NOT performing lifecycle processing.
>
> Is this behaviour expected? I've found that `radosgw-admin lc reshard fix` will "remind" the cluster I run it on that it needs to do lifecycle processing. However, the docs don't mention needing it in this situation, and they state that the command is only relevant on earlier Ceph versions. I'm running Nautilus 14.2.9.
>
> In addition, if I have two healthy clusters paired in a multisite system, and swap the master cluster by promoting the non-master, the demoted cluster seems to continue doing lifecycle processing, while the promoted cluster does not. If I run `radosgw-admin lc reshard fix` on the promoted cluster, then both clusters seem to claim they are doing the processing. Is this a happy state to be in?
>
> Does anyone have any experience with this?
>
> Thanks,
> Alex
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>

There's a defect in metadata sync
(https://tracker.ceph.com/issues/44268) which prevents buckets with
lifecycle policies from being indexed for lifecycle processing on
non-master zones. It sounds like the 'lc reshard fix' command is
adding those buckets back to that index for processing.

The intent is for lifecycle processing to occur independently on every
zone. That's the only way to guarantee the correct result now that we
have PutBucketReplication (and specifically the Filter policy) where
any given zone may only hold a subset of the objects from its source.
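
Since every zone is meant to process lifecycle on its own, one rough way
to spot the symptom above is to compare, on each zone, the buckets you
know carry a lifecycle policy against what `radosgw-admin lc list`
reports there. The Python sketch below is only an illustration: the
helper names and the `expected_buckets` list are hypothetical, and it
assumes each `lc list` entry is a JSON object whose "bucket" field looks
like ":<name>:<marker>", which may differ between releases.

```python
import json
import subprocess


def lc_bucket_names(lc_list_json):
    """Return the set of bucket names found in parsed `lc list` output.

    Assumption: each entry is {"bucket": "<tenant>:<name>:<marker>", ...}.
    If an entry does not split into three parts, keep the raw value.
    """
    names = set()
    for entry in lc_list_json:
        parts = entry["bucket"].split(":")
        names.add(parts[1] if len(parts) >= 3 else entry["bucket"])
    return names


def missing_from_lc(expected_buckets, lc_list_json):
    """Buckets expected to have lifecycle config but absent from the lc index."""
    return sorted(set(expected_buckets) - lc_bucket_names(lc_list_json))


def run_lc_list():
    """Run `radosgw-admin lc list` on the local zone and parse its JSON output."""
    result = subprocess.run(
        ["radosgw-admin", "lc", "list"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


if __name__ == "__main__":
    # Sample data only; on a real zone, use run_lc_list() instead.
    sample = [{"bucket": ":logs:abc.1", "status": "UNINITIAL"}]
    print(missing_from_lc(["logs", "backups"], sample))
```

Any bucket this reports as missing on a zone is a candidate for
`radosgw-admin lc reshard fix` on that zone, run again with `lc list`
afterwards to confirm the bucket has reappeared.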