Re: Revisit Large OMAP Objects

<DHilsbos@xxxxxxxxxxxxxx> · Wed, 14 Apr 2021 18:20:05 +0000

Casey;

That makes sense, and I appreciate the explanation.

If I were to shut down all uses of RGW and wait for replication to catch up, would that address most known issues with running this command in a multi-site environment?  Could I also take the RADOSGW daemons offline as an added precaution?

Thank you,

Dominic L. Hilsbos, MBA 
Director – Information Technology 
Perform Air International Inc.
DHilsbos@xxxxxxxxxxxxxx 
www.PerformAir.com

-----Original Message-----
From: Casey Bodley [mailto:cbodley@xxxxxxxxxx] 
Sent: Wednesday, April 14, 2021 9:03 AM
To: Dominic Hilsbos
Cc: k0ste@xxxxxxxx; ceph-users@xxxxxxx
Subject: Re:  Re: Revisit Large OMAP Objects

On Wed, Apr 14, 2021 at 11:44 AM <DHilsbos@xxxxxxxxxxxxxx> wrote:
>
> Konstantin;
>
> Dynamic resharding is disabled in multisite environments.
>
> I believe you mean radosgw-admin reshard stale-instances rm.
>
> Documentation suggests this shouldn't be run in a multisite environment.  Does anyone know the reason for this?

say there's a bucket with 10 objects in it, and that's been fully
replicated to a secondary zone. if you want to remove the bucket, you
delete its objects then delete the bucket

when the bucket is deleted, rgw can't delete its bucket instance yet
because the secondary zone may not be caught up with sync - it
requires access to the bucket instance (and its index) to sync those
last 10 object deletions

so the risk with 'stale-instances rm' in multisite is that you might
delete instances before other zones catch up, which can lead to
orphaned objects
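
One way to lower that risk is to confirm that each zone reports itself caught up before removing any instances. The sketch below is only an illustration: `sync_caught_up` is a hypothetical helper, and the phrases it looks for ("is caught up with", "is behind on", "recovering", "failed") are assumptions about typical `radosgw-admin sync status` output, which may differ between releases. It reads the status text on stdin, so it can be tried against saved output first.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the sync status text on stdin
# contains a "caught up" line and no sign of lagging or failing shards.
# The matched phrases are assumptions about `radosgw-admin sync status`
# output; check them against your own release before relying on this.
sync_caught_up() {
    status=$(cat)
    # Any shard reported as behind, recovering, or failed means "not yet".
    case $status in
        *"is behind on"*|*"recovering"*|*"failed"*) return 1 ;;
    esac
    # Require positive confirmation, not just the absence of problems.
    case $status in
        *"is caught up with"*) return 0 ;;
    esac
    return 1
}

# Example use on each zone (not run here):
#   radosgw-admin sync status | sync_caught_up && echo "caught up"
```

A caught-up report for the zone as a whole is a necessary check rather than a guarantee for any one bucket, so it is a precaution, not proof that removal is safe.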

>
> Is it, in fact, safe, even in a multisite environment?
>
> Thank you,
>
> Dominic L. Hilsbos, MBA
> Director – Information Technology
> Perform Air International Inc.
> DHilsbos@xxxxxxxxxxxxxx
> www.PerformAir.com
>
>
> -----Original Message-----
> From: Konstantin Shalygin [mailto:k0ste@xxxxxxxx]
> Sent: Wednesday, April 14, 2021 12:15 AM
> To: Dominic Hilsbos
> Cc: ceph-users@xxxxxxx
> Subject: Re:  Revisit Large OMAP Objects
>
> Run reshard instances rm
> And reshard your bucket by hand or leave dynamic resharding process to do this work
>
>
> k
>
> Sent from my iPhone
>
> > On 13 Apr 2021, at 19:33, DHilsbos@xxxxxxxxxxxxxx wrote:
> >
> > All;
> >
> > We run 2 Nautilus clusters, with RADOSGW replication (14.2.11 --> 14.2.16).
> >
> > Initially our bucket grew very quickly, as I was loading old data into it, and we soon ran into Large OMAP Object warnings.
> >
> > I have since done a couple of manual reshards, which fixed the warning on the primary cluster.  I have never been able to get rid of the issue on the cluster with the replica.
> >
> > A prior conversation on this list led me to this command:
> > radosgw-admin reshard stale-instances list --yes-i-really-mean-it
> >
> > The results of which look like this:
> > [
> >    "nextcloud-ra:f91aeff8-a365-47b4-a1c8-928cd66134e8.185262.1",
> >    "nextcloud:f91aeff8-a365-47b4-a1c8-928cd66134e8.53761.6",
> >    "nextcloud:f91aeff8-a365-47b4-a1c8-928cd66134e8.53761.2",
> >    "nextcloud:f91aeff8-a365-47b4-a1c8-928cd66134e8.53761.5",
> >    "nextcloud:f91aeff8-a365-47b4-a1c8-928cd66134e8.53761.4",
> >    "nextcloud:f91aeff8-a365-47b4-a1c8-928cd66134e8.53761.3",
> >    "nextcloud:f91aeff8-a365-47b4-a1c8-928cd66134e8.53761.1",
> >    "3520ae821f974340afd018110c1065b8/OS Development:f91aeff8-a365-47b4-a1c8-928cd66134e8.4298264.1",
> >    "10dfdfadb7374ea1ba37bee1435d87ad/volumebackups:f91aeff8-a365-47b4-a1c8-928cd66134e8.4298264.2",
> >    "WorkOrder:f91aeff8-a365-47b4-a1c8-928cd66134e8.44130.1"
> > ]
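
A listing like this can be summarised per bucket to see which buckets hold the most stale instances. This is a rough sketch, assuming each entry sits on its own line in the "bucket:instance_id" form shown above; `stale_instances_by_bucket` is a hypothetical helper name, and a real JSON parser would be the safer choice for anything less regular.

```shell
#!/bin/sh
# Hypothetical helper: count stale instances per bucket.
# Input: the JSON array printed by `reshard stale-instances list`,
# one quoted "bucket:instance_id" string per line (assumed layout).
# Output: "<bucket><TAB><count>" lines, sorted by bucket name.
stale_instances_by_bucket() {
    # Keep everything before the last colon as the bucket name; this
    # preserves tenant prefixes and spaces such as "tenant/OS Development".
    sed -n 's/^[[:space:]>]*"\(.*\):[^:"]*",\{0,1\}[[:space:]]*$/\1/p' |
        LC_ALL=C sort | uniq -c |
        awk '{ n = $1; sub(/^[ \t]*[0-9]+[ \t]+/, ""); printf "%s\t%d\n", $0, n }'
}

# Example use (not run here):
#   radosgw-admin reshard stale-instances list | stale_instances_by_bucket
```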
> >
> > I find this particularly interesting, as the nextcloud-ra, <swift>/OS Development, <swift>/volumebackups, and WorkOrder buckets no longer exist.
> >
> > When I run:
> > for obj in $(rados -p 300.rgw.buckets.index ls | grep -F f91aeff8-a365-47b4-a1c8-928cd66134e8.3512190.1); do printf "%-60s %7d\n" "$obj" $(rados -p 300.rgw.buckets.index listomapkeys "$obj" | wc -l); done
> >
> > I get the expected 64 entries, with counts around 20000 +/- 1000.
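
To check counts like these against the warning threshold instead of by eye, the loop's output can be filtered. This is a minimal sketch: `shards_over_threshold` is a hypothetical helper, and its default of 200000 is an assumption about the cluster's `osd_deep_scrub_large_omap_object_key_threshold` setting, so pass the value your cluster actually uses.

```shell
#!/bin/sh
# Hypothetical helper: print index shards whose key count exceeds a limit.
# Input: "<object name> <key count>" lines, as printed by the loop above.
# The default limit of 200000 is an assumed threshold; override it with
# the first argument to match your cluster's configuration.
shards_over_threshold() {
    awk -v limit="${1:-200000}" '$NF + 0 > limit + 0 { print $1, $NF }'
}

# Example use (not run here):
#   <the loop above> | shards_over_threshold 200000
```

With 64 shards at roughly 20000 keys each, nothing would be printed at that limit, which would fit the primary cluster no longer warning.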
> >
> > Are the above listed stale instances ok to delete?  If so, how do I go about doing so?
> >
> > Thank you,
> >
> > Dominic L. Hilsbos, MBA
> > Director - Information Technology
> > Perform Air International Inc.
> > DHilsbos@xxxxxxxxxxxxxx
> > www.PerformAir.com
> >
> > _______________________________________________
> > ceph-users mailing list -- ceph-users@xxxxxxx
> > To unsubscribe send an email to ceph-users-leave@xxxxxxx