Hi Ben,

Are you compacting the relevant osds periodically? ceph tell osd.x compact
(for the three osds holding the bilog) would help reshape the rocksdb levels
so they at least perform better for a little while, until the next round of
bilog trims.

Otherwise, I have deleted ~50M object indices in one step in the past,
probably back in the luminous days IIRC. It will likely lock up the relevant
osds for a while while the omap is removed. If you dare take that step, it
might help to set nodown; that might prevent other osds from flapping and
creating more work.

Cheers, Dan

______________________________
Clyso GmbH | https://www.clyso.com

On Tue, Apr 25, 2023 at 2:45 PM Ben.Zieglmeier <Ben.Zieglmeier@xxxxxxxxxx> wrote:
>
> Hi All,
>
> We have an RGW cluster running Luminous (12.2.11) that has one object with an extremely large OMAP database in the index pool. Listomapkeys on the object returned 390 million keys to start. Through bilog trim commands, we’ve whittled that down to about 360 million. This is a bucket index for a regrettably unsharded bucket. There are only about 37K objects actually in the bucket, but through years of neglect, the bilog has grown completely out of control. We’ve hit some major problems trying to deal with this particular OMAP object. We just crashed 4 OSDs when a bilog trim caused enough churn to knock one of the OSDs housing this PG out of the cluster temporarily. The OSD disks are 6.4TB NVMe, but are split into 4 partitions, each housing its own OSD daemon (collocated journal).
>
> We want to be rid of this large OMAP object, but are running out of options to deal with it. Resharding outright does not seem like a viable option, as we believe the deletion would deadlock OSDs and could cause impact. Continuing to run `bilog trim` 1000 records at a time has been what we’ve done, but this also seems to be hurting performance/stability. We are seeking options to remove this problematic object without creating additional problems. It is quite likely this bucket is abandoned, so we could remove the data, but I fear even the deletion of such a large OMAP could bring OSDs down and create the potential for metadata loss (the other bucket indexes on that same PG).
>
> Any insight available would be highly appreciated.
>
> Thanks.
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
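
For reference, a rough sketch of the compact/nodown sequence described in the
reply above. The pool name, bucket marker, and osd IDs here are placeholders,
not taken from this cluster; substitute your index pool, the bucket's
.dir.<marker> object, and the osds that `ceph osd map` actually reports.

    # locate the PG and the (three) osds holding the bucket index object
    # (placeholder pool and object names)
    ceph osd map default.rgw.buckets.index .dir.<bucket-marker>

    # compact rocksdb on each of those osds; repeat after each large bilog trim
    # (osd IDs 12/47/103 are placeholders)
    ceph tell osd.12 compact
    ceph tell osd.47 compact
    ceph tell osd.103 compact

    # before a large trim or index removal, keep churning osds from being
    # marked down, then clear the flag once the cluster settles
    ceph osd set nodown
    #   ... run the bilog trim / index removal here ...
    ceph osd unset nodown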