Hi,

Basically, a deletion of any size shouldn't cause OSDs to crash, so please open a tracker with some example OSD logs showing the crash backtraces.

Cheers,
Dan

On Thu, Feb 24, 2022 at 6:20 AM Szabo, Istvan (Agoda) <Istvan.Szabo@xxxxxxxxx> wrote:
>
> Hi,
>
> I've removed the old RGW data pool with 2B objects, because in multisite the user removal with data purge doesn't seem to work, so I had to clean up somehow.
> Luckily I expected that the cluster might crash, so there were no users on it, but I wonder how this could be done in a smoother way.
>
> I deleted the pool on 23 Feb around 2-3pm, and roughly 90-95% of the OSDs went down around 7:30am on the 24th.
> I manually compacted all the OSDs and the cluster is back to normal, but I'm curious: what operation runs at 7:30am in Ceph?
>
> The only prevention I can think of: if this cleanup operation runs at 7:30am, I should start the deletion around 8am and compact all the OSDs before 7:30am the next day, so it might not crash?
>
> Istvan Szabo
> Senior Infrastructure Engineer
> ---------------------------------------------------
> Agoda Services Co., Ltd.
> e: istvan.szabo@xxxxxxxxx
> ---------------------------------------------------
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
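
For reference, a rough sketch of the commands behind the steps discussed in this thread, assuming a reasonably recent Ceph release (Nautilus or later); exact option names, availability, and suitable values vary by version and hardware, and the <id>/<crash-id> placeholders and sleep values below are illustrative only:

    # Collect crash backtraces for the tracker (crash module):
    ceph crash ls
    ceph crash info <crash-id>

    # Online compaction of an OSD's RocksDB; in older releases use the
    # admin socket instead (ceph daemon osd.<id> compact):
    ceph tell osd.<id> compact
    ceph tell osd.* compact

    # Offline compaction, with the OSD stopped, if the daemon won't stay up:
    ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-<id> compact

    # Before the next large pool removal, PG deletion can be throttled so the
    # background cleanup puts less pressure on the OSDs (values are examples):
    ceph config set osd osd_delete_sleep_hdd 5
    ceph config set osd osd_delete_sleep_ssd 1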