What can you do with osdmap in this case?

Istvan Szabo
Senior Infrastructure Engineer
---------------------------------------------------
Agoda Services Co., Ltd.
e: istvan.szabo@xxxxxxxxx<mailto:istvan.szabo@xxxxxxxxx>
---------------------------------------------------

On 2021. Nov 12., at 13:22, Josh Baergen <jbaergen@xxxxxxxxxxxxxxxx> wrote:

Hi Marius,

> We have an 8-host cluster with one 4TB NVMe drive per host for now.
> The pool size is 2 and it's hosting RBD images for VMs. Each host has
> 128GB RAM installed.

How many OSDs per host? How many PGs per OSD? Which Ceph version?

> What is really happening during recovery/backfill that takes this much
> memory for a single OSD?

It would be helpful to see what "ceph daemon osd.XXX dump_mempools" says
for an OSD with high memory usage. One problem that has been seen is
that pglogs start consuming quite a bit of memory during recovery
scenarios (or even occasionally during steady state). This has been
alleviated a bit in Octopus+, where there's a limit on the number of
pglog entries per OSD, but there are still gaps.

> Why is the OSD process taking ~100GB RAM and needing 25 minutes to
> start even after recovery has ended (unless we wipe the OSD and
> register it again)?

This sounds like a pileup of osdmaps. Depending on your Ceph version,
all OSDs may need to be up+in in order for osdmaps to be trimmed
effectively.

Josh
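
As a quick sketch of how one might check both suspects Josh mentions:
the OSD id "12" below is just a placeholder; point the commands at any
OSD whose resident memory looks too high, and run them on that OSD's
host so its admin socket is reachable.

    # Per-OSD memory breakdown; the "osd_pglog" and "osdmap" buckets are
    # the ones relevant to the problems described above.
    ceph daemon osd.12 dump_mempools

    # Range of osdmap epochs the OSD is still holding; a very large gap
    # between oldest_map and newest_map points at untrimmed osdmaps.
    ceph daemon osd.12 status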