Re: OSDs get killed by OOM when other host goes down

Josh Baergen <jbaergen@xxxxxxxxxxxxxxxx> · Fri, 12 Nov 2021 05:22:12 -0700

Hi Marius,

> We have a 8 hosts cluster with 4TB NVMe drive per host for now. The pool
> size is 2 and it's hosting RBD images for VMs.
> Each host has 128GB RAM installed.

How many OSDs/host? How many PGs/OSD? Which Ceph version?

> What is really happening during recovery / backfills that takes this much
> amount of memory for 1 single OSD?

It would be helpful to see what the "ceph daemon osd.XXX dump_mempools"
command reports for an OSD with high memory usage. One problem that has
been seen is pglogs consuming quite a bit of memory during recovery
scenarios (or even occasionally during steady state). This issue has
been alleviated somewhat in Octopus+, where there's a limit on the
number of pglog entries per OSD, but there are still gaps.
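
If the dump is long, a small script can rank the pools by size. This is
only a rough sketch: it assumes the JSON is nested as
"mempool" -> "by_pool" -> pool name -> "bytes", which may differ between
releases, and the function name and sample numbers are made up for
illustration.

```python
import json

def top_mempools(dump, n=5):
    """Return the n largest mempools as (name, bytes) pairs, largest first."""
    # Assumed layout: {"mempool": {"by_pool": {name: {"items": .., "bytes": ..}}}}
    pools = dump["mempool"]["by_pool"]
    sized = [(name, stats["bytes"]) for name, stats in pools.items()]
    return sorted(sized, key=lambda item: item[1], reverse=True)[:n]

# Made-up sample, not output from a real cluster.
sample = json.loads("""
{"mempool": {"by_pool": {
    "bluestore_cache_data": {"items": 100, "bytes": 1073741824},
    "buffer_anon":          {"items": 200, "bytes": 536870912},
    "osd_pglog":            {"items": 300, "bytes": 68719476736},
    "osdmap":               {"items": 400, "bytes": 2147483648}
}}}
""")

for name, size in top_mempools(sample, n=3):
    print(f"{name:22s} {size / 2**30:8.2f} GiB")
```

With real data, save the command's output to a file and pass
json.load() of that file instead of the sample. A dominant osd_pglog
pool would point at the pglog issue; a dominant osdmap pool would point
at osdmaps instead.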

> Why is the OSD process taking ~100GB RAM and have 25min start time even if
> the recovery process ended? (unless we wipe it and register it again).

This sounds like a pileup of osdmaps. Depending on your Ceph version,
all OSDs may need to be up+in in order to trim osdmaps effectively.
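
One way to check for this is to look at how many osdmap epochs an OSD is
holding. The sketch below assumes the JSON from "ceph daemon osd.XXX
status" carries "oldest_map" and "newest_map" fields; the helper names
and the threshold of 1000 are arbitrary placeholders, not Ceph defaults,
and the sample values are invented.

```python
def osdmap_span(status):
    """Number of osdmap epochs the OSD currently retains (inclusive)."""
    return status["newest_map"] - status["oldest_map"] + 1

def looks_piled_up(status, threshold=1000):
    """True if the retained range exceeds a (hypothetical) threshold."""
    return osdmap_span(status) > threshold

# Invented sample values, not taken from a real OSD.
sample_status = {"whoami": 12, "oldest_map": 41000, "newest_map": 98500}

print(osdmap_span(sample_status), looks_piled_up(sample_status))
```

If the span keeps growing while some OSDs are down or out, that would be
consistent with osdmaps not being trimmed.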

Josh
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx